Resume
📋

👋
I’m currently a master’s student in Biomedical Informatics at the National University of Singapore. Prior to my master’s degree, I received my bachelor’s degree in Data Science from Beijing Normal University-Hong Kong Baptist University United International College.

Contact

📧 qirui_he@u.nus.edu
📱🇸🇬+65 88865324

Experience

Database Architect Intern

China Telecom Corporation Limited, Foshan, China – (July 2021 - September 2021)
  • Completed orientation on the Oracle database system and learned the basic operations of the Foshan Integrated Information Unified Management platform.
  • Built visualizations on the Foshan Integrated Information Unified Management platform based on business forms requested by the data collection department.
  • Helped reduce the time complexity of the search system by applying Impala SQL on the Foshan Integrated Information Unified Management platform.

Data Analyst Intern

Dofine Information Technology Company Limited, Guangzhou, China – (July 2019 - August 2019)
  • Analyzed bidding prices to help the marketing department plan advertising on the most popular search engines, such as 360, Sogou and Baidu.
  • Monitored conversion rates from search engines to the company's main page and evaluated the characteristics of potential new clients.
  • Analyzed corporate customers' needs for an Enterprise Information Management System, and communicated with related departments to carry out information management plans for customers.

Skills

Python

My "native" programming language; I've worked with it for over 6 years. During my undergraduate studies I learnt to use Python for Data Analytics, Data Mining, Machine Learning, Deep Learning and so on. The packages I’m familiar with include Pandas, NumPy, Matplotlib, Math, TensorFlow, Keras, PyTorch and Seaborn.

MySQL

The database system I’m most familiar with. I have used MySQL, Python and Django to build a search system for libraries. I also taught myself Impala SQL during my internship at China Telecom.

Java

During the Object Oriented Programming course, I gained fundamental knowledge of Java and implemented a simple bubble game in Java.

C++

In the Data Structures & Algorithms course, I learnt to use C++ to implement data structures such as stacks, heaps, lists and trees. I also implemented common algorithmic techniques such as divide and conquer, greedy and dynamic programming in C++.

Linux

Familiar with working in Linux systems.

Tableau

Familiar with creating data visualizations with Tableau.

R

Familiar with regression analysis in R.

Languages

Chinese🇨🇳

Native speaker

Cantonese🇭🇰

Native speaker

English 🇺🇸

Proficient speaker
 

Academic Projects

  • Drivable Area Detection
    • Literature Review: Surveyed the literature and web resources tracing the development of object detection over the past 20 years. For each milestone, studied the theory and implementation of the method, then wrote a literature review. Focused in particular on the methods and algorithms applicable to autonomous driving and re-implemented them myself.
    • Details: Implemented drivable area detection using both a traditional object detection method (handcrafted features + machine learning techniques) and a deep learning based method (YOLOP) for lane detection and car detection respectively. Then combined the lane detection and car detection parts into a complete system for autonomous driving.
  • Recommender Systems and EDA on H&M Fashion Dataset
    • Details: Performed EDA on the H&M fashion dataset to gain insights into the data and implemented a Popularity Recommender System, a Cosine Recommender System and a Pearson Recommender System based on Turi Create. The Popularity Recommender System recommends the 12 most popular items overall to users, while the Cosine and Pearson Recommender Systems recommend the 12 items most similar to each user’s previous purchases, based on collaborative filtering.
  • ICU Intubation Prediction
    • Details: Based on exploratory data analysis of the pre-extracted MIMIC-IV dataset, applied SMOTE as the oversampling method to deal with the imbalanced data. Then implemented Logistic Regression, Decision Tree and Random Forest classifiers for the prediction. For each model, applied a Genetic Algorithm for feature selection separately and used grid search to fine-tune the parameters of each model. The Random Forest classifier achieved the highest area under the ROC curve, 0.76, with an accuracy of 0.96 in predicting the patient’s chance of intubation in the ICU.
  • Chest X-ray Image Classification Based on Deep Learning Methods
    • Details: Used three different deep convolutional neural network models to classify chest X-ray images for biomedical treatment. Applied data augmentation, data enhancement and transfer learning methods to improve the accuracy of the classification system. The best model, based on InceptionResNetV2, achieved an accuracy of 100% on the binary classification problem and 97.45% on the multi-class classification problem.
  • Garbage Classification Based on Deep Learning Methods
    • Details: Implemented a garbage classification system using deep learning techniques. The system uses YOLOv5 as the main algorithm and the garbage images provided by the Zhuhai Open Data Innovation Apps Contest as training and testing data sets. Training and testing were completed at the Zhuhai Supercomputing Center. The final prediction accuracy of the garbage classification system is over 80%.
  • Atmospheric Pollution Analysis Based on Space-time Tendency
    • Details: Implemented an animation for atmospheric pollution analysis based on space-time tendency using Tableau. The animation consists of four visualizations: two line charts, one scatterplot and one choropleth map.
  • Study on the Stock Prediction System Based on LSTM (Long Short-Term Memory) and RNN (Recurrent Neural Network)
    • Details: Collected stock price data with multiple features from Yahoo Finance and applied data preprocessing methods including normalization and data cleaning. Designed the LSTM and RNN models and trained them on the latest collected stock price data. After optimizing the models and tuning their parameters, the prediction system achieved an accuracy of 85%.
  • Atmospheric Pollution Analysis Based on Space-time Tendency
    • Details: Preprocessed the data using Python, and used Tableau to create interactive dashboards with multiple visualizations and calculated fields for users to explore the data and find insights about atmospheric pollution.
  • Classification of Emails by Using Naïve Bayes Classifier
    • Details: Collected a set of emails and converted their contents into token lists. Manually labeled each token list in order to evaluate the properties of each token, then used the Naïve Bayes method to classify the emails and filter out spam.
  • Efficient Cryptanalysis on Playfair Cipher
    • Details: Implemented encryption and decryption of messages using the Playfair cipher. Also implemented the Simulated Annealing algorithm and multi-core parallel computing to speed up decryption; for a page of ciphertext, our program completes the decryption in less than 0.02 seconds.
 

Education

Master's Degree in Biomedical Informatics

08/2022-Now
National University of Singapore

Bachelor's Degree in Data Science

09/2018-06/2022
Beijing Normal University-Hong Kong Baptist University United International College (UIC)