Thomas McAuley (2110735)
Using Blood Data and Machine Learning to Predict the Likelihood of Stroke

Project Abstract
The future of medication development and diagnosis will be heavily influenced by the extent to which we can take advantage of the benefits that machine learning offers. There is great potential, and hope, that machine learning will accelerate the rate of discovery and reduce the cost of scientific breakthroughs, with a direct impact on health across our global society. The base study we have partly replicated is “A machine learning model predicts stroke associated with blood cadmium level” by Wenwei Zuo et al. We propose a range of machine learning models trained on NHANES data drawn from the Demographics, Questionnaire, Laboratory and Examination categories. Using the methods employed by the base study, we aim to replicate its existing results for predicting stroke with blood cadmium level as the primary indicator, and to develop these findings further. We have implemented one model in common with the base study, Logistic Regression (LR), and have gone on to develop two further models using a support vector machine (SVM) and CatBoost (a variation on gradient boosting). Each of these models has four different adjustment levels to account for potential class imbalance issues present in the base study. Alongside this study, we have developed a tool called the NVisualiser, designed to streamline how researchers aggregate data for their machine learning research projects and to let them visualise data from across multiple NHANES datasets. After reproducing parts of the base study, we have succeeded in producing more realistic predictions of the likelihood of a patient suffering a stroke from the NHANES data.
Keywords: Machine Learning, Web Application for Researchers, Science, Health and Medicine
Conference Details
Session: A
Location: Sir Stanley Clarke Auditorium, 11:00 to 13:00
Markers: Benjamin Mora, Arno Pauly
Course: BSc Software Engineering with a Year in Industry 4yr FI
Future Plans: I’m looking for work