Caroline Segestaal (2119849)

Machine Learning-Based Evaluation of Biomarkers for Predicting High-Risk Prostate Cancer

Project Abstract

This project assesses novel biomarkers for identifying patients with aggressive prostate cancer and examines how effectively machine learning models can predict prostate cancer prognosis. Prostate cancer is one of the most common forms of cancer in the world today. Diagnosing it is a high-risk, high-cost, and painful procedure that is still not completely accurate. In the United Kingdom, around 100 000 prostate biopsies and 50 000 new prostate cancer diagnoses occur each year. Prostate biopsies are invasive, with risks ranging from incontinence and impotence to sepsis, a potentially life-threatening systemic reaction to an infection.

The project evaluates a variety of machine learning techniques and algorithms for their accuracy and efficiency, in the hope of minimising the need for prostate biopsies for diagnosis in the future.

Using Naïve Bayes Classifier, Support Vector Machine and Extreme Gradient Boosting machine learning algorithms, models have been developed to assess five new potential cancer indicators. The models are evaluated using accuracy, precision, F1 score, confusion matrix and AUROC, to assess both their own usability and the usability of the biomarkers as diagnostic measurements.

Various results have been produced and evaluated, with accuracies ranging from 20% to 80% depending on the combination of machine learning model and biomarkers used. Full results will be shared at the fair.

The Extreme Gradient Boosting algorithm has given the most accurate and most stable results, while the Heparan Sulfate Proteoglycan BGLY was the most promising biomarker. A BGLY and PSA combination run on an Extreme Gradient Boosting model using ‘hist’ as the tree method and ‘gbtree’ as the booster gives an average accuracy of 60% and an average AUROC score of 76%. These are promising results for further development of machine learning to minimise the need for prostate biopsies.
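As a rough illustration of the evaluation measures named in the abstract, the sketch below computes a confusion matrix, accuracy, precision, F1 score and AUROC for a binary high-risk / not-high-risk classifier using only the Python standard library. It is not the project's code: the function names, the 0.5 decision threshold and the toy labels and scores are all assumptions made up for this example, and a real pipeline would more likely rely on an established library.

```python
from typing import Dict, Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tn, fp, fn, tp) for binary labels, where 1 means high-risk."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 derived from the confusion matrix."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive case outscores a
    random negative case, with ties counted as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data invented for illustration only; not results from the project.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.4, 0.65, 0.3, 0.2, 0.55, 0.8, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

    print(confusion_matrix(y_true, y_pred))
    print(classification_metrics(y_true, y_pred))
    print(auroc(y_true, scores))
```

Accuracy, precision and F1 depend on the chosen threshold, whereas AUROC is computed from the raw scores and summarises ranking quality across all thresholds, which is one reason the two kinds of figure can differ noticeably for the same model.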

Keywords: Machine Learning, Bioinformatics, Data Analysis

 

Conference Details

 

Session: Poster Session B at Poster Stand 96

Location: Sir Stanley Clarke Auditorium on Wednesday 8th, 09:00 – 12:30

Markers: Mukesh Tiwary, Ulrich Berger

Course: BSc Computer Science, 3rd Year

Future Plans: I’m looking for work