Spam Identification using Machine Learning and Homomorphic Encryption
Project Abstract
This project explores the use of machine learning and homomorphic encryption in email spam filtering. Historically, rule-based approaches were used to detect and filter out spam mail. These rules matched text and metadata patterns that were expected in spam mail. More recently, rapid advances in machine learning techniques have opened the door to effective and robust spam detection. Machine learning models are particularly effective because they can learn hidden patterns in the data and filter new mail better than rule-based approaches. However, an intrinsic issue with using machine learning models for spam detection is privacy, because the model needs access to the content of users' mail. Homomorphic encryption allows direct computation on encrypted data without decrypting it first, so it can be used to protect the data that a machine learning model processes. The main aim of this project is to encrypt the available data using a partially homomorphic encryption scheme called Paillier encryption and to train machine learning models such as Logistic Regression or SVM on the encrypted data. The results of this project will be useful for a wide range of entities such as individual users, industries and government bodies. Effective spam detection using encrypted data will allow better usage of IT resources and the labour force while also ensuring good privacy and security practices.
Keywords: Machine Learning, Homomorphic Encryption, Spam Identification
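To illustrate the property the abstract relies on, the following is a minimal sketch of textbook Paillier encryption in plain Python. It is not the project's implementation: the tiny primes, the function names and the integer feature and weight values are all assumptions chosen for readability, and a real system would use a vetted library with keys of 2048 bits or more. The sketch shows that multiplying two ciphertexts adds their plaintexts, and that raising a ciphertext to a plaintext integer power scales its plaintext, which together are enough to evaluate a linear score (as used by Logistic Regression or a linear SVM) over encrypted features.

```python
import math
import secrets


def keygen(p=1789, q=1861):
    # Toy primes for illustration only (assumption); never use sizes like this in practice.
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; this form is valid because g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    # c = (n + 1)^m * r^n mod n^2, with a fresh random r coprime to n
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    # m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n
    lam, mu = priv
    n2 = n * n
    return ((pow(c, lam, n2) - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    # Multiplying ciphertexts adds the underlying plaintexts (mod n).
    return (c1 * c2) % (n * n)


def scale_encrypted(n, c, k):
    # Raising a ciphertext to a plaintext integer k multiplies its plaintext by k (mod n).
    return pow(c, k, n * n)


def encrypted_linear_score(n, enc_features, weights):
    # Computes an encryption of sum(w_i * x_i) without ever decrypting x_i.
    # Weights are plaintext non-negative integers in this sketch (assumption).
    acc = encrypt(n, 0)
    for c, w in zip(enc_features, weights):
        acc = add_encrypted(n, acc, scale_encrypted(n, c, w))
    return acc


if __name__ == "__main__":
    n, priv = keygen()
    features = [3, 0, 2]  # hypothetical word counts for one email
    weights = [5, 7, 1]   # hypothetical integer model weights
    enc = [encrypt(n, x) for x in features]
    score = encrypted_linear_score(n, enc, weights)
    print(decrypt(n, priv, score))  # 3*5 + 0*7 + 2*1 = 17
```

Because Paillier is only additively homomorphic, the sigmoid in Logistic Regression and any ciphertext-by-ciphertext product cannot be evaluated directly on encrypted values. In a sketch like this, the key holder would apply the non-linear step after decryption, or the model would use an approximation.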
Conference Details
Session: Presentation Stream 10 at Presentation Slot 4
Location: CoFo 002 on Tuesday 7th, 13:30 – 17:00
Markers: Randell Gaya, George Brooks (GTA)
Course: MSc Data Science, Masters PG
Future Plans: I’m looking for work