
Email Spam Detection with Machine Learning and Vectorization

Publication Type : Conference Paper

Publisher : IEEE

Source : 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT)

Url : https://doi.org/10.1109/icccnt61001.2024.10724545

Campus : Chennai

School : School of Engineering

Year : 2024

Abstract : To address the pressing problem of managing unwanted and potentially hazardous email, this study classifies spam emails in the Ling spam dataset. Because the dataset is imbalanced, the study applies ADASYN, an oversampling technique, to correct the skewed class distribution. Two vectorization models, FastText and GloVe, are then combined with three machine learning algorithms, Random Forest, Naive Bayes, and Logistic Regression, for spam/ham classification. Of the two vectorization models, GloVe performs better. Combined with Naive Bayes, GloVe gives the best performance, with an accuracy of 98.7% on the Ling spam dataset. This exceeds results reported in existing work, including a prior study by Samira Douzi [1], which used an integrated approach to spam filtering combining various techniques and achieved 98.27% accuracy.
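
The abstract does not say how word vectors are turned into per-email features or which Naive Bayes variant is used. The following is a minimal sketch of one common arrangement, assuming mean-pooled word vectors fed to a Gaussian Naive Bayes classifier. The tiny vector table, the training texts, and the function names (`embed`, `fit_gaussian_nb`, `predict`) are hypothetical and chosen only for illustration; they are not taken from the paper, and the ADASYN oversampling step is omitted.

```python
import math
from collections import defaultdict

# Hypothetical 3-dimensional word vectors standing in for pretrained GloVe
# embeddings (assumption: real GloVe vectors typically have 50 to 300 dimensions).
TOY_VECTORS = {
    "free":    [0.90, 0.10, 0.00],
    "winner":  [0.80, 0.20, 0.10],
    "prize":   [0.85, 0.15, 0.05],
    "meeting": [0.10, 0.90, 0.20],
    "syntax":  [0.00, 0.80, 0.30],
    "paper":   [0.05, 0.85, 0.25],
}
DIM = 3


def embed(text):
    """Mean-pool the vectors of known tokens; unknown tokens are skipped."""
    vecs = [TOY_VECTORS[t] for t in text.lower().split() if t in TOY_VECTORS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def fit_gaussian_nb(X, y, var_floor=1e-3):
    """Estimate a log prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for x, label in zip(X, y):
        by_class[label].append(x)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        cols = list(zip(*rows))
        means = [sum(col) / n for col in cols]
        # var_floor keeps variances strictly positive on tiny samples.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(cols, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest Gaussian log posterior."""
    def log_posterior(params):
        log_prior, means, variances = params
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
            for v, m, s2 in zip(x, means, variances)
        )
        return log_prior + log_likelihood
    return max(model, key=lambda label: log_posterior(model[label]))


# Hypothetical toy training set (not drawn from the Ling spam dataset).
train_texts = ["free prize winner", "winner free", "meeting paper", "syntax paper meeting"]
train_labels = ["spam", "spam", "ham", "ham"]
model = fit_gaussian_nb([embed(t) for t in train_texts], train_labels)

print(predict(model, embed("free prize")))
print(predict(model, embed("paper syntax")))
```

In a fuller pipeline, oversampling would be applied to the training vectors before fitting, and the toy table would be replaced by pretrained embeddings.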

Cite this Research Publication : Saravanan R, Vaisshale Rathinasamy, Email Spam Detection with Machine Learning and Vectorization, 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT), IEEE, 2024, https://doi.org/10.1109/icccnt61001.2024.10724545
