
Comparative Analysis of Word Embeddings for Text Classification in Spark NLP

Publication Type : Conference Paper

Publisher : IEEE

Source : 2023 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM)

Url : https://doi.org/10.1109/ccem60455.2023.00027

Campus : Bengaluru

School : School of Computing

Year : 2023

Abstract : Text categorization is a fundamental component of natural language processing, with practical uses in tasks such as information retrieval, sentiment analysis, and document sorting. This investigation assesses and compares the efficacy of diverse word embedding techniques for text classification using Spark NLP. The examined word embeddings are TF-IDF, GloVe, BERT, ELMo, and Universal Sentence Encoder. Additionally, this inquiry examines the performance of logistic regression and random forest models with the different word embedding methods, alongside the ClassifierDL approach. The research entails comprehensive experimentation and analysis to gauge the suitability of each word embedding technique for a range of classification tasks. Assessment measures, namely accuracy, precision, recall, and the F1 score, are used to evaluate the effectiveness of these embeddings. Moreover, the research considers computational efficiency and scalability within the Spark NLP framework. The main aim of this research is to provide valuable perspectives on the advantages and constraints of each word embedding method. These insights help researchers and practitioners make well-informed choices when selecting the most suitable strategy for text classification in Spark NLP. The findings of this study contribute to the advancement of NLP techniques and facilitate the creation of more precise and efficient text classification models using Spark.
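
The assessment measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `classification_metrics`, the choice of macro averaging, and the toy labels are all assumptions made for illustration, and the example uses plain Python rather than Spark NLP.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Hypothetical helper for illustration; macro averaging is an assumption,
    since the abstract does not state which averaging scheme was used.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    # Count true positives, false positives, and false negatives per label.
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    precisions, recalls, f1_scores = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
    }


# Toy labels, invented for this example only.
metrics = classification_metrics(
    ["pos", "pos", "neg", "neg"],
    ["pos", "neg", "neg", "neg"],
)
print(metrics)
```

In a comparison like the one described, the same function would be applied to the predictions produced under each embedding, so that the scores are directly comparable across methods.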

Cite this Research Publication : Aarathi Rajagopalan Nair, Supriya M, Deepa Gupta, Comparative Analysis of Word Embeddings for Text Classification in Spark NLP, 2023 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM), IEEE, 2023, https://doi.org/10.1109/ccem60455.2023.00027
