Social media text analytics of Malayalam – English code-mixed using deep learning

Publication Type : Journal Article

Publisher : Springer

Source : J Big Data 9, 45 (2022)

Url :

Campus : Amritapuri

School : School of Computing

Center : Computational Linguistics and Indic Studies

Year : 2022

Abstract : Conversations on social media follow zigzag patterns and are often perceived as noisy or informal text. Unrestricted use of vocabulary in social media communication complicates the processing of code-mixed text. This paper focuses on two major tasks for code-mixed text: Offensive Language Identification and Sentiment Analysis on Malayalam–English code-mixed data sets. The proposed framework addresses three key points for these tasks: dependencies among features created by embedding methods (Word2Vec and FastText); a comparative analysis of deep learning algorithms (uni-/bi-directional models, hybrid models, and transformer approaches); and the relevance of selective translation and transliteration and of hyper-parameter optimization. It achieved F1-scores (the reported measure of model performance) of 0.76 on the Forum for Information Retrieval Evaluation (FIRE) 2020 data set and 0.99 on the European Chapter of the Association for Computational Linguistics (EACL) 2021 data set. A detailed error analysis was also carried out to give meaningful insights. The proposed strategy produced the best results among the benchmarked models for Malayalam–English code-mixed messages and serves as an important step towards societal good.

Cite this Research Publication : Thara, S., Poornachandran, P. Social media text analytics of Malayalam – English code-mixed using deep learning. J Big Data 9, 45 (2022).
