Publication Type : Journal Article
Publisher : Elsevier BV
Source : Procedia Computer Science
Url : https://doi.org/10.1016/j.procs.2020.04.252
Keywords : NLP, Devanagari Hindi, Abusive Text Classification, Profanity Detection, Offensive Text Detection, Devanagari, fastText
Campus : Amritapuri
School : School of Computing
Year : 2020
Abstract : While social media gives people an online platform for expressing their views, knowledge, experiences and emotions, a major problem arises when social media interactions become a platform for abusive remarks, comments and conversations. Beyond being offensive in conversations, slurs vary in usage: they can express contempt, a difference of opinion, or, in some cases, humor. Abusive language can be used to offend someone or to promote racism, sexism and similar forms of prejudice. Hindi is the third most popular language in the world by number of speakers. Because it is spoken by millions of Indians with different regional influences and linguistic preferences, it has become very rich in its diversity and usage. While "Hinglish" (Hindi written in Roman script instead of the native Devanagari) is used extensively online, the number of native Hindi speakers who write in Devanagari is steadily rising. Despite this, little research has been done on Hindi as an online language. This paper presents a fastText-based model that distinguishes offensive text from non-offensive text. The model was able to classify text from a Devanagari Hindi Offensive Tweets (DHOT) data corpus. A grid search was applied to tune hyperparameters across fastText model runs and provided interesting insights into the model's accuracy and precision. Our fastText model achieved 92.2% accuracy using a desktop-class machine for processing. To our knowledge, this is the first attempt to establish state-of-the-art classification of offensive text in Hindi using fastText models.
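The abstract describes training a fastText classifier on Devanagari tweets and tuning its hyperparameters with a grid search. The following is a minimal sketch of that kind of workflow, not the authors' code. It assumes the open-source `fasttext` Python package, hypothetical label names (`offensive`, `not_offensive`), and a hypothetical search space; the DHOT corpus's actual label scheme, grid and data splits are not reproduced here.

```python
import itertools

# Hypothetical label names; the DHOT corpus's actual labels are an assumption here.
LABELS = {1: "offensive", 0: "not_offensive"}

# Hypothetical search space; the paper's actual grid is not given in the abstract.
GRID = {
    "lr": [0.1, 0.5, 1.0],
    "epoch": [5, 25],
    "wordNgrams": [1, 2],
    "dim": [50, 100],
}


def to_fasttext_line(text: str, label: int) -> str:
    """Format one example in fastText's supervised format: '__label__<name> <text>'."""
    # fastText reads one example per line, so internal newlines and runs of
    # whitespace are collapsed; Devanagari characters pass through unchanged.
    clean = " ".join(text.split())
    return f"__label__{LABELS[label]} {clean}"


def write_split(examples, path):
    """Write (text, label) pairs to a UTF-8 file fastText can train on."""
    with open(path, "w", encoding="utf-8") as f:
        for text, label in examples:
            f.write(to_fasttext_line(text, label) + "\n")


def grid_configs(grid):
    """Yield every hyperparameter combination as a dict (keys in sorted order)."""
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def grid_search(train_path, valid_path, grid=GRID):
    """Train one model per configuration and keep the best by precision@1."""
    import fasttext  # requires the third-party `fasttext` package

    best_cfg, best_p = None, -1.0
    for cfg in grid_configs(grid):
        model = fasttext.train_supervised(input=train_path, verbose=0, **cfg)
        # model.test returns (number of examples, precision@1, recall@1)
        _, precision, _ = model.test(valid_path)
        if precision > best_p:
            best_cfg, best_p = cfg, precision
    return best_cfg, best_p
```

Usage would be `write_split(train, "train.txt")`, `write_split(valid, "valid.txt")`, then `grid_search("train.txt", "valid.txt")`. With exactly one label per example, precision@1 equals accuracy, which is why it is used as the selection score. An exhaustive grid grows multiplicatively (24 runs for the grid above); fastText also offers built-in automatic tuning via the `autotuneValidationFile` argument to `train_supervised`, which may be a cheaper alternative.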
Cite this Research Publication : Vikas Kumar Jha, Hrudya P, Vinu P N, Vishnu Vijayan, Prabaharan P, DHOT-Repository and Classification of Offensive Tweets in the Hindi Language, Procedia Computer Science, Elsevier BV, 2020, https://doi.org/10.1016/j.procs.2020.04.252