DHOT-Repository and Classification of Offensive Tweets in the Hindi Language

Publication Type : Journal Article

Publisher : Elsevier BV

Source : Procedia Computer Science

Url : https://doi.org/10.1016/j.procs.2020.04.252

Keywords : NLP, Devanagari Hindi, Abusive Text Classification, Profanity Detection, Offensive Text Detection, Devanagari, fastText

Campus : Amritapuri

School : School of Computing

Year : 2020

Abstract : While social media gives people an online platform for expressing their views, knowledge, experiences and emotions, a major problem arises when social media interactions become a platform for abusive remarks, comments and conversations. Beyond being offensive in conversation, slurs vary in usage: they may express contempt, differences of opinion, and in some cases humor. Abusive language can be used to offend someone or to promote racism, sexism, etc. Hindi is the third most popular language in the world by number of speakers. It is spoken by millions of Indians with different regional influences and linguistic preferences, and as a result it has become very rich in its diversity and usage. While "Hinglish" (Hindi written in Roman script instead of the native Devanagari) is used extensively online, the number of native Hindi speakers who write in Devanagari is steadily rising. Despite this, little research has been done on Hindi as an online language. This paper presents a fastText-based model that distinguishes offensive text from non-offensive text and classifies it accordingly. The model was able to classify text from the Devanagari Hindi Offensive Tweets (DHOT) corpus. A grid-search method was applied to tune hyperparameters during fastText model runs and provided interesting insights into the model's accuracy and precision. Our fastText model achieved 92.2% accuracy using a desktop-class machine for processing. To our knowledge, this is the first attempt to establish state-of-the-art classification of offensive text in Hindi using fastText models.
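The grid search mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the hyperparameter names (lr, epoch, wordNgrams, which match arguments of train_supervised in the fastText Python package) and their candidate values are illustrative assumptions, and toy_evaluate is a hypothetical stand-in for training a classifier and scoring it on held-out tweets.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple


def grid_search(
    grid: Dict[str, List],
    evaluate: Callable[[Dict], float],
) -> Tuple[Optional[Dict], float]:
    """Exhaustively try every combination of values in `grid`.

    `evaluate` maps one hyperparameter dict to a validation score
    (higher is better). Returns the best dict and its score; on ties
    the first combination encountered is kept.
    """
    names = sorted(grid)  # fixed order so results are reproducible
    best_params: Optional[Dict] = None
    best_score = float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space; the values are illustrative, not from the paper.
GRID = {"lr": [0.1, 0.5, 1.0], "epoch": [5, 25], "wordNgrams": [1, 2]}


def toy_evaluate(params: Dict) -> float:
    # Hypothetical stand-in: a real run would train a fastText model with
    # these params and return its accuracy on a validation split.
    return -abs(params["lr"] - 0.5) + 0.01 * params["epoch"] + 0.1 * params["wordNgrams"]


best, score = grid_search(GRID, toy_evaluate)
print(best, round(score, 2))
```

With the fastText Python package, one plausible evaluate function would call fasttext.train_supervised(input=train_file, **params) and return the precision@1 value from model.test(valid_file), which equals accuracy when each tweet carries exactly one label. Exhaustive search grows multiplicatively with every added hyperparameter, which is why the candidate lists are kept short.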

Cite this Research Publication : Vikas Kumar Jha, Hrudya P, Vinu P N, Vishnu Vijayan, Prabaharan P, DHOT-Repository and Classification of Offensive Tweets in the Hindi Language, Procedia Computer Science, Elsevier BV, 2020, https://doi.org/10.1016/j.procs.2020.04.252