Analysis of Contextual and Non-contextual Word Embedding Models for Hindi NER with Web Application for Data Collection

Publication Type : Conference Paper

Source : Advanced Computing, p.183-202 (2021)

Url : https://link.springer.com/chapter/10.1007/978-981-16-0401-0_14

ISBN : 9789811604003

Campus : Amritapuri, Coimbatore

School : School of Artificial Intelligence, School of Artificial Intelligence - Coimbatore, School of Computing

Center : Computational Engineering and Networking

Department : Electronics and Communication

Year : 2021

Abstract : Named Entity Recognition (NER) is the process of taking a string and identifying relevant proper nouns in it. In this paper, we report the development of a Hindi NER system, in Devanagari script, using various embedding models. (All code and datasets used in this paper are available at: https://github.com/AindriyaBarua/Contextual-vs-Non-Contextual-Word-Embed...) We categorize embeddings as contextual and non-contextual, and compare them both across and within categories. Under non-contextual embeddings, we experiment with Word2Vec and FastText, and under contextual embeddings, we experiment with BERT and its variants, viz. RoBERTa, ELECTRA, CamemBERT, DistilBERT, and XLM-RoBERTa. For non-contextual embeddings, we use five machine learning algorithms, namely Gaussian NB, AdaBoost Classifier, Multi-layer Perceptron Classifier, Random Forest Classifier, and Decision Tree Classifier, to develop ten Hindi NER systems, training each algorithm once with FastText and once with Gensim Word2Vec word embeddings. These models are then compared with Transformer-based contextual NER models using BERT and its variants, and a comparative study among all these NER models is made. Finally, the best of these models is used to build a web app that takes a Hindi text of any length, returns an NER tag for each word, and takes feedback from the user about the correctness of the tags. This feedback aids our further data collection.

Cite this Research Publication : Barua, A., Thara, S., Premjith, B., Soman, K. P., "Analysis of Contextual and Non-contextual Word Embedding Models for Hindi NER with Web Application for Data Collection," (2021) Communications in Computer and Information Science, 1367, pp. 183-202. DOI: 10.1007/978-981-16-0401-0_14
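
The non-contextual setup described in the abstract can be illustrated with a minimal sketch: each word maps to one fixed vector regardless of its sentence, and a classifier assigns a tag to each token from that vector alone. Everything in the sketch is an assumption for illustration and is not taken from the paper: the three-dimensional toy vectors stand in for Word2Vec or FastText embeddings, the tag names (PER, LOC, O) are hypothetical, and a simple nearest-centroid classifier is swapped in for the five classifiers the paper evaluates.

```python
import math
from collections import defaultdict

# Hypothetical toy vectors standing in for Word2Vec/FastText embeddings.
# A non-contextual model gives each word one vector, whatever the sentence.
TOY_VECTORS = {
    "दिल्ली": [0.9, 0.1, 0.0],
    "मुंबई": [0.8, 0.2, 0.1],
    "राम": [0.1, 0.9, 0.0],
    "सीता": [0.2, 0.8, 0.1],
    "में": [0.0, 0.1, 0.9],
    "है": [0.1, 0.0, 0.8],
}

# Hypothetical labelled tokens; the tag set is an assumption.
TRAIN = [("दिल्ली", "LOC"), ("राम", "PER"), ("में", "O")]


def fit(train):
    """Compute one centroid (mean vector) per tag from labelled tokens."""
    by_tag = defaultdict(list)
    for word, tag in train:
        by_tag[tag].append(TOY_VECTORS[word])
    return {
        tag: [sum(column) / len(vectors) for column in zip(*vectors)]
        for tag, vectors in by_tag.items()
    }


def predict(word, centroids):
    """Tag a word with the label of the nearest centroid (Euclidean distance)."""
    vector = TOY_VECTORS.get(word)
    if vector is None:
        return "O"  # out-of-vocabulary words fall back to the non-entity tag
    return min(centroids, key=lambda tag: math.dist(vector, centroids[tag]))


def tag_sentence(text, centroids):
    """Return (word, tag) pairs for a whitespace-tokenised sentence."""
    return [(word, predict(word, centroids)) for word in text.split()]


if __name__ == "__main__":
    model = fit(TRAIN)
    for word, tag in tag_sentence("सीता मुंबई में है", model):
        print(word, tag)
```

Because the tag depends only on the word's own vector, the same word always receives the same tag. This is the limitation that contextual models such as BERT address by computing a representation for each token from the whole sentence.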
