
CEN@Amrita FIRE 2016: Context based character embeddings for entity extraction in code-mixed text

Publication Type : Conference Paper

Publisher : CEUR Workshop Proceedings, CEUR-WS.

Source : CEUR Workshop Proceedings, CEUR-WS, Volume 1737, p.321-324 (2016)

Url : https://www.scopus.com/inward/record.uri?eid=2-s2.0-85006138001&partnerID=40&md5=d602605a35f98f0fa65cde01714018c1

Keywords : Artificial intelligence, Codes (symbols), Context-based, Cross validation, Data mining, Entity extractions, Fires, Indian languages, Information Retrieval, Mixed supports, Named entity recognition, Social networking (online), Support vector machines, Training data, Word embedding

Campus : Coimbatore

School : School of Engineering

Center : Computational Engineering and Networking

Department : Electronics and Communication

Year : 2016

Abstract : This paper presents the working methodology and results for the Code Mix Entity Extraction in Indian Languages (CMEE-IL) shared task of FIRE-2016. The aim of the task is to identify entities such as person, organization, movie, and location names in code-mixed tweets. The code-mixed tweets are written in English mixed with Hindi or Tamil. In this work, an entity extraction system is implemented for both Hindi-English and Tamil-English code-mixed tweets. The system uses context-based character embedding features to train a Support Vector Machine (SVM) classifier. The training data was tokenized so that each line contains a single word. These words were further split into characters. The embedding vectors of these characters are appended with the I-O-B tags and used for training the system. During the testing phase, context embedding features are used to predict the entity tags for characters in the test data. Cross-validation accuracy with character embeddings was higher for the Hindi-English Twitter dataset than for the Tamil-English Twitter dataset.
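The data preparation described in the abstract (one word per line, words split into characters, characters paired with I-O-B tags) might look roughly like the sketch below. It is an illustration only, not the authors' code: the tab-separated input layout, the tag name `B-LOCATION`, the `PAD` symbol, the window size, the rule that every character inherits its word's tag, and the helper names `word_to_char_rows` and `context_windows` are all assumptions. The embedding and SVM training steps are omitted.

```python
from typing import List, Tuple

PAD = "<pad>"  # hypothetical padding symbol for positions outside the sequence


def word_to_char_rows(word: str, tag: str) -> List[Tuple[str, str]]:
    """Split one word into characters.

    Assumption: each character simply inherits the I-O-B tag of its word.
    """
    return [(ch, tag) for ch in word]


def context_windows(chars: List[str], size: int = 1) -> List[List[str]]:
    """Return, for each character, that character plus `size` neighbours per side."""
    padded = [PAD] * size + chars + [PAD] * size
    return [padded[i:i + 2 * size + 1] for i in range(len(chars))]


# Toy input (invented for illustration): one token per line as "word<TAB>tag".
lines = ["Chennai\tB-LOCATION", "la\tO"]
tokens = [tuple(line.split("\t")) for line in lines]

# Character-level rows, each paired with its tag.
char_rows = [row for word, tag in tokens for row in word_to_char_rows(word, tag)]

# Context windows over the character sequence; in a full system each window
# would be mapped to embedding vectors before being passed to a classifier.
windows = context_windows([ch for ch, _ in char_rows], size=1)
```

In a complete system, each window would be converted to numeric features (for example by concatenating character embedding vectors) and passed with its tag to an SVM classifier; how the authors did this is not specified in the abstract.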

Cite this Research Publication : S. V. Skanda, Singh, S., G. Devi, R., Veena, P. V., Dr. M. Anand Kumar, and Dr. Soman K. P., “CEN@Amrita FIRE 2016: Context based character embeddings for entity extraction in code-mixed text”, in CEUR Workshop Proceedings, 2016, vol. 1737, pp. 321-324.
