Publication Type:

Conference Paper

Source:

CEUR Workshop Proceedings, CEUR-WS, Volume 1737, p.321-324 (2016)

URL:

https://www.scopus.com/inward/record.uri?eid=2-s2.0-85006138001&partnerID=40&md5=d602605a35f98f0fa65cde01714018c1

Keywords:

Artificial intelligence, Codes (symbols), Context-based, Cross validation, Data mining, Entity extractions, Fires, Indian languages, Information Retrieval, Mixed supports, Named entity recognition, Social networking (online), Support vector machines, Training data, Word embedding

Abstract:

This paper presents the working methodology and results for the Code Mix Entity Extraction in Indian Languages (CMEE-IL) shared task of FIRE-2016. The aim of the task is to identify entities such as person, organization, movie and location names in given code-mixed tweets. The code-mixed tweets are written in English mixed with Hindi or Tamil. In this work, an entity extraction system is implemented for both Hindi-English and Tamil-English code-mixed tweets. The system employs context-based character embedding features to train a Support Vector Machine (SVM) classifier. The training data was tokenized such that each line contains a single word. These words were further split into characters. Embedding vectors of these characters are appended with the I-O-B tags and used for training the system. During the testing phase, we use context embedding features to predict the entity tags for characters in the test data. We observed that cross-validation accuracy using character embeddings was higher for the Hindi-English Twitter dataset than for the Tamil-English Twitter dataset.
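The preprocessing described in the abstract (one word per line, words split into characters, characters paired with I-O-B tags and their neighbouring context) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `chars_with_bio` and `context_windows`, the `<pad>` symbol, the `B-PERSON` label spelling, the window size, and the choice to copy each word's tag onto all of its characters. The embedding and SVM training steps are omitted.

```python
from typing import List, Tuple

PAD = "<pad>"  # hypothetical padding symbol for positions outside the text


def chars_with_bio(tokens: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Split (word, tag) pairs into (character, tag) pairs.

    Assumption: every character inherits the I-O-B tag of its word.
    """
    pairs = []
    for word, tag in tokens:
        for ch in word:
            pairs.append((ch, tag))
    return pairs


def context_windows(chars: List[str], size: int = 1) -> List[List[str]]:
    """Return, for each character, the window of `size` neighbours on each side."""
    padded = [PAD] * size + chars + [PAD] * size
    return [padded[i:i + 2 * size + 1] for i in range(len(chars))]


# Hypothetical two-word example in the one-word-per-line format.
tokens = [("Modi", "B-PERSON"), ("ne", "O")]
pairs = chars_with_bio(tokens)
windows = context_windows([ch for ch, _ in pairs], size=1)
```

In a full pipeline, each window would be mapped to character embedding vectors and concatenated into one feature vector per character before being passed to a classifier; that mapping depends on details the abstract does not give.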

Notes:

Cited by 0; Conference: 2016 Forum for Information Retrieval Evaluation, FIRE 2016; Conference Date: 7 December 2016 through 10 December 2016; Conference Code: 125007

Cite this Research Publication

S. V. Skanda, Singh, S., G. Devi, R., Veena, P. V., Dr. M. Anand Kumar, and Dr. Soman K. P., “CEN@Amrita FIRE 2016: Context based character embeddings for entity extraction in code-mixed text”, in CEUR Workshop Proceedings, 2016, vol. 1737, pp. 321-324.
