Publication Type : Conference Paper
Publisher : CEUR Workshop Proceedings, CEUR-WS.
Source : CEUR Workshop Proceedings, CEUR-WS, Volume 1737, pp. 309-312 (2016)
Keywords : Artificial intelligence, Character recognition, Conditional random field, Data mining, Entity extractions, Entity recognition, Hybrid features, Indian languages, Information retrieval, Modeling languages, Random processes, Sequential model, Social media, Training corpus
Campus : Coimbatore
School : School of Engineering
Center : Computational Engineering and Networking
Department : Electronics and Communication
Year : 2016
Abstract : Entity Recognition is an essential part of Information Extraction, in which explicitly available information and relations are extracted from the entities within a text. A plethora of information is available on social media in the form of text, and its free-style nature makes mining information from it considerably more complex. The complexity increases further when the text mixes more than one language and uses transliterated words. In this work we used a sequential modeling algorithm (conditional random fields) with hybrid features to perform Entity Recognition on the corpus provided by the CMEE-IL (Code Mixed Entity Extraction - Indian Language) organizers. The approach was evaluated on both the Tamil-English and Hindi-English tweet corpora, attaining nearly 95% against the training corpus and 45.17% and 31.44% against the testing corpus.
Cite this Research Publication : H. B. Barathi Ganesh, Dr. M. Anand Kumar, and Dr. Soman K. P., “Conditional random fields for code mixed Entity Recognition”, in CEUR Workshop Proceedings, 2016, vol. 1737, pp. 309-312.