Publication Type:

Journal Article

Source:

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer Verlag, Volume 10478 LNCS, p.206-218 (2018)

ISBN:

9783319736051

URL:

https://www.scopus.com/inward/record.uri?eid=2-s2.0-85041849478&doi=10.1007%2f978-3-319-73606-8_16&partnerID=40&md5=015290ce32cfa7ede3f79affbf15881e

Keywords:

Clustering algorithms, Code-mixed text, Codes (symbols), Data mining, Entity extractions, extraction, Learning systems, Social media, Social networking (online), Support vector machines, Text processing, Tri grams, Word embedding

Abstract:

Social media plays an important role in today’s society. It is a platform on which people express their opinions on various topics in natural language. Social media text generally contains code-mixed content, because users tend to mix multiple languages in their conversations instead of writing in their native script with Unicode characters. Entity extraction, the task of extracting useful entities such as Person, Location and Organization, is an important primary task in social media text analytics, and it is difficult on code-mixed text. This paper proposes three methodologies for extracting entities from Hindi-English and Tamil-English code-mixed data. The work was submitted to the shared task on Code-Mix Entity Extraction for Indian Languages (CMEE-IL) at the Forum for Information Retrieval Evaluation (FIRE) 2016. The proposed systems include approaches based on embedding models and a feature-based model. BIO-tag formatting is done as a pre-processing step, and trigram embeddings are extracted during feature extraction. The systems are built using a Support Vector Machine-based machine learning classifier. In the CMEE-IL task, we secured second position for the Tamil-English data and third position for the Hindi-English data. Additionally, the primary entities and their accuracies were analyzed in detail for further improvement of the system. © Springer International Publishing AG 2018.
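The abstract names BIO-tag formatting as the pre-processing step and trigram context as the basis for feature extraction. The sketch below illustrates both ideas in plain Python. It is not the authors' implementation: the function names `to_bio` and `context_trigrams`, the `<PAD>` token, the span format, and the example sentence are all assumptions made for illustration, and the paper's trigram embedding may be built differently (for instance by concatenating word vectors of the previous, current and next token).

```python
from typing import List, Tuple


def to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Return one BIO tag per token.

    Each span is (start, end, entity_type) with an exclusive end index.
    This span format is an assumption for this sketch.
    """
    tags = ["O"] * len(tokens)
    for start, end, entity_type in spans:
        tags[start] = f"B-{entity_type}"      # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{entity_type}"      # remaining tokens of the entity
    return tags


def context_trigrams(tokens: List[str], pad: str = "<PAD>") -> List[Tuple[str, str, str]]:
    """Return (previous, current, next) for every token, padding the edges."""
    padded = [pad] + tokens + [pad]
    return [tuple(padded[i:i + 3]) for i in range(len(tokens))]


# Made-up Tamil-English style sentence, for illustration only.
tokens = ["Sachin", "Tendulkar", "Mumbai", "la", "irukkar"]
spans = [(0, 2, "PERSON"), (2, 3, "LOCATION")]

tags = to_bio(tokens, spans)
trigrams = context_trigrams(tokens)

for token, tag, window in zip(tokens, tags, trigrams):
    print(token, tag, window)
```

In a full system, each trigram window would be mapped to a numeric feature vector and passed, together with its BIO tag, to a Support Vector Machine classifier; that step is omitted here because the abstract does not specify the embedding model or the classifier settings.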

Notes:

Cited by: 0; Conference: International Workshop on Text Processing, FIRE 2016; Conference Date: 7 December 2016 through 10 December 2016; Conference Code: 210099

Cite this Research Publication

R. G. Devi, Veena, P. V., M. Kumar, A., and Soman, K. P., “Entity Extraction of Hindi-English and Tamil-English Code-Mixed Social Media Text”, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10478 LNCS, pp. 206-218, 2018.
