Langtool: Identification of Indian Language for short Text

Publication Type : Conference Paper

Publisher : 9th International Conference on Advanced Computing

Source : 9th International Conference on Advanced Computing (ICoAC 2017), MIT, Chennai (2017)

Campus : Bengaluru

School : Department of Computer Science and Engineering, School of Engineering

Department : Computer Science

Year : 2017

Abstract : Language identification determines the language in which a given document is written. Identifying the language of the content can improve search results over multilingual documents. In this work, we classify each line of text into a particular language, focusing on short phrases of 2–6 words across 15 Indian languages. The tool detects whether a given document is multilingual and identifies the Indian languages it contains. The approach combines an n-gram technique with a list of short distinctive words: the n-gram model is language independent, whereas the short-word method requires less computation. The results show the effectiveness of our approach on synthetic data.
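
The abstract names two ingredients, character n-grams and a list of short distinctive words, without giving implementation details. The following Python code is a minimal sketch of how such a combination might work for line-by-line identification; it is not the authors' implementation. The function names, the two-language setup (Hindi and Marathi), the tiny training phrases, the word lists, and the scoring rule are all assumptions chosen for illustration.

```python
from collections import Counter

def char_ngrams(text, n_max=3):
    """Count character n-grams (n = 1..n_max) of a space-padded line."""
    padded = " " + " ".join(text.split()) + " "
    return Counter(
        padded[i:i + n]
        for n in range(1, n_max + 1)
        for i in range(len(padded) - n + 1)
    )

def build_profile(samples, n_max=3):
    """Relative n-gram frequencies over a language's training samples."""
    counts = Counter()
    for sample in samples:
        counts.update(char_ngrams(sample, n_max))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}

def ngram_score(text, profile, n_max=3):
    """Sum of profile frequencies for the n-grams seen in the line."""
    grams = char_ngrams(text, n_max)
    return sum(c * profile.get(g, 0.0) for g, c in grams.items())

def identify_line(line, profiles, short_words):
    """Try the cheap short-word lookup first; fall back to n-gram scoring."""
    tokens = line.split()
    hits = {lang: sum(t in words for t in tokens)
            for lang, words in short_words.items()}
    best = max(hits.values())
    winners = [lang for lang, h in hits.items() if h == best]
    if best > 0 and len(winners) == 1:
        return winners[0]
    # No hit or a tie: score the remaining candidates with n-grams.
    candidates = winners if best > 0 else list(profiles)
    return max(candidates, key=lambda lang: ngram_score(line, profiles[lang]))

def identify_document(text, profiles, short_words):
    """Label every non-empty line; more than one label means multilingual."""
    labels = [identify_line(line, profiles, short_words)
              for line in text.splitlines() if line.strip()]
    return labels, len(set(labels)) > 1

# Toy data, assumed for illustration only (not the paper's corpus or lists).
TRAINING = {
    "hindi": ["यह मेरा घर है", "मैं और तुम"],
    "marathi": ["हे माझे घर आहे", "मी आणि तू"],
}
SHORT_WORDS = {
    "hindi": {"है", "और", "में", "का"},
    "marathi": {"आहे", "आणि", "मध्ये"},
}
PROFILES = {lang: build_profile(s) for lang, s in TRAINING.items()}

labels, is_multilingual = identify_document(
    "वह घर में है\nतो घरी आहे", PROFILES, SHORT_WORDS
)
```

In this sketch the short-word lookup settles a line whenever exactly one language matches, which keeps computation low, while the n-gram profiles handle lines that contain no listed word. A real system would need far larger training text and word lists for each of the 15 languages.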

Cite this Research Publication : S. Bhaskaran, Paul, G., Dr. Deepa Gupta, and Amudha, J., “Langtool: Identification of Indian Language for short Text”, in 9th International Conference on Advanced Computing (ICoAC 2017), MIT, Chennai, 2017.
