
Langtool: Identification of Indian Language for short Text

Publication Type : Conference Paper

Publisher : 9th International Conference on Advanced Computing

Source : 9th International Conference on Advanced Computing (ICoAC 2017), MIT, Chennai (2017)

Campus : Bengaluru

School : Department of Computer Science and Engineering, School of Engineering

Department : Computer Science

Year : 2017

Abstract : Language identification determines the language of a given document. It helps categorize content and can improve search results for multilingual documents. In this work, we classify each line of text as belonging to a particular language, focusing on short phrases of 2–6 words across 15 Indian languages. The tool detects whether a given document is multilingual and identifies the Indian languages it contains. The approach combines an n-gram technique with a list of short distinctive words. The n-gram model is language independent, whereas the short-word method requires less computation. The results show the effectiveness of our approach on synthetic data.
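
Illustration : The combination described in the abstract could be sketched roughly as follows. This is not the paper's implementation: the two-language setup, the toy training sentences, the word lists, and the names `SAMPLES`, `SHORT_WORDS`, `char_ngrams`, `identify`, and `identify_lines` are all assumptions made for illustration. The sketch checks a short distinctive-word list first (cheap) and falls back to character trigram overlap when the word list gives no clear answer.

```python
# Hypothetical toy data: one short training sentence per language.
# A real system would use large corpora for each of its languages.
SAMPLES = {
    "hi": "मैं घर जा रहा हूँ और वह स्कूल जा रही है",
    "mr": "मी घरी जात आहे आणि ती शाळेत जात आहे",
}

# Hypothetical lists of short, frequent words that separate the two languages.
SHORT_WORDS = {
    "hi": {"है", "और", "का"},
    "mr": {"आहे", "आणि", "चा"},
}


def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of text (no padding)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Language profile: the set of trigrams seen in that language's sample.
PROFILES = {lang: set(char_ngrams(sample)) for lang, sample in SAMPLES.items()}


def identify(line):
    """Guess the language of one short line of text."""
    tokens = line.split()

    # Step 1: count distinctive-word hits per language.
    hits = {lang: sum(tok in words for tok in tokens)
            for lang, words in SHORT_WORDS.items()}
    best = max(hits, key=hits.get)
    # Accept only if there is at least one hit and no tie for first place.
    if hits[best] > 0 and list(hits.values()).count(hits[best]) == 1:
        return best

    # Step 2: fall back to counting trigrams shared with each profile.
    grams = char_ngrams(line)
    overlap = {lang: sum(g in profile for g in grams)
               for lang, profile in PROFILES.items()}
    return max(overlap, key=overlap.get)


def identify_lines(document):
    """Classify each non-empty line of a document independently."""
    return [identify(line) for line in document.splitlines() if line.strip()]


if __name__ == "__main__":
    doc = "यह किताब है\nहे पुस्तक आहे"
    langs = identify_lines(doc)
    print(langs, "multilingual" if len(set(langs)) > 1 else "monolingual")
```

A document can be treated as multilingual when the per-line labels contain more than one language. With real-world input, the text would likely need Unicode normalization before matching, and trigram overlap on such a small sample is only meaningful as a demonstration.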

Cite this Research Publication : S. Bhaskaran, Paul, G., Dr. Deepa Gupta, and Amudha, J., “Langtool: Identification of Indian Language for short Text”, in 9th International Conference on Advanced Computing (ICoAC 2017), MIT, Chennai, 2017.
