Publication Type : Journal Article
Publisher : International Journal of Applied Engineering Research
Source : International Journal of Applied Engineering Research, Volume 11, Issue 8, p.5425-5429 (2016)
Keywords : IE, Malayalam training corpus, MEMM, NE, NER, TnT
School : School of Arts and Sciences
Verified : Yes
Year : 2016
Abstract : Information Extraction (IE) is the process of extracting relevant data from given text documents. It is one of the most active research areas in natural language processing. Named Entity Recognition (NER) deals with recognizing named entities (NE) such as persons, organizations, and locations in text documents. In the existing system, the TnT tagger is used for named entity recognition; its drawback is that problems arise when handling unknown words. Maximum entropy Markov models also have a drawback: they can suffer from the label bias problem, in which low-entropy transition distributions effectively ignore their observations. This work implements a named entity recognizer for the Malayalam language using a Maximum Entropy Markov Model (MEMM). The MEMM combines features of hidden Markov models and maximum entropy models. It represents observations as arbitrary features (such as capitalization, word, PoS, and formatting). The input to the proposed Named Entity Recognition system is any text document, from any domain, in the Malayalam language. The Trigrams'n'Tags (TnT) tagger is used for part-of-speech (PoS) tagging. The significance of the work is that it helps in smoothing and in handling unknown words. The system was tested on more than a thousand sentences. An accuracy of 82.5% is obtained for the proposed methodology. © Research India Publications.
Cite this Research Publication : S. S., Jiljo, and P.V., P., “A study on named entity recognition for Malayalam language using TnT tagger & maximum entropy Markov model”, International Journal of Applied Engineering Research, vol. 11, no. 8, pp. 5425-5429, 2016.
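
Illustrative sketch : The abstract describes an MEMM that represents each observation as arbitrary features (capitalization, word, PoS, formatting) and predicts a named-entity label conditioned on the previous label. The following is a minimal Python sketch of that idea, not the authors' implementation. The function names, the label set, the English placeholder tokens, and the hand-set weights are all assumptions made for illustration; in a real system the weights would be learned from a tagged Malayalam corpus and the PoS tags would come from a tagger such as TnT.

```python
import math

def token_features(words, pos_tags, i, prev_label):
    """Features for position i, conditioned on the previous NE label (MEMM-style)."""
    word = words[i]
    return {
        "word=" + word,
        "pos=" + pos_tags[i],
        "prev_label=" + prev_label,
        "is_capitalized=" + str(word[:1].isupper()),
        "has_digit=" + str(any(ch.isdigit() for ch in word)),
    }

def label_distribution(features, weights, labels):
    """P(label | prev_label, observation) as a locally normalised softmax."""
    scores = {
        label: sum(weights.get((feat, label), 0.0) for feat in features)
        for label in labels
    }
    max_score = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {label: math.exp(s - max_score) for label, s in scores.items()}
    total = sum(exp_scores.values())
    return {label: v / total for label, v in exp_scores.items()}

# Hypothetical label set and toy weights: hand-set, not learned, not from the paper.
LABELS = ["PERSON", "LOCATION", "O"]
WEIGHTS = {
    ("pos=NNP", "PERSON"): 1.0,
    ("pos=NNP", "LOCATION"): 1.0,
    ("prev_label=PERSON", "PERSON"): 0.5,
    ("pos=VB", "O"): 2.0,
}

# Placeholder English tokens standing in for a PoS-tagged Malayalam sentence.
words = ["Ravi", "visited", "Kochi"]
pos_tags = ["NNP", "VB", "NNP"]

feats = token_features(words, pos_tags, 0, "START")
dist = label_distribution(feats, WEIGHTS, LABELS)
print(dist)
```

Because the distribution is normalised separately for each previous label, a state with few plausible successors passes most of its probability mass forward regardless of the observation, which is the label bias problem the abstract mentions.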