SVM Based Part of Speech Tagger for Malayalam
Publication Type: Conference Paper
Source: Recent Trends in Information, Telecommunication and Computing (ITC), 2010 International Conference on, IEEE, Kochi, Kerala, p. 339-341 (2010)
This paper presents the development of a part-of-speech (POS) tagger for the Malayalam language using a Support Vector Machine (SVM). A POS tagger plays an important role in natural language applications such as speech recognition, natural language parsing, information retrieval, and information extraction. This supervised machine learning approach to POS tagging requires a large annotated training corpus to tag accurately. In the initial stage of POS tagging for Malayalam, the model was trained with a very limited annotated corpus. We then tried to maximize performance with a substantial amount of annotated corpus. The objective of this project was to identify the ambiguities in Malayalam lexical items and to develop an efficient and accurate POS tagger. We developed our own tagset for training and testing the POS-tagger generators; the present tagset consists of 29 tags. A corpus of one hundred and eighty thousand words was used for training the tagger generators and testing their accuracy. We found the resulting tagger to be more efficient and accurate than earlier methods for Malayalam POS tagging.
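As a rough illustration of the general technique named in the abstract, the sketch below tags each word with one-vs-rest linear SVMs trained on simple context-window and suffix features. It is not the authors' implementation: the feature set, the subgradient training loop, the hyperparameters, the English toy sentences, and the three tags (`DT`, `NN`, `VB`) are all assumptions made for illustration, and the paper's 29-tag Malayalam tagset is not reproduced here.

```python
def token_features(words, i):
    """Sparse binary features for words[i]: the word, its neighbours, short suffixes.
    (Hypothetical feature set, chosen only for this illustration.)"""
    w = words[i]
    feats = [
        "bias",
        "w=" + w,
        "prev=" + (words[i - 1] if i > 0 else "<s>"),
        "next=" + (words[i + 1] if i + 1 < len(words) else "</s>"),
    ]
    for k in (1, 2, 3):
        if len(w) > k:
            feats.append("suf%d=%s" % (k, w[-k:]))
    return feats


class LinearSVMTagger:
    """One-vs-rest linear SVMs, trained by subgradient descent on the
    L2-regularised hinge loss. Hyperparameters are illustrative defaults."""

    def __init__(self, lam=0.01, lr=0.1, epochs=50):
        self.lam, self.lr, self.epochs = lam, lr, epochs
        self.weights = {}  # tag -> {feature: weight}

    def _score(self, tag, feats):
        w = self.weights[tag]
        return sum(w.get(f, 0.0) for f in feats)

    def fit(self, sentences):
        examples = [(token_features(ws, i), t)
                    for ws, ts in sentences
                    for i, t in enumerate(ts)]
        self.weights = {t: {} for t in sorted({t for _, t in examples})}
        for _ in range(self.epochs):
            for feats, gold in examples:
                for tag, w in self.weights.items():
                    y = 1.0 if tag == gold else -1.0
                    margin = y * self._score(tag, feats)
                    # Gradient step on the regulariser: shrink every weight.
                    for f in w:
                        w[f] *= 1.0 - self.lr * self.lam
                    # Hinge-loss subgradient: update only on a margin violation.
                    if margin < 1.0:
                        for f in feats:
                            w[f] = w.get(f, 0.0) + self.lr * y
        return self

    def predict(self, words):
        tags = sorted(self.weights)
        return [max(tags, key=lambda t: self._score(t, token_features(words, i)))
                for i in range(len(words))]


# Toy English data standing in for an annotated corpus (an assumption).
train = [
    ("the dog barks".split(), ["DT", "NN", "VB"]),
    ("the cat sleeps".split(), ["DT", "NN", "VB"]),
    ("a dog sleeps".split(), ["DT", "NN", "VB"]),
    ("a bird sings".split(), ["DT", "NN", "VB"]),
]
tagger = LinearSVMTagger().fit(train)
print(tagger.predict("the cat barks".split()))
```

Each word is classified independently from its local window, so a practical tagger would likely also feed in previously predicted tags and be trained on a far larger annotated corpus, as the abstract notes.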
Cited by (since 1996): 6; Conference Code: 80503