Dr. M. Anand Kumar currently serves as Assistant Professor at Amrita Center for Computational Engineering and Networking (CEN), Coimbatore Campus.

Invited Talks

  • “Machine Translation Tools for Grammar Teaching”, presented at the Tamil Internet Conference 2010, Cemmozhi Maanaadu, Coimbatore, India, June 2010.
  • “POS Tagging and Morphological Analyzer”, National Workshop on “Computational Linguistics and Machine Translation from English to Indian Languages”, Amrita Vishwa Vidyapeetham, 7 October 2012.
  • “Factored SMT for English-Tamil”, National Workshop on “Computational Linguistics and Machine Translation from English to Indian Languages”, Amrita Vishwa Vidyapeetham, 7 October 2012.
  • “Machine Learning Approach for Tamil POS Tagging”, Workshop on Tamil POS Tagging, Madurai Kamaraj University, 7 March 2013.
  • “Basics in Machine Translation”, Course on Introduction to Translation, CIIL, Mysore, 4 November 2013.
  • “Hybrid Machine Translation System”, Lecture on Computer Science Topics, Vidya Academy of Science and Technology, 6 November 2013.
  • “Basics in Machine Translation”, Course on Introduction to Translation, CIIL, Mysore, 16 December 2013.
  • “Tamil NLP Tools and Resources”, National Level Workshop on “Complete Understanding of NLP in Tamil Language”, CIET, Coimbatore, 18 July 2014.
  • “Machine Learning Approach for Morphological Analyzer”, Faculty Development Programme (FDP) on Language Technology, Government Engineering College, Sreekrishnapuram, Palakkad, 17-21 November 2014.
  • “Machine Translation and Linguistic Tools”, Ulaga Tamil Sangam (World Tamil Association), Tamil Development Directorate, Chennai, 2 February 2015.
  • “Linguistic Tools and Text Analytics for Tamil”, Tamil Virtual Academy, Anna University, Chennai, 30 October 2015.

Journal Articles
  • S. S. Kumar, M. Anand Kumar, and Soman, K. P., “Experimental Analysis of Malayalam POS Tagger Using EPIC Framework in Scala”, ARPN Journal of Engineering and Applied Sciences, vol. 11, pp. 8017-8023, 2016.

In Natural Language Processing (NLP), one of the well-studied problems under constant exploration is part-of-speech (POS) tagging, also called grammatical tagging. The task is to assign labels or syntactic categories such as noun, verb, adjective, adverb, and preposition to the words in a sentence or in an unannotated corpus. This paper presents a simple machine learning based experimental study of POS tagging using EPIC, a new structured prediction framework developed in the Scala programming language. This paper is the first of its kind to perform POS tagging for an Indian language using the EPIC framework. The corpus contains labelled Malayalam sentences from the health, tourism, and general (news, stories) domains. The EPIC framework uses conditional random fields (CRF) to build tagging models and provides several parameters that can be tuned to improve accuracy and thereby obtain a better POS tagger model. Overall accuracy was calculated separately for each domain, with maximum accuracies of 85.48%, 85.39%, and 87.35% on small tagged datasets in the health, tourism, and general domains, respectively.
  • S. N. Vinithra, M. Anand Kumar, and Soman, K. P., “Analysis of Sentiment Classification for Hindi Movie Reviews: A Comparison of Different Classifiers”, International Journal of Applied Engineering Research, vol. 10, 2015.

To make decisions in day-to-day life, it is important to have an opinion, and every opinion carries a sentiment that makes decisions easier. A huge amount of data on the web needs to be mined to determine its sentiment. This paper classifies labelled textual Hindi movie reviews with different classifiers. The dataset was separated into positive and negative reviews before processing. The goal is to predict the sentiment of online movie reviews, which are documents of varying size. A 10-fold cross-validation is performed to assess the quality of each classifier, and test accuracy is measured with the F1 score, which takes both precision and recall into account. The accuracy of unigram and bigram features is compared in detail for all the models. The classifiers used are Naïve Bayes, Logistic Regression, and the Random Kitchen Sink algorithm. Each of these algorithms gave better accuracy with bigram features. Of these four classifying algorithms, the Naïve Bayes Multinomial model has the best accuracy, at 70.37%. Hence, this sentiment analysis model, a developing big data application, is suggested for industrial applications in which predicting sentiment is a vital component.
Conference Papers
  • , M. Anand Kumar, and Soman, K. P., “Deep Belief Network Based Part of Speech Tagger for Telugu Language”, in 2nd IC3T International Conference on Computer and Communication Technologies, 2015.
  • M. S., M. Anand Kumar, and Soman, K. P., “Paraphrase Detection for Tamil Language Using Deep Learning Algorithms”, in International Conference on Big Data and Cloud Computing (ICBDCC-2015), 2015.
  • H. B. Barathi Ganesh, Abinaya, N., M. Anand Kumar, Vinayakumar, R., and Soman, K. P., “AMRITA-CEN@NEEL: Identification and Linking of Twitter Entities”, in CEUR Workshop Proceedings, Florence, Italy, 2015, vol. 1395, pp. 64-65.

Short texts are posted and updated constantly. With the global upswing of such microposts, the need to retrieve information from them has become pressing. This work focuses on extracting knowledge from microposts using entities as evidence. The extracted entities are linked to their relevant DBpedia resources through featurization, Part-of-Speech (POS) tagging, Named Entity Recognition (NER), and Word Sense Disambiguation (WSD). This short paper describes its contribution to the #Micropost2015 NEEL task, which experiments with existing Machine Learning (ML) algorithms. Copyright © 2015 held by author(s).
  • P. Sanjanaashree, M. Anand Kumar, and Soman, K. P., “Language Learning for Visual and Auditory Learners Using Scratch Toolkit”, in 2014 International Conference on Computer Communication and Informatics: Ushering in Technologies of Tomorrow, Today (ICCCI 2014), 2014. Scopus record: https://www.scopus.com/record/display.uri?eid=2-s2.0-84911391150&origin=inward&txGid=0

In recent years, technological development has made life much easier, and computers have become the lifeline of today's high-tech world; hardly any part of the working day goes by without them. In education in particular, people have started preferring e-books to carrying textbooks. Visualization plays a major role in learning. When a visualization tool and auditory learning come together, animation and correct pronunciation of words give an in-depth understanding of the data and its phoneme sequence, which is far better than learning from textbooks, where learners must imagine the content from their own perspective and rely on their own pronunciation. Scratch, with its visual, block-based programming platform, is widely used among high school students to learn programming basics. We found that many schools around the world use Scratch for this purpose, and the literature shows that students find it interesting and are very curious about it. Its engaging visual platform motivated us to apply Scratch to natural language learning. This paper is based on the concept of visual and auditory learning and describes how we use the Scratch toolkit to learn a secondary language. We also claim that visual learning helps people remember more easily than reading text in books, and that auditory learning helps learners pronounce words correctly without relying on someone else's help. We have developed a Scratch-based tool for learning simple sentence construction in a secondary language through a primary language; here, the languages are English (secondary) and Tamil (primary). This is an undertaking toward a language learning tool in Scratch; it is applicable to other language-specific exercises and can easily be adopted for other languages. © 2014 IEEE.
  • P. Sanjanaashree and M. Anand Kumar, “Joint Layer Based Deep Learning Framework for Bilingual Machine Transliteration”, in Proceedings of the 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI 2014), Delhi, India, 2014, pp. 1737-1743.

With the growth of the Internet (World Wide Web) and the emergence of social networking sites such as Friendster and Myspace, the information society began facing exciting challenges in language technology applications such as Machine Translation (MT) and Information Retrieval (IR). Researchers have been working on Machine Translation that deals with real-time information for over 50 years, ever since the first computers came along, but the need to translate data has grown larger than before as social media brings the world together. Translating proper nouns and technical terms has become an especially challenging task in Machine Translation. Machine transliteration emerged as part of information retrieval and machine translation projects to translate named entities, which are not registered in dictionaries, on the basis of their phonemes and graphemes. Many researchers have used conventional graphical models and adapted other machine translation techniques for machine transliteration, which has always been viewed as a Machine Learning problem. In this paper, we present a newer Machine Learning approach, Deep Learning, to improve bilingual machine transliteration between Tamil and English with a limited corpus. This technique advances Machine Learning toward Artificial Intelligence. The system is built on a Deep Belief Network (DBN), a generative graphical model that has been shown to work well on other Machine Learning problems. We obtained 79.46% accuracy for English-to-Tamil transliteration and 78.4% for Tamil-to-English transliteration. © 2014 IEEE.
  • A. Aravind and M. Anand Kumar, “Machine Learning Approach for Correcting Preposition Errors Using SVD Features”, in Proceedings of the 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI 2014), 2014.
  • M. Anand Kumar, V., D., Soman, K. P., and V., S., “Improving the Performance of English-Tamil Statistical Machine Translation System Using Source-Side Pre-Processing”, in Proceedings of International Conference on Advances in Computer Science (AETACS), 2014.

Machine Translation is one of the oldest and most active research areas in Natural Language Processing, and Statistical Machine Translation (SMT) currently dominates Machine Translation research. SMT uses statistical models to learn translation patterns directly from bilingual text corpora and generalizes them to translate new, unseen text. The approach is largely language independent, i.e. the models can be applied to any language pair. Where such corpora are available, excellent results can be attained when translating similar texts, but such corpora are still unavailable for many language pairs. SMT systems, in general, have difficulty handling morphology on the source or target side, especially for morphologically rich languages. Errors in morphology or syntax in the target language can severely affect the meaning of a sentence: they can change the grammatical function of words or, through incorrect tense information in a verb, the understanding of the sentence. The baseline SMT system, also known as Phrase-Based Statistical Machine Translation (PBSMT), uses no linguistic information and operates only on surface word forms. Recent research has shown that adding linguistic information helps improve translation accuracy with smaller bilingual corpora. Linguistic information can be added through pre-processing steps using a Factored Statistical Machine Translation system. This paper investigates how English-side pre-processing can be used to improve the accuracy of an English-Tamil SMT system.
  • , ,, M. Anand Kumar, and Soman, K. P., “AMRITA@FIRE-2014: Named Entity Recognition for Indian Languages (Working Notes)”, in International Workshop: "NER Shared Task", Forum for Information Retrieval Evaluation (FIRE-2014), Bengaluru, 2014.
  • M. Anand Kumar, Rajendran, S., and Soman, K. P., “AMRITA@FIRE-2014: Morpheme Extraction for Tamil Using Machine Learning (Working Notes)”, in International Workshop: "MET Shared Task", Forum for Information Retrieval Evaluation (FIRE-2014), Bengaluru, 2014.
  • M. Anand Kumar, “Morphological Generator for Tamil”, in National Seminar on Computational Linguistics and Language Technology, Annamalai University, Chidambaram, 2011.
  • R. Dhivya, Dhanalakshmi, V., M. Anand Kumar, and Soman, K. P., “Clause Boundary Identification for Tamil Language Using Dependency Parsing”, in International Joint Conference on Advances in Signal Processing and Information Technology (SPIT 2011), 2011.
  • M. Anand Kumar, Dhanalakshmi, V. V., Rajendran, S., Soman, K. P., and Rekha, K. U., “A Novel Algorithm for Tamil Morphological Generator”, in 8th International Conference on Natural Language Processing (ICON 2010), IIT Kharagpur, India, 2010. (Second Best Paper)

Tamil is a morphologically rich, agglutinative language: most word features are affixed to the root word as suffixes. A morphological generator takes a lemma, a POS category, and a morpholexical description as input and produces a word form as output; it is the reverse of a morphological analyzer. In any natural language generation system, the morphological generator is an essential component of the post-processing stage. The morphological generator implemented here is based on a new algorithm that is simple and efficient and requires neither rules nor a morpheme dictionary. Nouns and verbs are classified into paradigms following S. Rajendran's paradigm classification: Tamil verbs fall into 32 paradigms with 1,884 inflected forms, and nouns into 25 paradigms with 325 word forms. The approach requires only a minimal amount of data, so it can easily be implemented for other less-resourced, morphologically rich languages.
  • D. V., M. Anand Kumar, S., V. M., R., L., Soman, K. P., and S., R., “Tamil Part-of-Speech Tagger Based on SVMTool”, in Proceedings of International Conference on Asian Language Processing 2008 (IALP 2008), Chiang Mai, Thailand, 2008.