Part-of-speech (POS) tagging is the process of assigning a part-of-speech tag to every word in a corpus. In this paper, POS tagging for the Tamil language is performed using a Bidirectional Long Short-Term Memory (BLSTM) network. A C2W (character-to-word) model, rather than a traditional word lookup table, is presented for obtaining word embeddings using a BLSTM. The C2W model composes a vector representation of a word from its characters. The BLSTM then uses the word embeddings from the C2W model to tag the words in the corpus. When tested on 3723 words, this method achieved a highest accuracy of 86.45%. © International Science Press.
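The pipeline described in the abstract can be illustrated with a minimal forward-pass sketch in plain Python. It is not the authors' implementation: the weights are random and untrained, the layer sizes are arbitrary, the tag set `TAGS` is hypothetical, and the helper names (`make_lstm`, `lstm_run`, `bilstm`, `c2w`, `tag_sentence`) are invented for illustration. It only shows how a character-level bidirectional LSTM can compose a word vector and how a word-level bidirectional LSTM can consume those vectors to emit one tag per word.

```python
import math
import random

def make_lstm(input_dim, hidden_dim, rng):
    """Random (untrained) LSTM weights: one matrix per gate over [x; h], plus a bias."""
    def mat():
        return [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                for _ in range(hidden_dim)]
    return {"W": {g: mat() for g in "ifoc"},
            "b": {g: [0.0] * hidden_dim for g in "ifoc"},
            "hidden_dim": hidden_dim}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_run(params, inputs):
    """Run an LSTM over a list of vectors; return the hidden state at each step."""
    n = params["hidden_dim"]
    h, c = [0.0] * n, [0.0] * n
    states = []
    for x in inputs:
        xh = x + h  # concatenate input and previous hidden state
        pre = {g: [sum(w * v for w, v in zip(row, xh)) + b
                   for row, b in zip(params["W"][g], params["b"][g])]
               for g in "ifoc"}
        i = [sigmoid(z) for z in pre["i"]]        # input gate
        f = [sigmoid(z) for z in pre["f"]]        # forget gate
        o = [sigmoid(z) for z in pre["o"]]        # output gate
        cand = [math.tanh(z) for z in pre["c"]]   # candidate cell values
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        states.append(h)
    return states

def bilstm(fwd, bwd, inputs):
    """Forward and backward passes, both aligned to the input positions."""
    f_states = lstm_run(fwd, inputs)
    b_states = lstm_run(bwd, inputs[::-1])[::-1]
    return f_states, b_states

TAGS = ["NOUN", "VERB", "PRON", "OTHER"]      # hypothetical tag set, not the paper's
CHAR_DIM, CHAR_HIDDEN, WORD_HIDDEN = 8, 6, 5  # arbitrary sizes for the sketch

rng = random.Random(0)
char_table = {}  # character lookup table, filled lazily with random vectors
c2w_fwd = make_lstm(CHAR_DIM, CHAR_HIDDEN, rng)
c2w_bwd = make_lstm(CHAR_DIM, CHAR_HIDDEN, rng)
tag_fwd = make_lstm(2 * CHAR_HIDDEN, WORD_HIDDEN, rng)
tag_bwd = make_lstm(2 * CHAR_HIDDEN, WORD_HIDDEN, rng)
out_W = [[rng.uniform(-0.1, 0.1) for _ in range(2 * WORD_HIDDEN)] for _ in TAGS]

def char_vector(ch):
    if ch not in char_table:
        char_table[ch] = [rng.uniform(-0.1, 0.1) for _ in range(CHAR_DIM)]
    return char_table[ch]

def c2w(word):
    """Compose a word vector from the word's characters (Unicode code points)."""
    chars = [char_vector(ch) for ch in word]
    f_states, b_states = bilstm(c2w_fwd, c2w_bwd, chars)
    # Final state of each direction: last forward step, first backward position.
    return f_states[-1] + b_states[0]

def tag_sentence(words):
    """Word-level BLSTM over C2W vectors, then a linear layer and argmax per word."""
    vecs = [c2w(w) for w in words]
    f_states, b_states = bilstm(tag_fwd, tag_bwd, vecs)
    tags = []
    for f, b in zip(f_states, b_states):
        feat = f + b
        scores = [sum(w * v for w, v in zip(row, feat)) for row in out_W]
        tags.append(TAGS[scores.index(max(scores))])
    return tags

sentence = ["நான்", "பள்ளிக்கு", "சென்றேன்"]  # illustrative three-word input
predicted = tag_sentence(sentence)
print(list(zip(sentence, predicted)))
```

Because the weights are untrained, the printed tags are meaningless; the point is only the data flow. Since words are built from characters, an unseen word still receives a vector, which is the usual motivation for replacing a word lookup table with a character-based composition.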
K. S. Gokul Krishnan, Pooja, A., Dr. M. Anand Kumar, and Dr. Soman K. P., “Character based bidirectional LSTM for disambiguating Tamil part of speech categories”, International Journal of Control Theory and Applications, vol. 10, pp. 229-235, 2017.