Publication Type:

Journal Article

Source:

ARPN Journal of Engineering and Applied Sciences, Volume 10, Issue 8, Number 8, p.3702-3707 (2015)

URL:

https://www.scopus.com/inward/record.uri?eid=2-s2.0-84929380094&partnerID=40&md5=873de1521131ea3f358df5a96bf17101

Abstract:

The era of digitization creates a need for domain classification in both online and offline applications. Automatic text classification is required across diverse fields, and various methodologies, such as Machine Learning algorithms, have been proposed for it. Here, automatic document classification of Tamil documents is proposed in view of the exponential growth of Tamil text documents available as unstructured data in news, encyclopedias, e-books, e-governance, social media, and more. Max-Ent, CRF and SVM algorithms are used to achieve an average accuracy of more than 90 percent in both sentence-level and document-level classification of Tamil text documents. In this work, the Dinakaran newspaper dataset from the EMILLE/CIIL Corpus is used to evaluate the ability of Machine Learning algorithms in Tamil domain classification. © 2006-2015 Asian Research Publishing Network (ARPN).

Cite this Research Publication

U. Reshma, Ganesh, H. B. Barathi, M. Kumar, A., and Soman, K. P., “Supervised methods for domain classification of Tamil documents”, ARPN Journal of Engineering and Applied Sciences, vol. 10, no. 8, pp. 3702-3707, 2015.