Supervised methods for domain classification of Tamil documents

Publication Type : Journal Article

Thematic Areas : Center for Computational Engineering and Networking (CEN)

Publisher : Research India Publications

Source : ARPN Journal of Engineering and Applied Sciences, Volume 10, Issue 8, Number 8, p.3702-3707 (2015)

Campus : Coimbatore

School : School of Engineering

Center : Computational Engineering and Networking

Department : Electronics and Communication

Year : 2015

Abstract : The era of digitization has created a need for domain classification in both online and offline applications, and automatic text classification is useful across diverse fields. Various methodologies, including machine learning algorithms, have been proposed for this task. This work proposes automatic classification of Tamil documents, motivated by the exponential growth of Tamil text available as unstructured data in news, encyclopedias, e-books, e-governance, social media, and more. Max-Ent (maximum entropy), CRF (conditional random field), and SVM (support vector machine) algorithms are used to achieve an average accuracy of more than 90 percent in both sentence-level and document-level classification of Tamil text documents. The Dinakaran newspaper dataset from the EMILLE/CIIL Corpus is used to evaluate the ability of machine learning algorithms in Tamil domain classification. © 2006-2015 Asian Research Publishing Network (ARPN).

Cite this Research Publication : U. Reshma, Ganesh, H. B. Barathi, M. Kumar, A., and Dr. Soman K. P., “Supervised methods for domain classification of Tamil documents”, ARPN Journal of Engineering and Applied Sciences, vol. 10, no. 8, pp. 3702-3707, 2015.