Factored statistical machine translation system for English to Tamil language

Publication Type : Journal Article

Thematic Areas : Center for Computational Engineering and Networking (CEN)

Publisher : Pertanika Journal of Social Science and Humanities

Source : Pertanika Journal of Social Science and Humanities, Volume 22, Number 4, p.1045-1061 (2014)

Url : https://www.scopus.com/inward/record.uri?eid=2-s2.0-84921306357&partnerID=40&md5=aabc37889bc1a84e4f7fbbc8536785e4

Campus : Coimbatore

School : School of Engineering

Center : Computational Engineering and Networking

Department : Electronics and Communication

Year : 2014

Abstract : This paper proposes a morphology-based Factored Statistical Machine Translation (SMT) system for translating English language sentences into Tamil language sentences. Automatic translation from English into morphologically rich languages like Tamil is a challenging task. Morphologically rich languages need extensive morphological pre-processing before SMT training to make the source language structurally similar to the target language. English and Tamil have disparate morphological and syntactic structures. Because of the highly rich morphological nature of the Tamil language, a simple lexical mapping alone is not sufficient for retrieving and mapping all the morpho-syntactic information from the English language sentences. The main objective of this proposed work is to develop a machine translation system from English to Tamil using a novel pre-processing methodology. This pre-processing methodology is used to pre-process the English language sentences according to the Tamil language. These pre-processed sentences are given to the factored Statistical Machine Translation models for training. Finally, the Tamil morphological generator is used for generating a new surface word-form from the output factors of SMT. Experiments are conducted with nine different types of models, which are trained, tuned and tested with the help of general domain corpora and developed linguistic tools. These models are different combinations of the developed pre-processing tools with baseline models and factored models, and their accuracies are evaluated using the well-known evaluation metrics BLEU and METEOR. In addition, accuracies are also compared with the existing online Google Translate machine translation system. Results show that the proposed method significantly outperforms the other models and the existing system. © Universiti Putra Malaysia Press

Cite this Research Publication : A. M. Kumar, V. Dhanalakshmi, K. P. Soman, and S. Rajendran, “Factored statistical machine translation system for English to Tamil language”, Pertanika Journal of Social Science and Humanities, vol. 22, no. 4, pp. 1045-1061, 2014.
