Publication Type:

Journal Article

Source:

Pertanika Journal of Social Science and Humanities, Volume 22, Number 4, p.1045-1061 (2014)

URL:

https://www.scopus.com/inward/record.uri?eid=2-s2.0-84921306357&partnerID=40&md5=aabc37889bc1a84e4f7fbbc8536785e4

Abstract:

This paper proposes a morphology-based Factored Statistical Machine Translation (SMT) system for translating English sentences into Tamil. Automatic translation from English into morphologically rich languages like Tamil is a challenging task. Morphologically rich languages need extensive morphological pre-processing before SMT training to make the source language structurally similar to the target language. English and Tamil have disparate morphological and syntactic structures. Because Tamil is morphologically very rich, simple lexical mapping alone is not enough to retrieve and map all the morpho-syntactic information from English sentences. The main objective of this work is to develop a machine translation system from English to Tamil using a novel pre-processing methodology. This methodology pre-processes the English sentences to conform to the structure of Tamil. The pre-processed sentences are given to the factored SMT models for training. Finally, a Tamil morphological generator produces the surface word form from the output factors of the SMT system. Experiments are conducted with nine different types of models, which are trained, tuned and tested using general-domain corpora and the developed linguistic tools. These models are different combinations of the developed pre-processing tools with baseline and factored models, and their accuracy is evaluated using the well-known evaluation metrics BLEU and METEOR. Accuracy is also compared with that of the existing online machine translation system Google Translate. Results show that the proposed method significantly outperforms the other models and the existing system. © Universiti Putra Malaysia Press

Notes:

Cited by 0

Cite this Research Publication

A. M. Kumar, Dhanalakshmi, V., Soman, K. P., and Rajendran, S., “Factored statistical machine translation system for English to Tamil language”, Pertanika Journal of Social Science and Humanities, vol. 22, no. 4, pp. 1045-1061, 2014.