Publication Type : Book Chapter
Publisher : Springer Berlin Heidelberg
Source : Advances in Natural Language Processing, Springer Berlin Heidelberg, p.368–379 (2006)
Url : http://link.springer.com/chapter/10.1007/11816508_38
Campus : Bengaluru
School : Department of Computer Science and Engineering, School of Engineering
Department : Computer Science, Mathematics
Year : 2006
Abstract : This paper presents a wide range of statistical word alignment experiments that incorporate morphosyntactic information. By transforming the parallel corpus according to part-of-speech (POS) tags, lemmas, or stems, we explore which linguistic information helps reduce alignment error rates. To this end, alignments are evaluated against a human word alignment reference, with the aim of an improved machine translation training scheme that eventually leads to better SMT performance. Experiments are carried out on a Spanish–English European Parliament Proceedings parallel corpus, in both a large-data and a small-data track. As expected, the improvements from introducing morphosyntactic information are larger when data are scarce, but a significant improvement is also achieved in the large-data task, indicating that certain linguistic knowledge remains relevant even when large amounts of data are available.
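The abstract measures alignment quality as an alignment error rate (AER) against a human reference but does not spell out the formula. A minimal sketch, assuming the commonly used definition in which the human reference marks "sure" links S and "possible" links P (with S a subset of P), and the system produces links A: AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|). The function name `alignment_error_rate` and the (source position, target position) link representation are hypothetical illustrations, not taken from the paper.

```python
from typing import Set, Tuple

# A link pairs a source-word position with a target-word position (hypothetical representation).
Link = Tuple[int, int]


def alignment_error_rate(hyp: Set[Link], sure: Set[Link], possible: Set[Link]) -> float:
    """Compute AER = 1 - (|A & S| + |A & P|) / (|A| + |S|).

    Assumes the usual sure/possible reference convention; sure links are
    merged into the possible set so that S is a subset of P.
    """
    possible = possible | sure  # enforce S ⊆ P
    denom = len(hyp) + len(sure)
    if denom == 0:
        return 0.0  # no hypothesis links and no sure links: treat as no error
    return 1.0 - (len(hyp & sure) + len(hyp & possible)) / denom


if __name__ == "__main__":
    sure = {(0, 0), (1, 1)}
    possible = {(2, 1)}
    hyp = {(0, 0), (1, 1), (2, 2)}  # (2, 2) is not in the reference at all
    print(alignment_error_rate(hyp, sure, possible))
```

Here two hypothesis links match sure (and therefore possible) links and one is spurious, so the result is approximately 0.2 (1 - 4/5). Lower AER is better, which is why corpus transformations that reduce it are the ones of interest.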
Cite this Research Publication : de Gispert, A., Gupta, D., Popović, M., Lambert, P., Mariño, J. B., Federico, M., Ney, H., and Banchs, R., “Improving statistical word alignments with morpho-syntactic transformations”, in Advances in Natural Language Processing, Springer Berlin Heidelberg, 2006, pp. 368–379.