
Deep Learning Approach for the Morphological Synthesis in Malayalam and Tamil at the Character Level

Publication Type : Journal Article

Publisher : ACM Trans. Asian Low-Resour. Lang. Inf. Process., Association for Computing Machinery

Source : ACM Trans. Asian Low-Resour. Lang. Inf. Process., Association for Computing Machinery, Volume 20, Number 6, New York, NY, USA (2021)

Url : https://doi.org/10.1145/3457976

Keywords : bidirectional RNN, Conditional random field, gated recurrent unit, long short-term memory networks, Morphological generation, Recurrent neural networks, stacked RNN

Campus : Coimbatore

School : School of Engineering

Center : Computational Engineering and Networking

Department : Electronics and Communication

Year : 2021

Abstract : Morphological synthesis is one of the main components of Machine Translation (MT) frameworks, especially when either or both of the source and target languages are morphologically rich. Morphological synthesis is the process of combining two words or two morphemes according to the Sandhi rules of the morphologically rich language. Malayalam and Tamil are two Indian languages that are morphologically rich as well as agglutinative. Morphological synthesis of a word in these two languages is challenging for the following reasons: (1) abundance in morphology; (2) complex Sandhi rules; (3) the possibility in Malayalam of forming words by combining words that belong to different syntactic categories (for example, noun and verb); and (4) the construction of a sentence by combining multiple words. We formulated the task of the morphological generation of nouns and verbs in Malayalam and Tamil as a character-to-character sequence tagging problem. In this article, we used deep learning architectures such as the Recurrent Neural Network (RNN), Long Short-Term Memory Network (LSTM), and Gated Recurrent Unit (GRU), along with their stacked and bidirectional versions, to implement morphological synthesis at the character level. In addition, we investigated the performance of combining these deep learning architectures with the Conditional Random Field (CRF) in the morphological synthesis of nouns and verbs in Malayalam and Tamil. We observed that adding a CRF to the Bidirectional LSTM/GRU architecture achieved more than 99% accuracy in the morphological synthesis of Malayalam and Tamil nouns and verbs.
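
Illustration : The abstract describes placing a CRF on top of a bidirectional recurrent network for character-level tagging. The following is a minimal sketch of what a linear-chain CRF contributes at prediction time: Viterbi decoding over per-character tag scores. It is not the authors' implementation. The function name `viterbi_decode`, the two-tag inventory, and all scores are hypothetical values chosen for illustration; in a real model the emission scores would come from the recurrent network and the transition scores would be learned.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][k]   : score of tag k at character position t
                        (assumed to come from a BiLSTM/BiGRU encoder).
    transitions[i][j] : score of moving from tag i to tag j.
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])

    # Best score of any path ending in each tag at the first position.
    score = list(emissions[0])
    backpointers = []

    for t in range(1, len(emissions)):
        new_score = []
        pointers = []
        for j in range(num_tags):
            # Pick the previous tag that maximizes the path score into tag j.
            best_prev = max(range(num_tags),
                            key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_prev]
                             + transitions[best_prev][j]
                             + emissions[t][j])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)

    # Trace the best path backwards from the best final tag.
    best_last = max(range(num_tags), key=lambda k: score[k])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical scores for a three-character input and two tags.
emissions = [[2.0, 1.0], [1.0, 1.5], [1.0, 1.2]]
no_transitions = [[0.0, 0.0], [0.0, 0.0]]
penalize_0_to_1 = [[0.0, -5.0], [0.0, 0.0]]

independent_path = viterbi_decode(emissions, no_transitions)
constrained_path = viterbi_decode(emissions, penalize_0_to_1)
```

With zero transition scores the decoder reduces to choosing the best tag at each character independently. Once a transition is penalized, the decoder can prefer a different sequence as a whole, which is the kind of dependency between neighbouring output tags that a CRF layer is meant to capture.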

Cite this Research Publication : B. Premjith and Dr. Soman K. P., “Deep Learning Approach for the Morphological Synthesis in Malayalam and Tamil at the Character Level”, ACM Trans. Asian Low-Resour. Lang. Inf. Process., vol. 20, no. 6, 2021.
