BERT-Based Sequence Labelling Approach for Dependency Parsing in Tamil

Publication Type : Conference Proceedings

Source : Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages, Dublin, Ireland, pp. 1-8, 2022

Url : https://aclanthology.org/2022.dravidianlangtech-1.1/

Campus : Coimbatore

School : School of Engineering

Verified : No

Year : 2022

Abstract : Dependency parsing is a method for surface-level syntactic analysis of natural language text. The scarcity of viable tools for this task in Dravidian languages has opened a new line of research. This paper presents a novel approach that uses word-to-word dependency tagging with BERT models to improve MaltParser performance. We work with Tamil, a morphologically rich language with free word order. Individual words are tokenized using BERT models, and the dependency relations are recognized using machine learning algorithms. Oversampling algorithms such as SMOTE (Chawla et al., 2002) and ADASYN (He et al., 2008) are used to tackle data imbalance and consequently improve parsing results. The results obtained are used in MaltParser, which further highlights that feature-based approaches can be used for such tasks.

Cite this Research Publication : C S Ayush Kumar, Advaith Maharana, Srinath Murali, Premjith B, Soman Kp, "BERT-Based Sequence Labelling Approach for Dependency Parsing in Tamil", Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages, Dublin, Ireland, pp. 1-8, 2022
