Title-Sum: An LLM-Based Title Generation System for Domain-Specific Documents in Malayalam Language

Publication Type : Conference Paper

Publisher : Springer Nature Singapore

Source : Lecture Notes in Networks and Systems

Url : https://doi.org/10.1007/978-981-96-6537-2_20

Campus : Chennai

School : School of Computing

Department : Computer Science and Engineering

Year : 2025

Abstract :

Automatic title generation has emerged as a significant area of interest in natural language generation, since titles readily attract readers' attention. It supports tasks such as web search, academic writing, and news headline generation. Title generation in the Malayalam language is still in its early stages, and little significant work has been done in this area. In this work, we address the title generation task in Malayalam, using social science textbook content from the Social-sum-Mal dataset as training data. Generating titles for Malayalam school textbook content improves the readability of the documents. The study employs an ensemble-based method that uses the outputs of three fine-tuned large language models: IndicBART, mT5, and mBART-50. The outputs generated by the fine-tuned LLMs are passed to a scoring mechanism, where they are evaluated and scored on three criteria: keyword analysis, length scoring, and cosine similarity. Based on the overall score, the best of the candidate titles is selected as the title for the input document. The system has been rigorously evaluated using metrics such as ROUGE, BLEU, and BERTScore, together with ChatGPT-based analysis. The results demonstrate that the ensemble approach effectively generates meaningful titles.
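The candidate-selection step described in the abstract can be illustrated with a minimal Python sketch. The abstract names the three criteria but does not give their formulas, so everything below is an assumption made for illustration: the function names, the whitespace tokenizer, the bag-of-words cosine similarity, the target length of six tokens, and the equal weights are all hypothetical and are not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain whitespace tokenization; a real system would
    # likely need a Malayalam-aware tokenizer.
    return text.lower().split()


def keyword_score(title, keywords):
    # Fraction of the supplied keywords that appear in the title.
    if not keywords:
        return 0.0
    tokens = set(tokenize(title))
    return sum(1 for k in keywords if k.lower() in tokens) / len(keywords)


def length_score(title, target_len=6):
    # 1.0 at the (hypothetical) target length, falling linearly to 0.0.
    n = len(tokenize(title))
    return max(0.0, 1.0 - abs(n - target_len) / target_len)


def cosine_similarity(a, b):
    # Cosine similarity between bag-of-words count vectors.
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_title(candidates, document, keywords, weights=(1.0, 1.0, 1.0)):
    # Return the candidate with the highest weighted overall score.
    w_kw, w_len, w_cos = weights

    def overall(title):
        return (w_kw * keyword_score(title, keywords)
                + w_len * length_score(title)
                + w_cos * cosine_similarity(title, document))

    return max(candidates, key=overall)
```

In this sketch, `candidates` would hold one generated title per fine-tuned model, and `select_title(candidates, document, keywords)` returns the highest-scoring one. How the keywords are extracted and how the three scores are weighted are left open here, because the abstract does not specify them.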

Cite this Research Publication : M. Rahul Raj, Dhanya S. Pankaj, Title-Sum: An LLM-Based Title Generation System for Domain-Specific Documents in Malayalam Language, Lecture Notes in Networks and Systems, Springer Nature Singapore, 2025, https://doi.org/10.1007/978-981-96-6537-2_20
