Advancing ASR for Indian-Accented English: Dataset Creation and Whisper Fine-Tuning

Publication Type : Conference Proceedings

Publisher : Elsevier BV

Source : Procedia Computer Science

Url : https://doi.org/10.1016/j.procs.2025.04.513

Keywords : Automatic speech recognition, Indian accent, Whisper, Word Error Rate

Campus : Bengaluru

School : School of Engineering

Department : Electronics and Communication

Year : 2025

Abstract : Despite advancements in Automatic Speech Recognition (ASR) technology, accurately transcribing Indian-accented English remains a significant challenge. The main obstacle is the lack of curated datasets covering the wide range of regional accents in the Indian subcontinent. Addressing this issue, this paper concentrates on building and testing a diverse dataset that captures the nuances of Indian-accented English, covering various regions and dialects across India. In Phase 1, data was collected from over 200 speakers, yielding 70 hours of speech data using custom-made healthcare transcripts in Telugu, Hindi, Kannada, Marathi, Tamil, and Malayalam. Phase 2 data collection included 100 hours of data from around 400 speakers, with transcripts derived from novels, newspapers, books, and online articles across Tamil Nadu, Karnataka, Andhra Pradesh, Maharashtra, Madhya Pradesh, Delhi, Assam, Manipur, and Rajasthan. The comprehensive dataset, spanning a total of 633 speakers and around 170 hours of data, was collected in two phases to capture the variations in pronunciation, intonation, and phonetic emphasis characteristic of the Indian accent. Further, the existing Whisper ASR model was fine-tuned to enhance its performance on Indian-accented English. Our results show that the fine-tuned Whisper-Tiny model achieved a Word Error Rate (WER) of 18.141%, Whisper-Small achieved 17.36%, and Whisper-Medium achieved 15.08%, demonstrating a significant improvement in recognizing and transcribing Indian-accented English.
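
The abstract reports results as Word Error Rate but does not say how the metric was computed. As a rough illustration only, the standard definition is WER = (substitutions + deletions + insertions) / number of reference words, which can be sketched with a word-level edit distance. The function name `word_error_rate` and the normalization used here (lowercasing and whitespace splitting) are assumptions for this sketch, not the authors' evaluation pipeline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: text is normalized by lowercasing and splitting on
    whitespace; the paper may have used a different normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical sentence pair, not taken from the dataset.
    wer = word_error_rate("the patient has a fever", "the patient had fever")
    print(f"WER = {wer:.2%}")
```

Under this definition, a reported WER of 15.08% corresponds to roughly 15 word-level errors per 100 reference words. WER can exceed 100% when the hypothesis contains many insertions.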

Cite this Research Publication : Jaswanth Kunisetty, Pranav Ramachandrula, Sruthi S, Susmitha Vekkot, Deepa Gupta, Advancing ASR for Indian-Accented English: Dataset Creation and Whisper Fine-Tuning, Procedia Computer Science, Elsevier BV, 2025, https://doi.org/10.1016/j.procs.2025.04.513
