
Multi-speaker Speech Processing in Noisy Environments: A Hybrid Model for Source Separation and Summarization

Publication Type : Conference Paper

Publisher : Springer Nature Singapore

Source : Lecture Notes in Electrical Engineering

Url : https://doi.org/10.1007/978-981-96-9967-4_12

Campus : Bengaluru

School : School of Engineering

Department : Electronics and Communication

Year : 2025

Abstract : This work presents a pipeline that first separates the speakers in an audio recording and then summarizes the conversation. The proposed model combines SepFormer, ConvTasNet, and adaptive noise reduction techniques to isolate speech from two-speaker mixed audio, reduce background noise, and amplify the primary speaker’s voice. This hybrid approach gives better results than either of the two models used on its own, without a significant increase in computational cost. Once trained, the system delivers rapid, accurate audio separation and transcription. Performance is evaluated using standard metrics: Signal-to-Distortion Ratio (SDR), Signal-to-Interference Ratio (SIR), Signal-to-Artefacts Ratio (SAR), and Scale-Invariant SNR (SI-SNR). The model yields an average SDR, SIR, SAR, and SI-SNR of 24.6, 24.5, 24.5, and 21.9935 respectively, which demonstrates its ability to improve speech clarity while remaining efficient.
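
As an illustration of one metric named in the abstract, the block below is a minimal sketch of the commonly used SI-SNR definition. It is not the authors' evaluation code: the function name `si_snr`, the `eps` stabilizer, and the use of NumPy are assumptions made for this example. The estimate is projected onto the reference to obtain the target component, and the residual is treated as noise.

```python
import numpy as np


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D signals of equal length.

    Hypothetical helper for illustration; eps guards against division by zero.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)

    # Remove the mean so a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()

    # Project the estimate onto the reference to get the target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference

    # Whatever is left after removing the target component counts as noise.
    noise = estimate - target

    ratio = (np.dot(target, target) + eps) / (np.dot(noise, noise) + eps)
    return 10.0 * np.log10(ratio)
```

Because the target is obtained by projection, rescaling the estimate leaves the score essentially unchanged, which is what distinguishes SI-SNR from a plain SNR.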

Cite this Research Publication : Satvik Raghav, B. M. Vikhyath, Raja Karthikeya, S. Lalitha, Multi-speaker Speech Processing in Noisy Environments: A Hybrid Model for Source Separation and Summarization, Lecture Notes in Electrical Engineering, Springer Nature Singapore, 2025, https://doi.org/10.1007/978-981-96-9967-4_12
