Realistic Lip-Sync Generation from Text for Multimodal Applications

Publication Type : Conference Proceedings

Publisher : IEEE

Source : 2025 IEEE International Conference on Interdisciplinary Approaches in Technology and Management for Social Innovation (IATMSI)

Url : https://doi.org/10.1109/iatmsi64286.2025.10984895

Campus : Bengaluru

School : School of Engineering

Department : Electronics and Communication

Year : 2025

Abstract : This paper explores an advanced framework for text-to-lip-sync generation, leveraging the integration of the Massively Multilingual Speech Text-to-Speech (MMS TTS) and Wav2Lip models. The MMS TTS model transforms text into natural, high-quality multilingual speech while preserving speaker identity, and the Wav2Lip model ensures accurate synchronization of lip movements with the generated audio. Trained on the GRID dataset, the Wav2Lip model achieves strong audiovisual alignment, attaining a lip-sync confidence (LSE-C) score of 6.701 and a lip-sync distance (LSE-D) score of 7.362. The proposed solution has broad applicability in virtual assistants, media production, digital education, and accessibility, setting a new standard for immersive and realistic audiovisual experiences.

Cite this Research Publication : Doradla Kaushik, Konduru Praveen Karthik, Taduvai Satvik Gupta, Susmitha Vekkot, Realistic Lip-Sync Generation from Text for Multimodal Applications, 2025 IEEE International Conference on Interdisciplinary Approaches in Technology and Management for Social Innovation (IATMSI), IEEE, 2025, https://doi.org/10.1109/iatmsi64286.2025.10984895