
Attention based Multi Modal Learning for Audio Visual Speech Recognition

Publication Type : Conference Paper

Publisher : IEEE

Source : 2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST), Delhi, India, 2022, pp. 1-4, doi: 10.1109/AIST55798.2022.10065019 (IEEE Xplore)

Url : https://ieeexplore.ieee.org/document/10065019

Campus : Coimbatore

School : School of Computing

Year : 2022

Abstract : In recent years, multimodal fusion using deep learning has proliferated across tasks such as emotion recognition and speech recognition, drastically enhancing overall system performance. However, existing unimodal audio speech recognition systems struggle with ambient noise and varied pronunciations, and remain inaccessible to hearing-impaired people. To address these limitations of audio-only speech recognizers, this paper exploits an intermediate-level fusion framework that draws on multimodal information from audio as well as visual movements. We analyzed the performance of a transformer-based audio-visual model on noisy audio, assessing it across two benchmark datasets, LRS2 and GRID. Overall, we found that multimodal learning for speech yields a lower WER than other baseline systems.
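The abstract does not detail the fusion architecture, so the sketch below is a rough illustration only: one common form of attention-based intermediate fusion, in which projected audio features attend over visual features before a shared transformer encoder. All module names, feature dimensions, and the residual design are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of attention-based intermediate fusion for AVSR
# (hypothetical; not the paper's exact architecture).
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, audio_dim=80, visual_dim=512, d_model=256, n_heads=4):
        super().__init__()
        # Project each modality into a shared embedding space.
        self.audio_proj = nn.Linear(audio_dim, d_model)
        self.visual_proj = nn.Linear(visual_dim, d_model)
        # Cross-modal attention: audio frames query the visual stream.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Transformer encoder over the fused representation.
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, audio, visual):
        # audio:  (batch, T_a, audio_dim), e.g. log-mel filterbank frames
        # visual: (batch, T_v, visual_dim), e.g. lip-region CNN features
        a = self.audio_proj(audio)
        v = self.visual_proj(visual)
        fused, _ = self.cross_attn(query=a, key=v, value=v)
        # Residual connection preserves the audio stream when the
        # visual stream is uninformative.
        return self.encoder(a + fused)

# Usage: 100 audio frames vs. 25 video frames; attention handles the
# differing frame rates by soft-aligning the two sequences.
model = AttentionFusion()
out = model(torch.randn(2, 100, 80), torch.randn(2, 25, 512))
print(out.shape)  # torch.Size([2, 100, 256])
```

In this style of intermediate fusion, each modality is still encoded separately up to the attention layer, which is what lets the model fall back on lip features when the audio is noisy.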

Cite this Research Publication : A. Kumar, D. K. Renuka, S. L. Rose and M. C. Shunmugapriya, "Attention based Multi Modal Learning for Audio Visual Speech Recognition," 2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST), Delhi, India, 2022, pp. 1-4, doi: 10.1109/AIST55798.2022.10065019.
