
Speech Emotion Recognition Using CNN-LSTM and Vision Transformer

Publication Type : Book Chapter

Publisher : SpringerLink

Source : In International Conference on Innovations in Bio-Inspired Computing and Applications

Url :

Campus : Coimbatore

School : School of Artificial Intelligence - Coimbatore

Center : Center for Computational Engineering and Networking

Year : 2022

Abstract : The importance of speech emotion recognition has grown with the adoption of intelligent conversational assistant services. Emotion recognition and analysis can improve communication between humans and machines. We propose applying attention-based deep learning techniques to process and recognize speech emotions. In this paper we examine two major approaches, a CNN-LSTM model and a Mel Spectrogram-Vision Transformer model, and compare them against the existing benchmarks. The experimental results support the feature extraction strategy of deep learning based approaches, which eliminates the need to handpick features as required by the traditional machine learning (ML) classifiers in the current literature. A comparative evaluation of CNN-LSTM and Vision Transformer (ViT) models is established from the experimental results. The two models performed similarly, with CNN-LSTM reaching an accuracy of 88.50% and ViT an accuracy of 85.36%, surpassing the existing benchmarks and opening scope for the study of attention and image processing based learning for speech emotion recognition.

Cite this Research Publication : Kumar, CS Ayush, Advaith Das Maharana, Srinath Murali Krishnan, Sannidhi Sri Sai Hanuma, G. Jyothish Lal, and Vinayakumar Ravi. "Speech Emotion Recognition Using CNN-LSTM and Vision Transformer." In International Conference on Innovations in Bio-Inspired Computing and Applications, pp. 86-97. Cham: Springer Nature Switzerland, 2022.
