
Enhanced speech emotion detection using deep neural networks

Publication Type : Journal Article

Publisher : International Journal of Speech Technology

Source : International Journal of Speech Technology, Springer New York LLC (2018)

Url :

Keywords : Arousal, BFCC, Cepstrum, Deep neural networks, Emotion detection, Feature extraction, Perceptual feature, Recognition accuracy, Speech recognition, Valence

Campus : Bengaluru

School : Department of Computer Science and Engineering, School of Engineering

Department : Electronics and Communication

Year : 2018

Abstract :

This paper investigates how effectively perceptually based speech features perform in emotion detection. The perceptual features considered are Mel frequency cepstral coefficients (MFCCs), perceptual linear predictive cepstrum (PLPC), Mel frequency perceptual linear prediction cepstrum (MFPLPC), Bark frequency cepstral coefficients (BFCC), revised perceptual linear prediction coefficients (RPLP) and inverted Mel frequency cepstral coefficients (IMFCC). The algorithm using these auditory cues is evaluated with deep neural networks (DNN). The novelty of the work lies in analysing the perceptual features to identify the predominant ones that carry significant emotional information about the speaker. The algorithm is validated on the publicly available Berlin database with seven emotions, both in a one-dimensional categorical space and in a two-dimensional continuous space defined by the valence and arousal dimensions. Comparative analysis reveals that a DNN with the identified combination of perceptual features yields a considerable improvement in emotion recognition performance. © 2018, Springer Science+Business Media, LLC, part of Springer Nature.
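
As a rough illustration of what separates the Mel-based and Bark-based features named in the abstract, the sketch below computes filter-bank band edges on each perceptual frequency scale. It is not the authors' implementation. The Mel formula is the common 2595·log10(1 + f/700) form, the Bark formula is Traunmüller's approximation (an assumption, since the abstract does not say which Bark mapping was used), and the band count and frequency range are hypothetical values chosen only for the example.

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Common Mel-scale mapping used in MFCC front ends."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def hz_to_bark(f_hz: float) -> float:
    """Traunmueller's Bark approximation (assumed here, not taken from the paper)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def bark_to_hz(bark: float) -> float:
    """Algebraic inverse of hz_to_bark."""
    return 1960.0 * (bark + 0.53) / (26.28 - bark)

def band_edges(warp, unwarp, f_min: float, f_max: float, n_bands: int) -> list:
    """Return n_bands + 2 edge frequencies in Hz, equally spaced on the warped scale."""
    lo, hi = warp(f_min), warp(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [unwarp(lo + i * step) for i in range(n_bands + 2)]

# Hypothetical settings: 20 bands between 100 Hz and 8000 Hz.
mel_edges = band_edges(hz_to_mel, mel_to_hz, 100.0, 8000.0, 20)
bark_edges = band_edges(hz_to_bark, bark_to_hz, 100.0, 8000.0, 20)

if __name__ == "__main__":
    for m, b in zip(mel_edges, bark_edges):
        print(f"mel edge: {m:8.1f} Hz   bark edge: {b:8.1f} Hz")
```

Both scales place bands more densely at low frequencies than at high ones, but they distribute them differently, which is one reason features built on each scale can capture different information.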

Cite this Research Publication : S. Lalitha, Tripathi, S., and Gupta, D., “Enhanced speech emotion detection using deep neural networks”, International Journal of Speech Technology, 2018.
