Prosodic Transformation in Vocal Emotion Conversion for Multi-lingual Scenarios: A Pilot Study

Publication Type : Journal Article

Publisher : International Journal of Speech Technology

Source : International Journal of Speech Technology, Volume 22, Issue 3, p.533-549 (2019)

Campus : Bengaluru

School : Department of Computer Science and Engineering, School of Engineering

Department : Computer Science, Electronics and Communication

Year : 2019

Abstract : The primary objective of this work is to compare patterns of vocal expression across distinct linguistic contexts. Five languages (datasets) are used for experimentation, viz. German (EmoDB), English (SAVEE), and the Indian languages Telugu (IITKGP), Malayalam and Tamil, each varying systematically with reference to typology and linguistic proximity. The hypothesis tested is that, although the selected languages exploit prosodic parameters in distinct measure to express a set of basic emotions, viz. anger, fear and happiness, there exist certain underlying similarities in terms of prosodic perception. A methodology for estimating and incorporating the supra-segmental parameters contributing to emotional expression, viz. pitch, duration and intensity, is developed and tested against all five datasets. The main contribution of this work is the use of the same prosodic transformation scales for emotion conversion across multi-lingual test cases, for generation of vocal affect in multiple languages. Objective evaluation revealed maximum correlation for anger expression synthesised by adapting transformation scales from Tamil (0.95) and for fear from Telugu (0.89), while for happiness, scales from the English dataset yielded superior conversion results (0.94). These results are reinforced by a perception test using comparative mean opinion scores (CMOS). A maximum CMOS of 3.8 is obtained for the anger and fear emotions, while conversion to happiness yielded a score of 3.3. Experimental findings indicate that, although significant information embedded in prosodic parameters depends on language structure, common trends can be observed across certain languages in the context of emotion perception, which can provide insights into the development of emotion conversion systems in a multilingual context.
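The idea of reusing one set of prosodic transformation scales and scoring the result by correlation can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the scales are estimated or which quantities are correlated, so the sketch assumes simple multiplicative factors for pitch, duration and intensity, and a Pearson correlation between a converted and a target pitch contour. The names `ProsodyScales`, `apply_scales` and `pearson`, and all numeric values, are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class ProsodyScales:
    """Hypothetical multiplicative scales for one emotion (assumed form)."""
    pitch: float      # factor applied to F0 values in Hz
    duration: float   # factor applied to segment durations in seconds
    intensity: float  # factor applied to frame amplitudes


def apply_scales(f0, durations, intensities, scales):
    """Scale neutral prosody; F0 values of 0 mark unvoiced frames and stay 0."""
    new_f0 = [f * scales.pitch if f > 0 else 0.0 for f in f0]
    new_durations = [d * scales.duration for d in durations]
    new_intensities = [a * scales.intensity for a in intensities]
    return new_f0, new_durations, new_intensities


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up neutral contour and made-up scales, for illustration only.
    neutral_f0 = [110.0, 120.0, 0.0, 130.0, 125.0]
    target_f0 = [170.0, 190.0, 0.0, 215.0, 200.0]
    anger = ProsodyScales(pitch=1.6, duration=0.85, intensity=1.4)
    converted_f0, _, _ = apply_scales(neutral_f0, [0.12, 0.20], [0.5, 0.6], anger)
    print(pearson(converted_f0, target_f0))
```

In this simplified form, the same `ProsodyScales` object could be applied to neutral utterances from any of the five datasets, which is the cross-lingual reuse the abstract describes; how well a uniform scale approximates the actual method is not something the abstract allows one to judge.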

Cite this Research Publication : Susmitha Vekkot and Gupta, D., “Prosodic Transformation in Vocal Emotion Conversion for Multi-lingual Scenarios: A Pilot Study”, International Journal of Speech Technology, vol. 22, no. 3, pp. 533-549, 2019.

