Significance of epoch identification accuracy in prosody modification for effective emotion conversion

Publisher : Communications in Computer and Information Science

Year : 2019

Abstract :

Accurate estimation of pitch marks is an essential step in epoch-based time- and pitch-scale (prosody) modification of speech. In epoch-based prosody modification, the perceptual quality of the time- and pitch-scale-modified speech depends on the accuracy with which the glottal closure instants (epochs) are estimated. The objective of the present work is to improve the perceptual quality of prosody-modified speech by accurately estimating the epoch locations. The effectiveness of variational mode decomposition (VMD) in spectral smoothing and of the wavelet synchrosqueezing transform (WSST) in time-frequency sharpening of a given signal is exploited to refine the zero frequency filtering (ZFF) method, one of the simplest and most popular epoch extraction methods. The proposed refinements to the ZFF method are found to improve epoch estimation performance on emotive speech utterances, where the conventional ZFF method shows severe degradation due to rapid pitch variations. In subjective evaluation tests, prosody-modified speech generated with epochs estimated by the refined ZFF method obtains improved mean opinion scores. The improvement in perceptual quality is attributed to the higher identification accuracy of the epochs estimated by the proposed method, compared with the conventional ZFF method, on emotive speech signals. © 2019, Springer Nature Singapore Pte Ltd.
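
For orientation, the conventional ZFF baseline that the abstract refers to can be sketched roughly as follows. This is a minimal illustration of the commonly described baseline procedure (differencing, a cascade of two zero-frequency resonators, local-mean trend removal, positive-going zero crossings), not the refined VMD/WSST method proposed in the paper. The function name `zff_epochs`, the parameter `avg_pitch_period_s`, the 1.5-period trend window, and the three trend-removal passes are assumptions made for this example only.

```python
import numpy as np

def zff_epochs(speech, fs, avg_pitch_period_s=0.01, n_trend_passes=3):
    """Sketch of conventional zero frequency filtering (ZFF).

    Returns sample indices of positive-going zero crossings of the
    ZFF signal, which are taken as epoch estimates.
    avg_pitch_period_s is an assumed, user-supplied average pitch period.
    """
    s = np.asarray(speech, dtype=np.float64)

    # 1. Difference the signal to suppress DC offset and slow drift.
    x = np.diff(s, prepend=s[0])

    # 2. Two cascaded zero-frequency resonators (each a double integrator),
    #    which is equivalent to four successive cumulative sums.
    y = x
    for _ in range(4):
        y = np.cumsum(y)

    # 3. Remove the polynomial trend by repeatedly subtracting a local mean
    #    over a window of about 1.5 average pitch periods (assumed choice).
    half = int(round(0.75 * avg_pitch_period_s * fs))
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    for _ in range(n_trend_passes):
        y = y - np.convolve(y, kernel, mode="same")

    # 4. Positive-going zero crossings are the epoch candidates.
    idx = np.flatnonzero((y[:-1] < 0) & (y[1:] >= 0)) + 1

    # Discard candidates near the edges, where zero padding in the
    # moving average corrupts the trend estimate.
    guard = n_trend_passes * half
    return idx[(idx > guard) & (idx < len(y) - guard)]
```

On a synthetic excitation such as a train of negative unit impulses spaced 80 samples apart at `fs = 8000`, the returned indices should be spaced about 80 samples apart. Signal polarity matters in this sketch: with inverted polarity the crossings shift by roughly half a pitch period. The fixed trend-removal window is also the part that suits rapidly varying pitch poorly, which is the degradation on emotive speech that the abstract describes.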

