The paper addresses speech emotion conversion using Waveform Similarity Overlap-Add (WSOLA) followed by linear prediction (LP) analysis for spectral transformation. Duration is modified according to the ratio between the segment durations of the neutral and target speech. After modification with WSOLA, the duration-modified source speech is time-aligned with the target and subjected to linear prediction analysis to yield the LP coefficients (LPCs). The target emotion is re-synthesised using the prosody-manipulated residual and the LPCs from the source. The waveform similarity property of WSOLA is exploited to give output with minimal distortion. The proposed algorithm is evaluated subjectively and objectively alongside the popular TD-PSOLA algorithm. The correlation between the synthesised and real target shows an average improvement of 60% across all emotions with the proposed technique. © Springer International Publishing AG 2017.
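The two signal-processing steps named in the abstract, the duration ratio and LP analysis with residual-driven re-synthesis, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`duration_ratio`, `lp_coefficients`, `inverse_filter`, `synthesis_filter`), the choice of the autocorrelation method with the Levinson-Durbin recursion, and the direction of the ratio (target over neutral) are all assumptions made for illustration. WSOLA itself, time alignment, and prosody manipulation of the residual are not shown.

```python
import numpy as np


def duration_ratio(neutral_dur, target_dur):
    """Time-scale factor for one segment (assumed here: target / neutral)."""
    return target_dur / neutral_dur


def lp_coefficients(frame, order):
    """LP analysis by the autocorrelation method (Levinson-Durbin).

    Returns a with a[0] == 1, so the prediction-error filter is
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    """
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def inverse_filter(x, a):
    """LP residual: e[n] = sum_j a[j] * x[n - j] (zero initial state)."""
    return np.convolve(x, a)[:len(x)]


def synthesis_filter(e, a):
    """All-pole synthesis: y[n] = e[n] - sum_{j>=1} a[j] * y[n - j]."""
    p = len(a) - 1
    y = np.zeros(len(e))
    for n in range(len(e)):
        acc = e[n]
        for j in range(1, min(p, n) + 1):
            acc -= a[j] * y[n - j]
        y[n] = acc
    return y
```

In this sketch, passing a residual back through `synthesis_filter` with the same coefficients reconstructs the analysed signal; a conversion system would instead modify the residual before synthesis, which is outside the scope of the example.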
Cited by: 0; Conference: 19th International Conference on Speech and Computer, SPECOM 2017; Conference date: 12 September 2017 through 16 September 2017; Conference code: 197479
S. Vekkot and S. Tripathi, “Vocal emotion conversion using WSOLA and linear prediction”, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10458 LNAI, pp. 777-787, 2017.