Publication Type:

Conference Paper


2014 International Conference on Power Signals Control and Computations (EPSCICON) (2014)



accuracy, Context, empirical algorithm, energy, ESC, Feature extraction, language processing, mel-filter Bank, neural nets, neural network, Neural networks, signal energy and spectral centroid, SM, Speaker recognition, Speaker verification, spectral centroid, spectral matching, Speech, speech processing, Speech recognition, split temporal context, TempoRAl PatternS, TRAPS, VAD, Voice activity detection, voice activity detector


In spoken language processing applications such as speaker recognition/verification, silence segments not only contribute no speaker-specific information but also dilute the information content already available in the speech segments of the audio data. Experimental studies have shown that removing silence segments from the utterance with a voice activity detector (VAD) before feature extraction improves the performance of speaker recognition systems. Empirical algorithms based on signal energy and spectral centroid (ESC) are among the most popular approaches to VAD. In this paper, we show that using spectral matching (SM) to distinguish between silence and speech segments outperforms VAD based on ESC. We use a neural network with TempoRAl PatternS (TRAPS) of critical band energies as its input for improved performance. We evaluate the performance of the VADs using a speaker recognition system developed for 20 speakers.
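To make the ESC baseline mentioned in the abstract concrete, the following is a minimal Python sketch of a frame-level VAD that thresholds short-time energy and spectral centroid. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the function names (`frame_signal`, `esc_vad`), the 50 ms non-overlapping frames, the Hann window, and the midpoint thresholding rule are all hypothetical choices made here. Published ESC methods typically derive their thresholds from feature histograms and smooth the decisions over time, which this sketch omits.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into frames of frame_len samples, hop samples apart."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def esc_vad(x, sr, frame_ms=50):
    """Return a boolean mask with one entry per frame (True = speech).

    Assumption: a frame counts as speech only when BOTH its energy and its
    spectral centroid exceed the midpoint between the minimum and maximum
    of that feature over the utterance. This rule is a simplification
    chosen for illustration.
    """
    frame_len = int(sr * frame_ms / 1000)
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, frame_len)

    # Short-time energy: mean squared amplitude of each frame.
    energy = np.mean(frames ** 2, axis=1)

    # Spectral centroid: magnitude-weighted mean frequency of each frame.
    spectrum = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    centroid = (spectrum @ freqs) / (spectrum.sum(axis=1) + 1e-12)

    energy_thr = 0.5 * (energy.min() + energy.max())
    centroid_thr = 0.5 * (centroid.min() + centroid.max())
    return (energy > energy_thr) & (centroid > centroid_thr)


# Synthetic check: 0.5 s of digital silence followed by 0.5 s of a 1 kHz tone.
sr = 8000
t = np.arange(sr // 2) / sr
signal = np.concatenate([np.zeros(sr // 2), 0.5 * np.sin(2 * np.pi * 1000 * t)])
mask = esc_vad(signal, sr)
```

In a speaker recognition front end, a mask like this would be used to keep only the frames marked as speech before feature extraction. Real recordings contain background noise rather than digital silence, so the thresholding rule would need tuning that this sketch does not attempt.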

Cite this Research Publication

K. T. Sreekumar, George, K. K., Arunraj, K., and Dr. Santhosh Kumar C., “Spectral matching based voice activity detector for improved speaker recognition”, in 2014 International Conference on Power Signals Control and Computations (EPSCICON), 2014.