Speech/non-speech detection (SND) distinguishes between speech and non-speech segments in recorded audio and video documents. SND systems can help reduce storage requirements when only the speech segments of an audio document are needed, for example in content analysis or spoken language identification. In this work, we experimented with time domain, frequency domain and cepstral domain features computed over short time frames of 20 ms, along with their mean and standard deviation over segments of 200 ms. We then analysed whether selecting a subset of the features can improve the performance of the SND system. To this end, we experimented with different feature selection algorithms and observed that correlation based feature selection gave the best results. Further, we experimented with different decision tree classification algorithms and found that the random forest algorithm outperformed the others. We further improved the SND system performance by smoothing the decisions over 5 segments of 200 ms each. Our baseline system, with 272 features, has a classification accuracy of 94.45 %, and the final system, with 8 features, has a classification accuracy of 97.80 %.
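As a rough illustration of two steps mentioned in the abstract, the sketch below pools per-frame features into per-segment means and standard deviations, and then smooths a sequence of segment decisions over a 5-segment window. It is only a sketch under stated assumptions, not the authors' implementation: the abstract does not say whether the 20 ms frames overlap (non-overlapping frames, so 10 per 200 ms segment, are assumed here), nor how the smoothing is done (a centred majority vote is assumed). The function names `segment_stats` and `smooth_decisions` are hypothetical.

```python
from statistics import mean, pstdev

# Assumption: non-overlapping 20 ms frames, so a 200 ms segment holds 10 frames.
FRAMES_PER_SEGMENT = 10


def segment_stats(frame_features, frames_per_segment=FRAMES_PER_SEGMENT):
    """Pool per-frame feature vectors into per-segment [means..., stds...].

    frame_features is a list of equal-length lists, one per frame.
    Trailing frames that do not fill a whole segment are dropped.
    """
    segments = []
    last_start = len(frame_features) - frames_per_segment
    for start in range(0, last_start + 1, frames_per_segment):
        block = frame_features[start:start + frames_per_segment]
        columns = list(zip(*block))  # one tuple of values per feature
        means = [mean(col) for col in columns]
        stds = [pstdev(col) for col in columns]  # population standard deviation
        segments.append(means + stds)
    return segments


def smooth_decisions(labels, window=5):
    """Majority vote over a centred window of binary segment decisions.

    labels uses 1 for speech and 0 for non-speech. The window shrinks at
    the edges of the sequence; ties are resolved as non-speech (assumption).
    """
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        neighbourhood = labels[max(0, i - half):i + half + 1]
        is_speech = 2 * sum(neighbourhood) > len(neighbourhood)
        smoothed.append(1 if is_speech else 0)
    return smoothed
```

With this sketch, an isolated non-speech decision inside a run of speech segments (or the reverse) is overruled by its neighbours, which is the kind of effect decision smoothing is meant to have. The classifier that produces the per-segment decisions is left out here.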
S. V. Thambi, Sreekumar, K. T., Dr. Santhosh Kumar C., and Raj, P. C. R., “Random forest algorithm for improving the performance of speech/non-speech detection”, in 2014 First International Conference on Computational Systems and Communications (ICCSC), 2014.