Publication Type:

Conference Paper

Source:

2014 International Conference on Power Signals Control and Computations (EPSCICON) (2014)

ISBN:

9781479936120

URL:

http://ieeexplore.ieee.org/abstract/document/6887506/

Keywords:

Adaptation models, baseline system, Feature extraction, feature warping, forensic applications, Gaussian Mixture Model, Gaussian mixture model supervectors, Gaussian processes, handset regular phone, Maximum a Posteriori (MAP) Adaptation, Mel frequency cepstral coefficient, Mel frequency cepstral coefficients, MFCC, NAP, National security, nuisance attribute projection, RASTA filtering, RASTA processing, regular phone, regular phone headphone, relative spectra, speaker phone, Speaker recognition, Speaker Recognition System, Speech, Support Vector Machine, Support vector machines, text analysis, text-language independent speaker recognition systems, Training data, UBM-SVM, universal background model and support vector machines, VAD

Abstract:

Speaker recognition has been an active area of research for the last few decades because of its applications in national security and forensics. In this work, we present the details of a speaker recognition system developed using a universal background model and support vector machines (UBM-SVM). We explored several techniques to improve the performance of the baseline system, which uses mel frequency cepstral coefficients (MFCC) as input features. We developed and tested the speaker recognition system for 200 speakers, using data collected over 13 different channels, such as handset regular phone, speaker phone, regular phone headphone, and regular phone. We experimented with RelAtive SpecTrA (RASTA) processing and feature warping on the input MFCC features, and with nuisance attribute projection (NAP) on the Gaussian mixture model supervectors derived in the system. These techniques improved system performance significantly by minimizing the effect of the different channels. The details of the system implementation and the results are presented in this paper. The complete system is developed in MATLAB and C/C++.
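
The abstract names feature warping and nuisance attribute projection (NAP) as channel-compensation steps. The following is a minimal Python sketch of the general form of both techniques, not the paper's MATLAB and C/C++ implementation, which is not reproduced here. The function names `warp_features` and `nap_project`, the default window of 300 frames, and the assumption that an orthonormal nuisance basis is already available (in practice it would be estimated from training data) are all illustrative choices, not details taken from the paper.

```python
from statistics import NormalDist


def warp_features(track, window=300):
    """Warp one feature dimension (a list of floats, one per frame)
    toward a standard normal distribution over a sliding window.

    Hypothetical helper: the window length is an assumed default,
    and the window is truncated at the start and end of the track.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    n = len(track)
    half = window // 2
    warped = []
    for t in range(n):
        lo = max(0, t - half)
        hi = min(n, t + half + 1)
        win = track[lo:hi]
        # Ascending rank of the current frame within its window.
        rank = sum(1 for v in win if v < track[t])
        # Map the rank to a probability strictly between 0 and 1,
        # then to the matching standard normal quantile.
        p = (rank + 0.5) / len(win)
        warped.append(std_normal.inv_cdf(p))
    return warped


def nap_project(supervector, nuisance_basis):
    """Remove the part of a supervector that lies in the nuisance
    (channel) subspace, i.e. compute s - U U^T s.

    Assumption: nuisance_basis is a list of orthonormal vectors
    with the same length as the supervector.
    """
    out = list(supervector)
    for u in nuisance_basis:
        coeff = sum(ui * si for ui, si in zip(u, supervector))
        out = [oi - coeff * ui for oi, ui in zip(out, u)]
    return out
```

In a full system, `warp_features` would be applied to each MFCC dimension separately before model training, and `nap_project` to each GMM supervector before it is passed to the SVM.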

Cite this Research Publication

K. K. George, Arunraj, K., Sreekumar, K. T., Dr. Santhosh Kumar C., and Ramachandran, K. I., “Towards improving the performance of text/language independent speaker recognition systems”, in 2014 International Conference on Power Signals Control and Computations (EPSCICON), 2014.
