Publication Type:

Conference Paper

Source:

2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI) (2018)

URL:

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8554758&tag=1

Keywords:

active newton set algorithm, ASNA, Buildings, classification method, clean speech, Dictionaries, dictionary atoms, Dictionary learning, dictionary learning algorithms, Distortion, Encoding, Fourier transforms, K-Medoid, learning (artificial intelligence), Machine learning, Matrix decomposition, Newton method, noise dictionaries, noisy speech, nonzero weights, optimal considering tradeoff, short-term Fourier transform, signal classification, signal recovery, signal-distortion-ratio, signal-to-noise ratio, sNMF, SNR, sparse coding algorithms, sparse NMF, Speech recognition, speech recognition task, speech templates, spoken word recognition task, STFT features, sup NMF, supervised nonnegative matrix factorization, Task analysis, template based recognition, Training, word class, word templates

Abstract:

In this paper, different dictionary learning and sparse coding algorithms are studied for a spoken word recognition task, namely k-medoid, sparse non-negative matrix factorization (sNMF), the active-set Newton algorithm (ASNA) and supervised non-negative matrix factorization (sup. NMF). The number of dictionary atoms per word class, determined as 100 using our proposed approach, is empirically verified to be optimal considering the tradeoff between accuracy and computation time. The recognition task involves classification (matching) of sparsely coded word templates generated using the learned dictionaries. A classification method based on the sum of non-zero weights of the coded representation is proposed for word recognition. The method does not require building class-specific models, making it apt for template-based recognition. Moreover, it does not require signal recovery, unlike most existing template-based methods, which perform matching on templates derived from the recovered signal. The representation generated using magnitude short-time Fourier transform (STFT) features, sNMF for dictionary learning and sup. NMF for sparse coding, in conjunction with our proposed classification method, is found to exhibit an accuracy of 88.82% on clean speech, the highest among those studied. This outperforms the corresponding methods based on signal recovery measures such as signal-distortion ratio. For noisy speech at different signal-to-noise ratio (SNR) levels, the combined dictionary comprising separately trained speech and noise dictionaries provided the best recognition accuracy of 35.5% at 0 dB SNR.
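
The classification rule described in the abstract (summing the weights of the coded representation per word class and choosing the largest) can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulation, so everything below is an assumption made for illustration: the function names `sparse_code` and `classify`, the `sparsity` penalty value, and the use of plain multiplicative KL-divergence updates with a fixed dictionary in place of the sup. NMF or ASNA coders studied in the paper.

```python
import numpy as np

def sparse_code(V, W, n_iter=200, sparsity=0.1, eps=1e-12):
    """Estimate non-negative activations H so that V ~= W @ H, with W fixed.

    Assumption: multiplicative updates for the KL divergence with an L1
    penalty on H. This is a stand-in coder, not the paper's method.
    V: (features, frames) magnitude spectrogram of one word template.
    W: (features, atoms) dictionary, atoms of all classes side by side.
    """
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    col_sums = W.sum(axis=0)[:, None]  # W^T applied to an all-ones matrix
    for _ in range(n_iter):
        ratio = V / (W @ H + eps)
        H *= (W.T @ ratio) / (col_sums + sparsity)
    return H

def classify(V, W, atom_labels, n_classes):
    """Pick the class whose atoms carry the largest total weight.

    atom_labels[k] is the (hypothetical) class index of dictionary atom k.
    Zero weights contribute nothing, so this equals the sum of the
    non-zero weights per class.
    """
    H = sparse_code(V, W)
    atom_weights = H.sum(axis=1)  # total activation of each atom over frames
    scores = np.bincount(atom_labels, weights=atom_weights, minlength=n_classes)
    return int(np.argmax(scores)), scores
```

Note that the decision is taken directly on the activations `H`; the signal `W @ H` is never reconstructed for matching, which mirrors the abstract's point that no signal recovery is required.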

Cite this Research Publication

K. S. Kiran, Mandal, A., Kumar, K. R. Prasann, Mitra, P., and S. Veni, “A Comparative Study of Dictionary Learning Algorithms on Speech Recognition Task”, in 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2018.