Publication Type : Journal Article
Publisher : The Intelligent Networks and Systems Society
Source : International Journal of Intelligent Engineering and Systems
Url : https://doi.org/10.22266/ijies2025.1130.60
Campus : Coimbatore
School : School of Physical Sciences
Department : Mathematics
Year : 2025
Abstract : The increasing reliance on speech-based systems has underscored the challenge of accurately classifying a speaker’s gender and age from high-dimensional speech signals, where irrelevant or redundant features degrade classification accuracy. This study addresses the problem with a Speaker Gender and Age Classification (SGAC) system that improves accuracy through feature fusion, dimensionality reduction, and hierarchical classification. The proposed two-level SGAC framework first preprocesses the speech signals with Slepian windowing and then extracts and fuses two feature types: type-1 features, given by Perceptual Linear Prediction (IPPLP) coefficients, and type-2 features, given by Enhanced Bottleneck Features (EBNF) derived with a Convolutional Neural Network-Deep Neural Network (CNN-DNN). A combination of Weighted Pairwise Principal Component Analysis and Linear Discriminant Analysis (WPPCA-LDA) then removes irrelevant features, reduces dimensionality, and maximizes class separability. At the first level, an ensemble Gaussian Mixture Model-Support Vector Machine (GMM-SVM) and a modified Recurrent Neural Network (mRNN) identify the speaker’s gender from the selected features. The mRNN extends a classical RNN following the Neural Turing Machine (NTM) design, which couples a neural network with an external memory resource to simplify classification tasks. The second level classifies the speaker’s age within the identified gender, and score-level fusion of the classifier outputs produces the final decision.
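The PCA-then-LDA idea can be illustrated with a minimal NumPy sketch. It uses plain, unweighted PCA followed by classical Fisher LDA; it does not implement the paper's weighted pairwise PCA (WPPCA), whose weighting scheme the abstract does not describe. The function names (`pca_fit`, `lda_fit`, `pca_lda_transform`), the ridge term, the component count, and the synthetic two-class data are all hypothetical.

```python
import numpy as np

# Sketch only: standard PCA + Fisher LDA, NOT the paper's WPPCA-LDA.

def pca_fit(X, n_components):
    """Plain (unweighted) PCA: return the data mean and the top principal axes."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T  # shape (n_features, n_components)

def lda_fit(Z, y, ridge=1e-6):
    """Classical Fisher LDA: at most (n_classes - 1) discriminant directions."""
    classes = np.unique(y)
    overall_mean = Z.mean(axis=0)
    d = Z.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Zc = Z[y == c]
        mc = Zc.mean(axis=0)
        Sw += (Zc - mc).T @ (Zc - mc)
        diff = (mc - overall_mean)[:, None]
        Sb += len(Zc) * (diff @ diff.T)
    # Solve Sw^{-1} Sb w = lambda w; the hypothetical ridge keeps Sw invertible.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + ridge * np.eye(d), Sb))
    order = np.argsort(evals.real)[::-1]
    return evecs.real[:, order[: len(classes) - 1]]

def pca_lda_transform(X, mean, P, W):
    """Project raw features onto the PCA subspace, then onto the LDA directions."""
    return ((X - mean) @ P) @ W

# Hypothetical toy data: two classes of 20-dimensional "feature vectors"
# that differ only along the first dimension.
rng = np.random.default_rng(0)
X0 = rng.normal(size=(100, 20))
X1 = rng.normal(size=(100, 20))
X1[:, 0] += 4.0
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

mean, P = pca_fit(X, n_components=5)
W = lda_fit((X - mean) @ P, y)
proj = pca_lda_transform(X, mean, P, W)

# Separation of the projected class means in units of pooled within-class std.
gap = abs(proj[y == 0].mean() - proj[y == 1].mean())
pooled_std = np.sqrt((proj[y == 0].var() + proj[y == 1].var()) / 2)
print(proj.shape, round(gap / pooled_std, 1))
```

Because LDA yields at most one direction fewer than the number of classes, a two-class gender decision reduces to a single discriminant score; the PCA step first shrinks the feature space so that the within-class scatter matrix stays well conditioned.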
The framework is trained and validated on male and female speech from the TIMIT and Switchboard datasets, children’s speech from the CMU Kids Corpus, and bi-gender speech from a real-time dataset. The experimental results show that the two-level SGAC system using WPPCA-LDA-GMM-SVM-mRNN achieves 94.16% accuracy, 93.54% precision, 94.14% recall, and a 93.84% F1-score in classifying speaker gender and age, outperforming baseline models such as GMM, SVM, and GMM-SVM.
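The reported F1-score can be checked against the reported precision and recall, since F1 is their harmonic mean, 2PR/(P + R). The short Python sketch below (the function name `f1_score` is hypothetical, not taken from the paper) reproduces 93.84% from 93.54% precision and 94.14% recall. For multi-class tasks, a macro-averaged F1 computed per class need not equal the harmonic mean of macro precision and macro recall, so this is only a consistency check, not a statement of how the authors computed the metric.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (any consistent units, e.g. percent)."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract, in percent.
reported_precision = 93.54
reported_recall = 94.14
print(round(f1_score(reported_precision, reported_recall), 2))  # 93.84
```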
Cite this Research Publication : , Hierarchical Age and Gender Classification from Speech Using Deep Feature Fusion and Enhanced Dimensionality Projection, International Journal of Intelligent Engineering and Systems, The Intelligent Networks and Systems Society, 2025, https://doi.org/10.22266/ijies2025.1130.60