Publication Type : Conference Proceedings
Publisher : Elsevier BV
Source : Procedia Computer Science
Url : https://doi.org/10.1016/j.procs.2025.04.353
Keywords : Inception-v3, Mobilenet, PCA, Random Forest, Resnet, SVM, VGG-16
Campus : Bengaluru
School : School of Engineering
Department : Electronics and Communication
Year : 2025
Abstract : This paper presents a lip reading strategy for visual speech recognition applications by uniquely combining the feature learning capabilities of different deep model architectures. Deep models such as Resnet, Inception-V3, VGG-16 and Mobilenet are used as lip feature extractors. The extracted features are combined and used for word prediction on the MIRACL-VC1 dataset with various classifiers. Eleven combinations of deep feature extractors and learning models are tested on the dataset. The best-performing combination uses a Resnet feature extractor, followed by dimensionality reduction with PCA and a random forest classifier. Evaluation of this best model yielded 75% accuracy, 74% precision, 75% recall, and 74% F-score. The model’s performance is also found to be superior to that of state-of-the-art (SOTA) models, demonstrating its potential for real-world applications in noisy environments, security, and human-computer interaction.
Cite this Research Publication : Susmitha Vekkot, Taduvai Satvik Gupta, Konduru Praveen Karthik, Doradla Kaushik, Enhanced Lip Reading Using Deep Model Feature Fusion: A Study on the MIRACL-VC1 Dataset, Procedia Computer Science, Elsevier BV, 2025, https://doi.org/10.1016/j.procs.2025.04.353
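The abstract reports accuracy, precision, recall, and F-score for a multi-class word prediction task. As a rough illustration of how such figures can be computed, the sketch below uses macro averaging over classes. The averaging scheme is an assumption, since the abstract does not say which one the authors used. The function name `macro_metrics` and the toy labels are hypothetical and are not taken from the paper or its data.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1).

    Macro averaging is an assumption here; the abstract does not
    state how the per-class scores were aggregated.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    # Count true positives, false positives and false negatives per class.
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical toy labels, for illustration only (not results from the paper).
y_true = ["begin", "begin", "choose", "choose", "stop", "stop"]
y_pred = ["begin", "choose", "choose", "choose", "stop", "begin"]

acc, prec, rec, f1 = macro_metrics(y_true, y_pred)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```

A weighted average would instead weight each class by its number of true samples. Under that scheme recall equals accuracy, which is consistent with, but not confirmed by, the matching 75% accuracy and recall reported above.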