
Enhanced Lip Reading Using Deep Model Feature Fusion: A Study on the MIRACL-VC1 Dataset

Publication Type : Conference Proceedings

Publisher : Elsevier BV

Source : Procedia Computer Science

Url : https://doi.org/10.1016/j.procs.2025.04.353

Keywords : Inception-v3, Mobilenet, PCA, Random Forest, Resnet, SVM, VGG-16

Campus : Bengaluru

School : School of Engineering

Department : Electronics and Communication

Year : 2025

Abstract : This paper presents a lip reading strategy for visual speech recognition that combines the feature learning capabilities of different deep model architectures. The Resnet, Inception-V3, VGG-16 and Mobilenet architectures are used as lip feature extractors. The extracted features are combined and used for word prediction on the MIRACL-VC1 dataset with various classifiers. Eleven combinations of deep feature extractors and classifiers are tested on the dataset. The best-performing combination uses a Resnet feature extractor, followed by dimensionality reduction with PCA and a random forest classifier. This model achieves 75% accuracy, 74% precision, 75% recall, and a 74% F-score. The model is also reported to outperform state-of-the-art (SOTA) models, which suggests potential for real-world applications in noisy environments, security, and human-computer interaction.
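
The best-performing configuration described in the abstract (deep features, then PCA, then a random forest) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation: the features below are synthetic random vectors standing in for Resnet outputs, and the class count, feature dimension, number of PCA components, number of trees, and macro averaging of the metrics are all assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Hypothetical stand-in for deep features: in a real pipeline these vectors
# would come from a CNN such as Resnet applied to cropped lip regions.
# All sizes below are arbitrary assumptions, not values from the paper.
n_classes, per_class, feat_dim = 10, 60, 512
centers = rng.normal(size=(n_classes, feat_dim))
labels = np.repeat(np.arange(n_classes), per_class)
features = centers[labels] + rng.normal(scale=3.0, size=(labels.size, feat_dim))

# Hold out a stratified test split so every class appears in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, stratify=labels, random_state=0
)

# PCA reduces the feature dimension before the random forest is trained.
model = make_pipeline(
    PCA(n_components=32, random_state=0),
    RandomForestClassifier(n_estimators=200, random_state=0),
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# The same four metrics the abstract reports; macro averaging is an assumption.
accuracy = accuracy_score(y_test, y_pred)
precision, recall, f_score, _ = precision_recall_fscore_support(
    y_test, y_pred, average="macro", zero_division=0
)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f_score={f_score:.2f}")
```

Because the data here is synthetic, the printed scores say nothing about MIRACL-VC1. Swapping `features` and `labels` for real extracted lip features and word labels would be the first step toward a comparable experiment.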

Cite this Research Publication : Susmitha Vekkot, Taduvai Satvik Gupta, Konduru Praveen Karthik, Doradla Kaushik, Enhanced Lip Reading Using Deep Model Feature Fusion: A Study on the MIRACL-VC1 Dataset, Procedia Computer Science, Elsevier BV, 2025, https://doi.org/10.1016/j.procs.2025.04.353
