Publication Type : Conference Paper
Publisher : Springer Nature Singapore
Source : Lecture Notes in Networks and Systems
Url : https://doi.org/10.1007/978-981-96-1267-3_14
Campus : Coimbatore
School : School of Computing
Department : Computer Science and Engineering
Year : 2025
Abstract :
Continuous Sign Language Recognition is a challenging Computer Vision task owing to the multi-articulated nature of sign language gestures. This study proposes an encoder-decoder architecture with a CNN encoder and an RNN decoder. The CNN (ResNet50) encoder captures the relevant spatial features from the frames of a gesture video, and the RNN (LSTM) decoder captures the temporal features for efficient Sign Language Recognition. The proposed model was compared against a standalone CNN approach in which the input frames are classified into signs using only spatial features. As expected, the hybrid encoder-decoder model outperformed the CNN model, substantiating the importance of including a component that captures temporal features. Both models were tested on the Argentinian Sign Language dataset (LSA64), comparing overall and class-wise accuracy across the 64 signs in the dataset. Additionally, to test its generalizability, the pipeline was trained on another color-glove-based Argentinian Sign Language dataset (LSA16) containing 16 sign classes, and it generalized effectively to that dataset as well.
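As a conceptual illustration only (not code from the paper), the encoder-decoder idea in the abstract can be sketched as a per-frame feature extractor followed by a recurrent layer that accumulates temporal context and a final classifier. In this toy sketch the ResNet50 encoder is replaced by a simple pooling function, the LSTM by a single Elman-style recurrence, and all weights are random placeholders:

```python
# Toy sketch of a CNN-encoder / RNN-decoder sign recognition pipeline.
# encode_frame() stands in for ResNet50; rnn_step() stands in for an LSTM
# cell (a real LSTM adds input/forget/output gating). Weights are random
# placeholders, so the predicted class is meaningless beyond illustration.
import math
import random

random.seed(0)

FEAT_DIM = 4      # stand-in for the 2048-d ResNet50 feature vector
HIDDEN_DIM = 3    # stand-in for the LSTM hidden size
NUM_CLASSES = 64  # LSA64 contains 64 sign classes

def encode_frame(frame):
    """Stand-in encoder: pool a flattened frame into a feature vector."""
    chunk = len(frame) // FEAT_DIM
    return [sum(frame[i*chunk:(i+1)*chunk]) / chunk for i in range(FEAT_DIM)]

def rnn_step(h, x, w_h, w_x):
    """One Elman-style recurrent step over the frame features."""
    return [math.tanh(sum(w_h[i][j] * h[j] for j in range(HIDDEN_DIM)) +
                      sum(w_x[i][k] * x[k] for k in range(FEAT_DIM)))
            for i in range(HIDDEN_DIM)]

def classify(video, w_h, w_x, w_out):
    """Encode each frame, roll the hidden state through time, score signs."""
    h = [0.0] * HIDDEN_DIM
    for frame in video:
        h = rnn_step(h, encode_frame(frame), w_h, w_x)
    scores = [sum(w_out[c][j] * h[j] for j in range(HIDDEN_DIM))
              for c in range(NUM_CLASSES)]
    return max(range(NUM_CLASSES), key=scores.__getitem__)

# Random toy weights and a toy "video" of 5 frames of 16 pixels each.
w_h = [[random.uniform(-1, 1) for _ in range(HIDDEN_DIM)] for _ in range(HIDDEN_DIM)]
w_x = [[random.uniform(-1, 1) for _ in range(FEAT_DIM)] for _ in range(HIDDEN_DIM)]
w_out = [[random.uniform(-1, 1) for _ in range(HIDDEN_DIM)] for _ in range(NUM_CLASSES)]
video = [[random.random() for _ in range(16)] for _ in range(5)]
pred = classify(video, w_h, w_x, w_out)
```

The key structural point the sketch mirrors is that classification happens only after the recurrence has seen every frame, so the decision can depend on motion across time, unlike the standalone per-frame CNN baseline.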
Cite this Research Publication : M. Sneha Varsha, V. Shankara Narayanan, S. Padmavathi, Continuous Sign Language Recognition Using Encoder-Decoder Architecture, Lecture Notes in Networks and Systems, Springer Nature Singapore, 2025, https://doi.org/10.1007/978-981-96-1267-3_14