Incorporating Relative Position Information in Transformer-Based Sign Language Recognition and Translation

Publication Type : Journal Article

Publisher : IEEE

Source : IEEE Access

Url : https://ieeexplore.ieee.org/abstract/document/9585521

Campus : Amritapuri

School : School of Engineering

Department : Computer Science

Verified : Yes

Year : 2021

Abstract : Recent advancements in machine translation tasks, with the advent of attention mechanisms and Transformer networks, have accelerated research in Sign Language Translation (SLT), a spatio-temporal vision translation task. Fundamentally, Transformers are unaware of the sequential ordering of their input, so position information must be explicitly fed into them; the sequence-learning capability of Transformers depends heavily on this ordering information. In contrast to existing Transformer models for SLT, which use the baseline architecture with sinusoidal position embeddings, this work incorporates a new positioning scheme into Transformer networks in the context of SLT. It is the first work in SLT to explore the positioning scheme of Transformers for optimizing translation scores. The study proposes the Gated Recurrent Unit (GRU)-Relative Sign Transformer (RST) for jointly learning Continuous Sign Language Recognition (CSLR) and translation, which significantly improves video translation quality. In this approach, the GRU acts as the relative position encoder, and the RST is a Transformer model with relative position incorporated into the Multi-Head Attention (MHA). Evaluation was done on the RWTH-PHOENIX-2014T benchmark dataset. The study reports a state-of-the-art Bilingual Evaluation Understudy (BLEU-4) score of 22.4 and a Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score of 48.55 for SLT with GRU-RST. The best Word Error Rate (WER) obtained with this approach is 23.5. A detailed study of the position encoding schemes of Transformers is presented, along with an analysis of translation performance under various combinations of the positioning schemes.
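The abstract describes incorporating relative position information directly into multi-head attention rather than adding absolute sinusoidal embeddings to the input. A minimal single-head sketch of this general idea (additive relative-position scores in the attention logits, in the style of Shaw et al.'s relative position representations) is shown below. The shapes, clipping distance, and function names are illustrative assumptions, not the paper's actual GRU-RST implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relative_attention(Q, K, V, rel_emb, max_rel=4):
    """Single-head attention with additive relative-position scores.

    rel_emb: (2*max_rel + 1, d) learned embeddings, one per clipped
    relative distance in [-max_rel, max_rel]. This is a generic sketch,
    not the paper's GRU-based relative position encoder.
    """
    T, d = Q.shape
    # standard content-based scores
    scores = Q @ K.T / np.sqrt(d)
    # clipped relative distance between query position q and key position k
    idx = np.clip(np.arange(T)[None, :] - np.arange(T)[:, None],
                  -max_rel, max_rel) + max_rel        # (T, T) indices
    # add query-to-relative-position interaction to the logits
    scores += np.einsum('qd,qkd->qk', Q, rel_emb[idx]) / np.sqrt(d)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
T, d = 6, 8
out = relative_attention(rng.normal(size=(T, d)),
                         rng.normal(size=(T, d)),
                         rng.normal(size=(T, d)),
                         rng.normal(size=(2 * 4 + 1, d)))
print(out.shape)  # (6, 8)
```

Because the position signal enters as a per-pair bias inside the attention logits, the model attends based on relative offsets rather than absolute indices, which is the property the positioning-scheme study above exploits.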

Cite this Research Publication : N. Aloysius, M. Geetha, P. Nedungadi, "Incorporating Relative Position Information in Transformer-Based Sign Language Recognition and Translation," IEEE Access, 2021