Fingerspelling PoseNet: Enhancing Fingerspelling Translation with Pose-Based Transformer Models (2311.12128v1)

Published 20 Nov 2023 in cs.CV and cs.HC

Abstract: We address the task of American Sign Language fingerspelling translation using videos in the wild. We exploit recent advances in accurate hand pose estimation and propose a novel architecture that leverages a transformer-based encoder-decoder model, enabling seamless contextual word translation. The translation model is augmented by a novel loss term that accurately predicts the length of the fingerspelled word, benefiting both training and inference. We also propose a novel two-stage inference approach that re-ranks the hypotheses using the language-modeling capabilities of the decoder. Through extensive experiments, we demonstrate that our proposed method outperforms the state-of-the-art models on ChicagoFSWild and ChicagoFSWild+, achieving more than 10% relative improvement in performance. Our findings highlight the effectiveness of our approach and its potential to advance fingerspelling recognition in sign language translation. Code is available at https://github.com/pooyafayyaz/Fingerspelling-PoseNet.
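
The abstract mentions a predicted word length and a second inference stage that re-ranks hypotheses, but it does not say how the two are combined. As a rough illustration only, and not the authors' implementation, the following Python sketch re-scores first-stage hypotheses by subtracting a penalty for deviating from a predicted length. The function name `rerank`, the `length_weight` parameter, the absolute-difference penalty, and the example scores are all hypothetical.

```python
from typing import List, Tuple


def rerank(
    hypotheses: List[Tuple[str, float]],
    predicted_length: float,
    length_weight: float = 1.0,
) -> List[Tuple[str, float]]:
    """Re-score (text, log_prob) hypotheses with a length penalty.

    Assumption: the penalty is the absolute gap between the hypothesis
    length and the predicted word length, scaled by length_weight.
    The paper's actual scoring rule may differ.
    """
    rescored = []
    for text, log_prob in hypotheses:
        penalty = length_weight * abs(len(text) - predicted_length)
        rescored.append((text, log_prob - penalty))
    # Highest combined score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Made-up first-stage hypotheses with made-up log-probabilities.
candidates = [("helo", -1.0), ("hello", -1.2), ("helloo", -1.5)]
best_text, best_score = rerank(candidates, predicted_length=5.0)[0]
```

In this toy case the hypothesis whose length matches the predicted length moves to the top even though its first-stage score was not the highest.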

Authors (3)

Pooya Fayyazsanavi
Negar Nejatishahidin
Jana Kosecka

GitHub

GitHub - pooyafayyaz/Fingerspelling-PoseNet (4 stars)
