Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
119 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

LLMs are Good Sign Language Translators (2404.00925v1)

Published 1 Apr 2024 in cs.CV and cs.CL

Abstract: Sign Language Translation (SLT) is a challenging task that aims to translate sign videos into spoken language. Inspired by the strong translation capabilities of LLMs that are trained on extensive multilingual text corpora, we aim to harness off-the-shelf LLMs to handle SLT. In this paper, we regularize the sign videos to embody linguistic characteristics of spoken language, and propose a novel SignLLM framework to transform sign videos into a language-like representation for improved readability by off-the-shelf LLMs. SignLLM comprises two key modules: (1) The Vector-Quantized Visual Sign module converts sign videos into a sequence of discrete character-level sign tokens, and (2) the Codebook Reconstruction and Alignment module converts these character-level tokens into word-level sign representations using an optimal transport formulation. A sign-text alignment loss further bridges the gap between sign and text tokens, enhancing semantic compatibility. We achieve state-of-the-art gloss-free results on two widely-used SLT benchmarks.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (5)
  1. Jia Gong (8 papers)
  2. Lin Geng Foo (19 papers)
  3. Yixuan He (20 papers)
  4. Hossein Rahmani (44 papers)
  5. Jun Liu (606 papers)
Citations (10)

Summary

We haven't generated a summary for this paper yet.

Youtube Logo Streamline Icon: https://streamlinehq.com