
YouTube-ASL: A Large-Scale, Open-Domain American Sign Language-English Parallel Corpus (2306.15162v2)

Published 27 Jun 2023 in cs.CL and cs.CV

Abstract: Machine learning for sign languages is bottlenecked by data. In this paper, we present YouTube-ASL, a large-scale, open-domain corpus of American Sign Language (ASL) videos and accompanying English captions drawn from YouTube. With ~1000 hours of videos and >2500 unique signers, YouTube-ASL is ~3x as large and has ~10x as many unique signers as the largest prior ASL dataset. We train baseline models for ASL to English translation on YouTube-ASL and evaluate them on How2Sign, where we achieve a new finetuned state of the art of 12.39 BLEU and, for the first time, report zero-shot results.

Authors (3)
  1. David Uthus (11 papers)
  2. Garrett Tanzer (11 papers)
  3. Manfred Georg (3 papers)
Citations (29)
