
SignDiff: Diffusion Models for American Sign Language Production (2308.16082v2)

Published 30 Aug 2023 in cs.CV

Abstract: In this paper, we propose SignDiff, a dual-condition diffusion pre-training model that can generate human sign language speakers from a skeleton pose. SignDiff has a novel Frame Reinforcement Network, called FR-Net, similar to dense human pose estimation work, which enhances the correspondence between text lexical symbols and sign language dense pose frames and reduces the occurrence of multiple fingers in the diffusion model. In addition, we propose a new method for American Sign Language Production (ASLP) that generates ASL skeletal pose videos from text input. It integrates two new improved modules and a new loss function to improve the accuracy and quality of sign language skeletal poses and to enhance the model's ability to train on large-scale data. We propose the first baseline for ASL production and report BLEU-4 scores of 17.19 and 12.85 on the How2Sign dev and test sets, respectively. We also evaluated our model on the previous mainstream dataset, PHOENIX14T, where our method achieved state-of-the-art results. In addition, our image quality exceeds all previous results by 10 percentage points in terms of SSIM.

Authors (8)
  1. Sen Fang (15 papers)
  2. Chunyu Sui (5 papers)
  3. Xuedong Zhang (12 papers)
  4. Yapeng Tian (80 papers)
  5. Yanghao Zhou (4 papers)
  6. Hongbin Zhong (2 papers)
  7. Minyu Zhao (1 paper)
  8. Chen Chen (752 papers)
Citations (7)