Autoregressive Sign Language Production: A Gloss-Free Approach with Discrete Representations (2309.12179v2)

Published 21 Sep 2023 in cs.CV

Abstract: Gloss-free Sign Language Production (SLP) offers a direct translation of spoken language sentences into sign language, bypassing the need for gloss intermediaries. This paper presents the Sign language Vector Quantization Network, a novel approach to SLP that leverages Vector Quantization to derive discrete representations from sign pose sequences. Our method, rooted in both manual and non-manual elements of signing, supports advanced decoding methods and integrates latent-level alignment for enhanced linguistic coherence. Through comprehensive evaluations, we demonstrate superior performance of our method over prior SLP methods and highlight the reliability of Back-Translation and Fréchet Gesture Distance as evaluation metrics.
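
As a rough illustration of the vector quantization idea mentioned in the abstract, the sketch below maps each pose frame to the index of its nearest codebook vector, turning a continuous pose sequence into a sequence of discrete tokens. This is not the paper's implementation: the function names, the two-dimensional toy frames, and the hand-written codebook are all assumptions made for illustration. A real model would presumably learn the codebook jointly with an encoder and decoder, which is not shown here.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Return, for each frame, the index of the nearest codebook vector."""
    indices = []
    for frame in frames:
        distances = [squared_distance(frame, code) for code in codebook]
        indices.append(distances.index(min(distances)))
    return indices


def reconstruct(indices: Sequence[int],
                codebook: Sequence[Sequence[float]]) -> List[List[float]]:
    """Look up the codebook vector for each discrete token."""
    return [list(codebook[i]) for i in indices]


# Hypothetical toy data: a 3-entry codebook and 4 two-dimensional "pose" frames.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
frames = [[0.1, -0.1], [0.9, 1.2], [0.2, 1.8], [0.7, 0.6]]

tokens = quantize(frames, codebook)
approx = reconstruct(tokens, codebook)
print(tokens)  # [0, 1, 2, 1]
```

Once a pose sequence is expressed as token indices like these, it can be generated one token at a time by an autoregressive model, in the same way text tokens are.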
