Autoregressive Sign Language Production: A Gloss-Free Approach with Discrete Representations (2309.12179v2)
Abstract: Gloss-free Sign Language Production (SLP) offers a direct translation of spoken language sentences into sign language, bypassing the need for gloss intermediaries. This paper presents the Sign language Vector Quantization Network, a novel approach to SLP that leverages Vector Quantization to derive discrete representations from sign pose sequences. Our method, rooted in both manual and non-manual elements of signing, supports advanced decoding methods and integrates latent-level alignment for enhanced linguistic coherence. Through comprehensive evaluations, we demonstrate that our method outperforms prior SLP methods and highlight the reliability of Back-Translation and Fréchet Gesture Distance as evaluation metrics.
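The abstract does not spell out the quantization step itself. As a rough illustration, the standard nearest-neighbour codebook lookup used in VQ-VAE (van den Oord et al., 2017) can be sketched in plain Python. This sketch assumes an encoder has already mapped the sign pose sequence to fixed-size latent vectors. The codebook values, vector dimensions, and the function names `quantize` and `dequantize` are hypothetical and are not taken from the paper's implementation. Training details such as the codebook and commitment losses and straight-through gradients are omitted.

```python
from typing import List, Sequence, Tuple

# Hypothetical type: one continuous latent vector per encoded pose segment.
Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    latents: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[List[float]]]:
    """Replace each latent vector with its nearest codebook entry.

    Returns the discrete token indices and the quantized vectors.
    Ties are broken toward the lower index (the behaviour of built-in min).
    """
    indices: List[int] = []
    quantized: List[List[float]] = []
    for z in latents:
        # Index of the codebook entry closest to z.
        k = min(range(len(codebook)), key=lambda j: squared_distance(z, codebook[j]))
        indices.append(k)
        quantized.append(list(codebook[k]))
    return indices, quantized


def dequantize(indices: Sequence[int], codebook: Sequence[Vector]) -> List[List[float]]:
    """Map token indices back to codebook vectors (e.g. as input to a pose decoder)."""
    return [list(codebook[k]) for k in indices]


if __name__ == "__main__":
    # Toy 2-D codebook with three entries (hypothetical values).
    codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    latents = [[0.1, -0.2], [0.9, 1.2], [4.0, 6.0]]
    tokens, q = quantize(latents, codebook)
    print(tokens)                              # [0, 1, 2]
    print(dequantize(tokens, codebook) == q)   # True
```

The integer indices are the kind of discrete tokens that a sequence model, such as an autoregressive Transformer, could be trained to predict from spoken-language text; looking them back up in the codebook recovers continuous vectors for a pose decoder.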