A Data-Driven Representation for Sign Language Production (2404.11499v1)

Published 17 Apr 2024 in cs.CL and cs.AI

Abstract: Phonetic representations are used when recording spoken languages, but no equivalent exists for recording signed languages. As a result, linguists have proposed several annotation systems that operate at the gloss or sub-unit level; however, these resources are notably irregular and scarce. Sign Language Production (SLP) aims to automatically translate spoken language sentences into continuous sequences of sign language. However, current state-of-the-art approaches depend on these scarce linguistic resources, which has limited progress in the field. This paper introduces an innovative solution that transforms the continuous pose generation problem into a discrete sequence generation problem, thereby removing the need for costly annotation; where annotation is available, we leverage the additional information to enhance our approach. By applying Vector Quantisation (VQ) to sign language data, we first learn a codebook of short motions that can be combined to create a natural sequence of sign. Each token in the codebook can be thought of as an entry in the lexicon of our representation. Then, using a transformer, we translate spoken language text into a sequence of codebook tokens. Each token can be directly mapped to a sequence of poses, allowing the translation to be performed by a single network. Furthermore, we present a sign stitching method to effectively join tokens together. We evaluate on the RWTH-PHOENIX-Weather-2014T (PHOENIX14T) dataset and the more challenging Meine DGS Annotated (mDGS) dataset. An extensive evaluation shows that our approach outperforms previous methods, increasing the BLEU-1 back-translation score by up to 72%.
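
To make the codebook idea concrete, here is a minimal sketch of the lookup step only: each short motion chunk is replaced by the index of its nearest codebook entry, and a token sequence is mapped back to motion by indexing the codebook. This is an illustration under stated assumptions, not the paper's implementation. The function names, the two-dimensional toy vectors, and the fixed codebook are all hypothetical; in the paper the codebook is learned from sign language data, and the learning step, the transformer, and the stitching method are not shown.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantise(chunks: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each motion chunk to the index (token) of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(chunk, codebook[k]))
        for chunk in chunks
    ]


def decode(tokens: List[int], codebook: List[Vector]) -> List[Vector]:
    """Look each token up in the codebook to recover a motion sequence."""
    return [codebook[t] for t in tokens]


# Hypothetical toy codebook: three "short motions", each flattened to a 2-D vector.
# A real codebook would hold multi-frame pose sequences and would be learned.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 2.0)]

# Hypothetical continuous motion chunks to be discretised.
chunks = [(0.1, -0.1), (0.9, 1.2), (0.2, 1.8)]

tokens = quantise(chunks, codebook)          # discrete token sequence
reconstruction = decode(tokens, codebook)    # tokens mapped back to motions
print(tokens)
```

Because every token indexes a fixed motion, a model that predicts token indices from text is effectively predicting pose sequences, which is the sense in which the continuous problem becomes a discrete one.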
