Using an LLM to Turn Sign Spottings into Spoken Language Sentences (2403.10434v2)

Published 15 Mar 2024 in cs.CV

Abstract: Sign Language Translation (SLT) is a challenging task that aims to generate spoken language sentences from sign language videos. In this paper, we introduce a hybrid SLT approach, Spotter+GPT, that utilizes a sign spotter and a powerful LLM to improve SLT performance. Spotter+GPT breaks down the SLT task into two stages. The videos are first processed by the Spotter, which is trained on a linguistic sign language dataset, to identify individual signs. These spotted signs are then passed to an LLM, which transforms them into coherent and contextually appropriate spoken language sentences. The source code of the Spotter is available at https://gitlab.surrey.ac.uk/cogvispublic/sign-spotter.
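
At a high level, the Spotter maps the video to a gloss sequence, and the LLM maps that gloss sequence to a spoken-language sentence. The sketch below illustrates that flow in Python; `spot_signs` is a hypothetical stand-in for the authors' spotter (the real implementation is in the linked GitLab repository), and the prompt wording and model choice are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of the two-stage Spotter+GPT pipeline, assuming the
# OpenAI Python client (openai>=1.0). `spot_signs` is a hypothetical
# placeholder for the authors' sign spotter.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def spot_signs(video_path: str) -> list[str]:
    # Placeholder for stage 1: the real spotter scans the video and emits
    # spotted glosses; a fixed sequence is returned here for illustration.
    return ["WEATHER", "TODAY", "RAIN"]

def glosses_to_sentence(glosses: list[str]) -> str:
    # Stage 2: hand the gloss sequence to the LLM for sentence generation.
    prompt = ("Rewrite these sign language glosses as one coherent "
              "English sentence: " + " ".join(glosses))
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()

print(glosses_to_sentence(spot_signs("example_video.mp4")))
```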

Overview of "Using an LLM to Turn Signs into Speech"

The paper focuses on the methodology of using an LLM, specifically ChatGPT, to convert sign language inputs into spoken language. The authors detail the prompt engineering process they undertook to optimize ChatGPT's performance in generating coherent sentences from a list of spotted glosses. This involved developing an initial strategy and refining it through empirical observation.

Prompt Engineering Methodology

The paper begins with a basic prompt that simply asks for sentences to be generated from a provided list of words. This rudimentary approach exposed a limitation of the LLM: the model occasionally produced unrelated output, particularly when the spotted glosses were incomplete or absent. To address this, the authors added explicit rules to the prompt so that when translation is infeasible, whether because no signs were detected or the gloss data is insufficient, the model responds with "No Translation" instead of inventing unrelated content.
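
The exact prompt wording is not reproduced here, so the template below is a hedged reconstruction of that refined setup; the key element is the explicit fallback rule that forces "No Translation" on empty or unusable input.

```python
# Hedged reconstruction of the refined prompt described above; the authors'
# actual wording may differ. The fallback rule steers the model away from
# fabricating content when the gloss input is unusable.
PROMPT_TEMPLATE = """You are given sign language glosses spotted in a video.
Rewrite them as one coherent, contextually appropriate spoken-language sentence.

Rules:
- Use only the information carried by the glosses; do not invent content.
- If no glosses are given, or they are too sparse to form a sentence,
  reply exactly: No Translation

Glosses: {glosses}"""

def build_prompt(glosses: list[str]) -> str:
    # An empty gloss list still yields a well-formed prompt; the rule above
    # then pushes the model toward the "No Translation" fallback.
    return PROMPT_TEMPLATE.format(glosses=" ".join(glosses) if glosses else "(none)")
```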

Implications

This research marks a meaningful step in improving LLM behavior through precise prompt engineering. By tailoring the prompt, the authors enhance the model's capacity to handle incomplete or ambiguous inputs. The implications extend to various NLP applications, particularly in improving the robustness of LLMs in human-computer interaction scenarios; for example, this approach may benefit automated translation systems or assistive technologies for Deaf and hard-of-hearing users.

Prospective Developments

The paper underlines the potential for further refinement of LLM applications through active prompt management. Future research could explore more sophisticated prompt-generation techniques, perhaps involving dynamic adaptation based on real-time feedback from the LLM. Additionally, there is room to extend this methodology to other languages and dialects, which would broaden the range of application domains.

In conclusion, while the paper provides a concentrated look at a niche application of LLMs, it prompts broader considerations for the alignment and control of these models to meet specific use-case requirements. This foundation could lead to more resilient and versatile AI systems, capable of seamless integration into diverse communicative contexts.

Authors (3)
  1. Ozge Mercanoglu Sincan
  2. Necati Cihan Camgoz
  3. Richard Bowden