Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
119 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Addressing the Blind Spots in Spoken Language Processing (2309.06572v1)

Published 6 Sep 2023 in eess.AS, cs.CL, and cs.SD

Abstract: This paper explores the critical but often overlooked role of non-verbal cues, including co-speech gestures and facial expressions, in human communication and their implications for NLP. We argue that understanding human communication requires a more holistic approach that goes beyond textual or spoken words to include non-verbal elements. Borrowing from advances in sign language processing, we propose the development of universal automatic gesture segmentation and transcription models to transcribe these non-verbal cues into textual form. Such a methodology aims to bridge the blind spots in spoken language understanding, enhancing the scope and applicability of NLP models. Through motivating examples, we demonstrate the limitations of relying solely on text-based models. We propose a computationally efficient and flexible approach for incorporating non-verbal cues, which can seamlessly integrate with existing NLP pipelines. We conclude by calling upon the research community to contribute to the development of universal transcription methods and to validate their effectiveness in capturing the complexities of real-world, multi-modal interactions.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (1)
  1. Amit Moryossef (25 papers)

Summary

We haven't generated a summary for this paper yet.