Large language models in textual analysis for gesture selection (2310.13705v1)
Abstract: Gestures perform a variety of communicative functions that powerfully influence human face-to-face interaction. How a given communicative function is realized varies greatly between individuals and depends on the role of the speaker and the context of the interaction. Approaches to automatic gesture generation differ not only in the degree to which they rely on data-driven techniques but also in the degree to which they can produce context- and speaker-specific gestures. These approaches face two major challenges: obtaining sufficient training data appropriate for the context and goal of the application, and giving designers enough control to realize their specific intent for the application. Here, we address these challenges by showing that large language models (LLMs), powerful models trained on vast amounts of data, can be adapted for gesture analysis and generation. Specifically, we use ChatGPT as a tool for suggesting context-specific gestures that realize designer intent from minimal prompts. We also find that ChatGPT can suggest novel yet appropriate gestures absent from the minimal training data. The use of LLMs is a promising avenue for gesture generation that reduces the need for laborious annotation and has the potential to adapt flexibly and quickly to different designer intents.
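To make the prompting approach concrete, the sketch below shows one way a chat LLM might be asked for context-specific gesture suggestions given a handful of designer-annotated examples. This is a minimal illustration, not the paper's actual method: the prompt wording, gesture labels, few-shot examples, model name, and the `suggest_gesture` helper are all assumptions layered on the standard OpenAI Python client.

```python
# Minimal sketch of few-shot gesture suggestion with a chat LLM.
# All prompt text, gesture labels, and examples here are hypothetical
# illustrations; they are not the paper's actual prompts or data.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = (
    "You help an animator pick co-speech gestures for a virtual character. "
    "Given the speaker role and an utterance, reply with a gesture class "
    "(e.g., deictic, beat, iconic, metaphoric) and a one-sentence "
    "description of the hand movement."
)

# A "minimal prompt": a couple of designer-annotated examples conveying intent.
FEW_SHOT = [
    {"role": "user", "content": "Role: counselor. Utterance: 'Let's weigh both options.'"},
    {"role": "assistant", "content": "Metaphoric: two open palms alternate up and down, like scales."},
]

def suggest_gesture(role: str, utterance: str) -> str:
    """Ask the model for a context-specific gesture suggestion."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    messages += FEW_SHOT
    messages.append({"role": "user", "content": f"Role: {role}. Utterance: '{utterance}'"})
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # any chat-capable model would do
        messages=messages,
        temperature=0.7,
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(suggest_gesture("counselor", "On one hand we could wait; on the other, we could act now."))
```

In the abstract's terms, the few-shot examples stand in for the "minimal training data" and the system prompt encodes designer intent; the model may then propose gestures beyond those in the examples.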
- Chaitanya Ahuja, Dong Won Lee, Ryo Ishii, and Louis-Philippe Morency. 2020. No Gestures Left Behind: Learning Relationships between Spoken Language and Freeform Gestures. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online, 1884–1895. https://doi.org/10.18653/v1/2020.findings-emnlp.170
- Chaitanya Ahuja, Dong Won Lee, and Louis-Philippe Morency. 2022. Low-Resource Adaptation for Personalized Co-Speech Gesture Generation. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, New Orleans, LA, USA, 20534–20544. https://doi.org/10.1109/CVPR52688.2022.01991
- OpenAI. 2023. GPT-4 Technical Report. arXiv preprint arXiv:2303.08774 (2023).
- James Allen et al. 2020. A broad-coverage deep semantic lexicon for verbs. arXiv preprint arXiv:2007.02670 (2020).
- Anthropic. 2023. Claude. https://www.anthropic.com/product
- Tenglong Ao, Qingzhe Gao, Yuke Lou, Baoquan Chen, and Libin Liu. 2022. Rhythmic Gesticulator: Rhythm-Aware Co-Speech Gesture Synthesis with Hierarchical Neural Embeddings. ACM Transactions on Graphics 41, 6 (Nov. 2022), 209:1–209:19. https://doi.org/10.1145/3550454.3555435
- Janet Beavin Bavelas. 1994. Gestures as part of speech: Methodological implications. Research on language and social interaction 27, 3 (1994), 201–221.
- Kirsten Bergmann and Stefan Kopp. 2009. Increasing the expressiveness of virtual agents: autonomous generation of speech and gesture for spatial description tasks. In Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems-Volume 1. International Foundation for Autonomous Agents and Multiagent Systems, 361–368.
- Marcel Binz and Eric Schulz. 2023. Using cognitive psychology to understand GPT-3. Proceedings of the National Academy of Sciences 120, 6 (2023), e2218523120.
- Tom B. Brown et al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems 33 (2020), 1877–1901.
- Sébastien Bubeck et al. 2023. Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv preprint arXiv:2303.12712 (2023).
- Geneviève Calbris. 2011. Elements of meaning in gesture. Vol. 5. John Benjamins Publishing.
- Daniel Casasanto and Kyle Jasmin. 2010. Good and Bad in the Hands of Politicians: Spontaneous Gestures during Positive and Negative Speech. PLOS ONE 5, 7 (July 2010), e11805. https://doi.org/10.1371/journal.pone.0011805
- Justine Cassell, Matthew Stone, and Hao Yan. 2000. Coordination and context-dependence in the generation of embodied conversation. In Proceedings of the First International Conference on Natural Language Generation (INLG ’00). Association for Computational Linguistics, USA, 171–178. https://doi.org/10.3115/1118253.1118277
- Justine Cassell, Hannes Högni Vilhjálmsson, and Timothy Bickmore. 2001. BEAT: The behavior expression animation toolkit. In Proceedings of ACM SIGGRAPH. 477–486.
- Justine Cassell, Hannes Högni Vilhjálmsson, and Timothy Bickmore. 2004. BEAT: The behavior expression animation toolkit. In Life-Like Characters. Springer, 163–185.
- Mingyuan Chu, Antje Meyer, Lucy Foulkes, and Sotaro Kita. 2014. Individual differences in frequency and saliency of speech-accompanying gestures: The role of cognitive abilities and empathy. Journal of Experimental Psychology: General 143, 2 (2014), 694.
- Sharice Clough and Melissa C. Duff. 2020. The Role of Gesture in Communication and Cognition: Implications for Understanding and Treating Neurogenic Communication Disorders. Frontiers in Human Neuroscience 14 (2020). https://www.frontiersin.org/articles/10.3389/fnhum.2020.00323
- David DeVault et al. 2014. SimSensei Kiosk: A Virtual Human Interviewer for Healthcare Decision Support. In Proceedings of the 2014 International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS). 1061–1068.
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 4171–4186. https://doi.org/10.18653/v1/N19-1423
- Paul Ekman and Wallace V Friesen. 1978. Facial action coding system. Environmental Psychology & Nonverbal Behavior (1978).
- Mireille Fares et al. 2023. Zero-Shot Style Transfer for Gesture Animation driven by Text and Speech using Adversarial Disentanglement of Multimodal Style Encoding. https://hal.science/hal-03972415
- Transformer Network for Semantically-Aware and Speech-Driven Upper-Face Generation. arXiv preprint arXiv:2110.04527 (2021). http://arxiv.org/abs/2110.04527
- Gretchen N. Foley and Julie P. Gentile. 2010. Nonverbal Communication in Psychotherapy. Psychiatry (Edgmont) 7, 6 (June 2010), 38–44. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2898840/
- GesGPT: Speech Gesture Synthesis With Text Parsing from GPT. arXiv preprint arXiv:2303.13013 (2023). http://arxiv.org/abs/2303.13013
- Saeed Ghorbani et al. 2023. ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech. Computer Graphics Forum 42, 1 (2023), 206–216. https://doi.org/10.1111/cgf.14734
- Susan Goldin-Meadow and Martha Wagner Alibali. 2013. Gesture’s role in speaking, learning, and creating language. Annual review of psychology 64 (2013), 257–283.
- Google. 2023. Bard. https://bard.google.com
- Joseph Grady. 1997. Foundations of meaning: Primary metaphors and primary scenes. Ph.D. Dissertation. University of California, Berkeley.
- Bahia Guellaï, Alan Langus, and Marina Nespor. 2014. Prosody in the hands of the speaker. Frontiers in Psychology 5 (2014). https://www.frontiersin.org/articles/10.3389/fpsyg.2014.00700
- Judith A. Hall, Jinni A. Harrigan, and Robert Rosenthal. 1995. Nonverbal behavior in clinician–patient interaction. Applied and Preventive Psychology 4, 1 (1995), 21–37. https://doi.org/10.1016/S0962-1849(05)80049-6
- Kira Hall, Donna M. Goldstein, and Matthew Bruce Ingram. 2016. The hands of Donald Trump: Entertainment, gesture, spectacle. HAU: Journal of Ethnographic Theory 6, 2 (Sept. 2016), 71–100. https://doi.org/10.14318/hau6.2.009
- Autumn B. Hostetter. 2011. When do gestures communicate? A meta-analysis. Psychological Bulletin 137, 2 (2011), 297. https://doi.org/10.1037/a0022128
- Carlos T. Ishi et al. 2018. A Speech-Driven Hand Gesture Generation Method and Evaluation in Android Robots. IEEE Robotics and Automation Letters 3, 4 (Oct. 2018), 3757–3764. https://doi.org/10.1109/LRA.2018.2856281
- Azadeh Jamalian and Barbara Tversky. 2012. Gestures alter thinking about time. In Proceedings of the Annual Meeting of the Cognitive Science Society, Vol. 34. 503–508.
- Shafiq Joty, Giuseppe Carenini, and Raymond T. Ng. 2015. CODRA: A novel discriminative framework for rhetorical analysis. Computational Linguistics 41, 3 (2015), 385–435.
- Adam Kendon. 1997. Gesture. Annual review of anthropology 26, 1 (1997), 109–128.
- Adam Kendon. 2004. Gesture: Visible action as utterance. Cambridge University Press.
- Adam Kendon. 1980. Gesticulation and speech: Two aspects of the process of utterance. In The Relationship of Verbal and Nonverbal Communication, Mary Ritchie Key (Ed.). Mouton, 207–227.
- Michael Kipp. 2003. Gesture generation by imitation: From human behavior to computer character animation. Ph.D. Dissertation. Universität des Saarlandes. https://doi.org/10.22028/D291-25852
- Michael Kipp and Jean-Claude Martin. 2009. Gesture and emotion: Can basic gestural form features discriminate emotions?. In 2009 3rd international conference on affective computing and intelligent interaction and workshops. IEEE, 1–8.
- Michal Kosinski. 2023. Theory of mind may have spontaneously emerged in large language models. arXiv preprint arXiv:2302.02083 (2023).
- Taras Kucherenko et al. 2020. Gesticulator: A framework for semantically-aware speech-driven gesture generation. In Proceedings of the 2020 International Conference on Multimodal Interaction. ACM, Virtual Event Netherlands, 242–250. https://doi.org/10.1145/3382507.3418815
- Jina Lee and Stacy Marsella. 2006. Nonverbal behavior generator for embodied conversational agents. In International Conference on Intelligent Virtual Agents. Springer, 243–255.
- Margot Lhommet and Stacy Marsella. 2014. Metaphoric gestures: towards grounded mental spaces. In Intelligent Virtual Agents: 14th International Conference, IVA 2014, Boston, MA, USA, August 27-29, 2014. Proceedings 14. Springer, 264–274.
- Stacy Marsella et al. 2013. Virtual Character Performance from Speech. In Proceedings of the 12th ACM SIGGRAPH/Eurographics Symposium on Computer Animation (Anaheim, California) (SCA ’13). ACM, New York, NY, USA, 25–35. https://doi.org/10.1145/2485895.2485900
- David McNeill. 1985. So you think gestures are nonverbal? Psychological review 92, 3 (1985), 350.
- David McNeill. 1992. Hand and mind: What gestures reveal about thought. University of Chicago press.
- David McNeill. 2005. Gesture, gaze, and ground. In International workshop on machine learning for multimodal interaction. Springer, 1–14.
- George A Miller. 1995. WordNet: a lexical database for English. Commun. ACM 38, 11 (1995), 39–41.
- Semantic Textual Similarity in Japanese Clinical Domain Texts Using BERT. Methods of Information in Medicine 60, S 01 (June 2021), e56–e64. https://doi.org/10.1055/s-0041-1731390
- Michael Neff. 2016. Hand Gesture Synthesis for Conversational Characters. In Handbook of Human Motion. Springer. https://doi.org/10.1007/978-3-319-30808-1_5-1
- Michael Neff, Michael Kipp, Irene Albrecht, and Hans-Peter Seidel. 2008. Gesture modeling and animation based on a probabilistic re-creation of speaker style. ACM Transactions on Graphics (TOG) 27, 1 (2008), 1–24.
- Simbarashe Nyatsanga, Taras Kucherenko, Chaitanya Ahuja, Gustav Eje Henter, and Michael Neff. 2023. A Comprehensive Review of Data-Driven Co-Speech Gesture Generation. Computer Graphics Forum 42, 2 (2023). https://doi.org/10.1111/cgf.14776
- Terry H. Ostermeier. 1997. Gender, Nonverbal Cues, and Intercultural Listening: Conversational Space and Hand Gestures. Technical Report ED416520. https://eric.ed.gov/?id=ED416520
- Long Ouyang et al. 2022. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems 35 (2022), 27730–27744.
- Demet Özer and Tilbe Göksun. 2020. Gesture use and processing: A review on individual differences in cognitive resources. Frontiers in Psychology 11 (2020), 573555.
- Nicole Peinelt, Dong Nguyen, and Maria Liakata. 2020. tBERT: Topic Models and BERT Joining Forces for Semantic Similarity Detection. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 7047–7055. https://doi.org/10.18653/v1/2020.acl-main.630
- Baolin Peng, Chunyuan Li, Pengcheng He, Michel Galley, and Jianfeng Gao. 2023. Instruction Tuning with GPT-4. arXiv preprint arXiv:2304.03277 (2023).
- Brian Ravenet, Catherine Pelachaud, Chloé Clavel, and Stacy Marsella. 2018. Automating the production of communicative gestures in embodied characters. Frontiers in Psychology 9 (2018).
- Maha Salem, Friederike Eyssel, Katharina Rohlfing, Stefan Kopp, and Frank Joublin. 2013. To err is human(-like): Effects of robot gesture on perceived anthropomorphism and likability. International Journal of Social Robotics 5, 3 (2013), 313–323.
- Maha Salem, Stefan Kopp, Ipke Wachsmuth, Katharina Rohlfing, and Frank Joublin. 2012. Generation and evaluation of communicative robot gesture. International Journal of Social Robotics 4, 2 (2012), 201–217.
- Carolyn Saund and Stacy Marsella. 2021. The Importance of Qualitative Elements in Subjective Evaluation of Semantic Gestures. In 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021). IEEE, 1–8.
- Carolyn Saund et al. 2019. Multiple metaphors in metaphoric gesturing. In 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII). IEEE, 524–530.
- Susan Seizer. 2011. On the Uses of Obscenity in Live Stand-Up Comedy. Anthropological Quarterly 84, 1 (2011), 209–234. https://www.jstor.org/stable/41237487
- The Exploration of the Uncanny Valley from the Viewpoint of the Robot’s Nonverbal Behaviour. International Journal of Social Robotics 13, 6 (Sept. 2021), 1443–1455. https://doi.org/10.1007/s12369-020-00726-w
- Hugo Touvron et al. 2023. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023).
- Barbara Tversky and Bridgette Martin Hard. 2009. Embodied and disembodied cognition: Spatial perspective-taking. Cognition 110, 1 (2009), 124–129.
- Pieter Wolfert, Jeffrey M. Girard, Taras Kucherenko, and Tony Belpaeme. 2021. To rate or not to rate: Investigating evaluation methods for generated co-speech gestures. In Proceedings of the 2021 International Conference on Multimodal Interaction. 494–502.
- Sang Michael Xie, Aditi Raghunathan, Percy Liang, and Tengyu Ma. 2021. An explanation of in-context learning as implicit Bayesian inference. arXiv preprint arXiv:2111.02080 (2021).
- Jingfeng Yang et al. 2023. Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond. arXiv preprint arXiv:2304.13712 (2023).
- Youngwoo Yoon et al. 2020. Speech gesture generation from the trimodal context of text, audio, and speaker identity. ACM Transactions on Graphics 39, 6 (Dec. 2020), 1–16. https://doi.org/10.1145/3414685.3417838
- Youngwoo Yoon et al. 2019. Robots learn social skills: End-to-end learning of co-speech gesture generation for humanoid robots. In 2019 International Conference on Robotics and Automation (ICRA). IEEE, 4303–4309.