Multimodal analysis of the predictability of hand-gesture properties (2108.05762v3)

Published 12 Aug 2021 in cs.HC, cs.LG, and cs.MM

Abstract: Embodied conversational agents benefit from being able to accompany their speech with gestures. Although many data-driven approaches to gesture generation have been proposed in recent years, it is still unclear whether such systems can consistently generate gestures that convey meaning. We investigate which gesture properties (phase, category, and semantics) can be predicted from speech text and/or audio using contemporary deep learning. In extensive experiments, we show that gesture properties related to gesture meaning (semantics and category) are predictable from text features (time-aligned FastText embeddings) alone, but not from prosodic audio features, while rhythm-related gesture properties (phase) on the other hand can be predicted from audio features better than from text. These results are encouraging as they indicate that it is possible to equip an embodied agent with content-wise meaningful co-speech gestures using a machine-learning model.
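The core experimental setup described in the abstract, predicting a per-frame gesture-property label (such as phase or category) from time-aligned speech features, can be illustrated with a minimal sketch. This is not the authors' code: the window size, embedding dimension, synthetic data, and the plain softmax classifier (standing in for their deep-learning models) are all illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch: classify each frame's gesture property from a
# sliding window of time-aligned word-embedding features, using a
# softmax classifier as a stand-in for a deep model.

rng = np.random.default_rng(0)

EMB_DIM = 8        # stand-in for 300-d FastText embeddings
WINDOW = 5         # frames of context around the current frame
N_CLASSES = 3      # e.g. gesture phases: prep / stroke / retract
N_FRAMES = 600

# Synthetic data: class-dependent mean embeddings so the task is learnable.
labels = rng.integers(0, N_CLASSES, size=N_FRAMES)
class_means = rng.normal(size=(N_CLASSES, EMB_DIM))
frames = class_means[labels] + 0.3 * rng.normal(size=(N_FRAMES, EMB_DIM))

def windows(x, w):
    """Concatenate w consecutive frames into one feature vector per position."""
    pad = np.pad(x, ((w // 2, w // 2), (0, 0)), mode="edge")
    return np.stack([pad[i:i + w].ravel() for i in range(len(x))])

X = windows(frames, WINDOW)        # shape: (N_FRAMES, WINDOW * EMB_DIM)
Y = np.eye(N_CLASSES)[labels]      # one-hot targets

# Softmax regression trained with full-batch gradient descent.
W = np.zeros((X.shape[1], N_CLASSES))
for _ in range(300):
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    W -= 0.1 * X.T @ (p - Y) / len(X)

acc = (np.argmax(X @ W, axis=1) == labels).mean()
print(f"training accuracy: {acc:.2f}")
```

The windowing step mirrors the time-alignment idea in the paper: each frame's prediction can draw on surrounding speech context, and swapping the embedding features for prosodic audio features would probe the text-vs-audio comparison the abstract reports.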

Authors (5)
  1. Taras Kucherenko (21 papers)
  2. Rajmund Nagy (6 papers)
  3. Michael Neff (8 papers)
  4. Gustav Eje Henter (51 papers)
  5. Hedvig Kjellström (47 papers)
Citations (22)
