Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
119 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

ZS-MSTM: Zero-Shot Style Transfer for Gesture Animation driven by Text and Speech using Adversarial Disentanglement of Multimodal Style Encoding (2305.12887v1)

Published 22 May 2023 in eess.AS, cs.AI, cs.LG, and cs.SD

Abstract: In this study, we address the importance of modeling behavior style in virtual agents for personalized human-agent interaction. We propose a machine learning approach to synthesize gestures, driven by prosodic features and text, in the style of different speakers, even those unseen during training. Our model incorporates zero-shot multimodal style transfer using multimodal data from the PATS database, which contains videos of diverse speakers. We recognize style as a pervasive element during speech, influencing the expressivity of communicative behaviors, while content is conveyed through multimodal signals and text. By disentangling content and style, we directly infer the style embedding, even for speakers not included in the training phase, without the need for additional training or fine-tuning. Objective and subjective evaluations are conducted to validate our approach and compare it against two baseline methods.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (3)
  1. Mireille Fares (7 papers)
  2. Catherine Pelachaud (21 papers)
  3. Nicolas Obin (16 papers)

Summary

We haven't generated a summary for this paper yet.