
C2G2: Controllable Co-speech Gesture Generation with Latent Diffusion Model (2308.15016v1)

Published 29 Aug 2023 in cs.CV

Abstract: Co-speech gesture generation is crucial for automatic digital avatar animation. However, existing methods suffer from unstable training and temporal inconsistency, particularly when generating high-fidelity and comprehensive gestures. These methods also lack effective control over speaker identity and over temporal editing of the generated gestures. Focusing on capturing temporal latent information and enabling practical control, we propose a Controllable Co-speech Gesture Generation framework, named C2G2. Specifically, we propose a two-stage temporal dependency enhancement strategy motivated by latent diffusion models. We further introduce two key features to C2G2: a speaker-specific decoder that generates speaker-related real-length skeletons, and a repainting strategy for flexible gesture generation and editing. Extensive experiments on benchmark gesture datasets verify the effectiveness of C2G2 compared with several state-of-the-art baselines. The project demo page is available at https://c2g2-gesture.github.io/c2_gesture

Authors (6)
  1. Longbin Ji (2 papers)
  2. Pengfei Wei (21 papers)
  3. Yi Ren (215 papers)
  4. Jinglin Liu (38 papers)
  5. Chen Zhang (403 papers)
  6. Xiang Yin (99 papers)
Citations (2)