
VisemeNet: Audio-Driven Animator-Centric Speech Animation (1805.09488v1)

Published 24 May 2018 in cs.GR

Abstract: We present a novel deep-learning-based approach to producing animator-centric speech motion curves that drive a JALI or standard FACS-based production face-rig, directly from input audio. Our three-stage Long Short-Term Memory (LSTM) network architecture is motivated by psycho-linguistic insights: segmenting speech audio into a stream of phonetic groups is sufficient for viseme construction; speech styles like mumbling or shouting are strongly correlated with the motion of facial landmarks; and animator style is encoded in viseme motion curve profiles. Our contribution is a solution for automatic, real-time lip synchronization from audio that integrates seamlessly into existing animation pipelines. We evaluate our results by cross-validation against ground-truth data, animator critique and edits, visual comparison to recent deep-learning lip-synchronization solutions, and by demonstrating resilience to diversity in speaker and language.
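
To make the three-stage pipeline concrete, below is a minimal PyTorch sketch of the architecture the abstract describes: one LSTM stream that segments audio into phonetic-group probabilities, a second that predicts facial-landmark motion (carrying style cues like mumbling or shouting), and a third that fuses both into viseme motion curves. All layer sizes, feature dimensions, and module names here are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch of a three-stage LSTM pipeline in the spirit of VisemeNet.
# Feature sizes, layer counts, and the use of PyTorch are assumptions for
# illustration only; the paper's actual architecture details may differ.
import torch
import torch.nn as nn

class VisemeNetSketch(nn.Module):
    def __init__(self, n_audio_feats=65, n_phoneme_groups=20,
                 n_landmarks=76, n_visemes=34, hidden=256):
        super().__init__()
        # Stage 1: segment audio into a stream of phonetic groups.
        self.phoneme_lstm = nn.LSTM(n_audio_feats, hidden, num_layers=3,
                                    batch_first=True, bidirectional=True)
        self.phoneme_head = nn.Linear(2 * hidden, n_phoneme_groups)
        # Stage 2: predict facial-landmark motion, which correlates with
        # speech style (e.g. mumbling vs. shouting).
        self.landmark_lstm = nn.LSTM(n_audio_feats, hidden, num_layers=3,
                                     batch_first=True, bidirectional=True)
        self.landmark_head = nn.Linear(2 * hidden, n_landmarks)
        # Stage 3: fuse both streams into animator-editable viseme motion
        # curves (JALI-style jaw/lip activations omitted for brevity).
        self.viseme_lstm = nn.LSTM(n_phoneme_groups + n_landmarks, hidden,
                                   num_layers=2, batch_first=True)
        self.viseme_head = nn.Linear(hidden, n_visemes)

    def forward(self, audio):  # audio: (batch, time, n_audio_feats)
        p, _ = self.phoneme_lstm(audio)
        phoneme_probs = self.phoneme_head(p).softmax(dim=-1)
        l, _ = self.landmark_lstm(audio)
        landmarks = self.landmark_head(l)
        v, _ = self.viseme_lstm(torch.cat([phoneme_probs, landmarks], dim=-1))
        # Sigmoid keeps each viseme activation curve in [0, 1].
        return torch.sigmoid(self.viseme_head(v))

model = VisemeNetSketch()
curves = model(torch.randn(1, 100, 65))  # 100 audio frames -> (1, 100, 34)
```

Per-frame activation curves like these are what make the output animator-centric: each viseme channel can be retimed or reshaped on a standard production face-rig rather than baked into final vertex positions.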

Authors (6)
  1. Yang Zhou (311 papers)
  2. Zhan Xu (24 papers)
  3. Chris Landreth (2 papers)
  4. Evangelos Kalogerakis (44 papers)
  5. Subhransu Maji (78 papers)
  6. Karan Singh (58 papers)
Citations (140)
