Expressive Speech-driven Facial Animation with controllable emotions (2301.02008v2)
Abstract: Generating facial animation with high realism is in high demand, yet it remains a challenging task. Existing approaches to speech-driven facial animation can produce satisfactory mouth movement and lip synchronization, but they fall short in conveying dramatic emotional expressions and offer little flexibility in emotion control. This paper presents a novel deep learning-based approach for generating expressive facial animation from speech that can exhibit wide-spectrum facial expressions with controllable emotion type and intensity. We propose an emotion controller module to learn the relationship between emotion variations (e.g., type and intensity) and the corresponding facial expression parameters. It enables emotion-controllable facial animation, where the target expression can be continuously adjusted as desired. Qualitative and quantitative evaluations show that the animation generated by our method is rich in facial emotional expressiveness while retaining accurate lip movement, outperforming other state-of-the-art methods.
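The core idea described in the abstract, conditioning the speech-to-expression mapping on an emotion label and a continuous intensity value, can be illustrated with a minimal sketch. The module names, layer sizes, GRU decoder, and parameter dimensions below are illustrative assumptions for exposition, not the authors' actual architecture.

```python
import torch
import torch.nn as nn


class EmotionController(nn.Module):
    """Sketch of an emotion controller: maps an emotion label and a
    continuous intensity scalar to a conditioning vector.
    All names and sizes here are assumptions, not the paper's design."""

    def __init__(self, num_emotions=6, emb_dim=64):
        super().__init__()
        self.emotion_emb = nn.Embedding(num_emotions, emb_dim)
        self.intensity_proj = nn.Linear(1, emb_dim)

    def forward(self, emotion_id, intensity):
        # emotion_id: (B,) int64 labels; intensity: (B, 1) float in [0, 1]
        return self.emotion_emb(emotion_id) + self.intensity_proj(intensity)


class SpeechToExpression(nn.Module):
    """Toy decoder: fuses per-frame audio features with the emotion
    conditioning vector and regresses facial expression parameters."""

    def __init__(self, audio_dim=80, emb_dim=64, expr_dim=50):
        super().__init__()
        self.controller = EmotionController(emb_dim=emb_dim)
        self.rnn = nn.GRU(audio_dim + emb_dim, 128, batch_first=True)
        self.head = nn.Linear(128, expr_dim)

    def forward(self, audio_feats, emotion_id, intensity):
        # audio_feats: (B, T, audio_dim), e.g. log-mel frames
        cond = self.controller(emotion_id, intensity)            # (B, emb_dim)
        cond = cond.unsqueeze(1).expand(-1, audio_feats.size(1), -1)
        h, _ = self.rnn(torch.cat([audio_feats, cond], dim=-1))
        return self.head(h)                                       # (B, T, expr_dim)


# Usage: sweep intensity from 0 to 1 to continuously strengthen the target emotion
# while the same audio drives the lip motion.
model = SpeechToExpression()
audio = torch.randn(1, 100, 80)
for w in (0.0, 0.5, 1.0):
    expr_params = model(audio, torch.tensor([3]), torch.tensor([[w]]))
```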
- Yutong Chen
- Junhong Zhao
- Wei-Qiang Zhang