Video-Driven Animation of Neural Head Avatars (2403.04380v1)
Abstract: We present a new approach for video-driven animation of high-quality neural 3D head models, addressing the challenge of person-independent animation from video input. Typically, high-quality generative models are learned for specific individuals from multi-view video footage, resulting in person-specific latent representations that drive the generation process. To achieve person-independent animation, we introduce an LSTM-based animation network that translates person-independent expression features into personalized animation parameters of person-specific 3D head models. Our approach thus combines the advantages of personalized head models (high quality and realism) with the convenience of video-driven animation based on multi-person facial performance capture. We demonstrate the effectiveness of our approach with high-quality animations synthesized from different source videos, as well as an ablation study.
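As a rough illustration of the idea sketched in the abstract, the snippet below shows how such an LSTM-based animation network could be structured in PyTorch. This is a minimal sketch under assumed settings: the class name `AnimationNetwork` and all dimensions (`feat_dim`, `hidden_dim`, `param_dim`) are hypothetical, since the abstract does not specify the feature extractor or the parameterization of the head model.

```python
# Minimal sketch (not the authors' implementation): an LSTM maps a sequence
# of per-frame, person-independent expression features to per-frame,
# person-specific animation parameters of a pre-trained neural head model.
# All names and sizes below are illustrative assumptions.
import torch
import torch.nn as nn

class AnimationNetwork(nn.Module):
    def __init__(self, feat_dim=128, hidden_dim=256, param_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, param_dim)

    def forward(self, expr_feats):
        # expr_feats: (batch, time, feat_dim) person-independent expression
        # features extracted from the driving video.
        hidden, _ = self.lstm(expr_feats)
        # Per-frame personalized animation parameters that would drive the
        # person-specific 3D head model's generator.
        return self.head(hidden)

# Usage: a 30-frame clip of expression features for one driving video.
net = AnimationNetwork()
params = net(torch.randn(1, 30, 128))  # -> (1, 30, 64)
```

The recurrent (LSTM) structure reflects that animation parameters at a given frame may depend on the temporal context of the expression features, not just the current frame.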