
Video-Driven Animation of Neural Head Avatars (2403.04380v1)

Published 7 Mar 2024 in cs.CV

Abstract: We present a new approach for video-driven animation of high-quality neural 3D head models, addressing the challenge of person-independent animation from video input. Typically, high-quality generative models are learned for specific individuals from multi-view video footage, resulting in person-specific latent representations that drive the generation process. In order to achieve person-independent animation from video input, we introduce an LSTM-based animation network capable of translating person-independent expression features into personalized animation parameters of person-specific 3D head models. Our approach combines the advantages of personalized head models (high quality and realism) with the convenience of video-driven animation employing multi-person facial performance capture. We demonstrate the effectiveness of our approach through high-quality animations synthesized from different source videos, as well as an ablation study.
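The core idea in the abstract, a sequence model that translates person-independent expression features into person-specific animation parameters, can be illustrated with a minimal sketch. The following PyTorch snippet is an assumption-laden illustration, not the authors' architecture: the feature and parameter dimensions, layer sizes, and class/variable names (ExpressionToAnimationLSTM, expr_dim, anim_dim) are placeholders chosen for the example.

```python
import torch
import torch.nn as nn

class ExpressionToAnimationLSTM(nn.Module):
    """Illustrative sketch: an LSTM that maps a sequence of person-independent
    expression features to per-frame animation parameters of a person-specific
    head model. All dimensions below are assumed, not taken from the paper."""

    def __init__(self, expr_dim=64, hidden_dim=256, anim_dim=128, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(expr_dim, hidden_dim,
                            num_layers=num_layers, batch_first=True)
        self.head = nn.Linear(hidden_dim, anim_dim)  # per-frame animation parameters

    def forward(self, expr_seq):
        # expr_seq: (batch, frames, expr_dim) person-independent expression features
        hidden_seq, _ = self.lstm(expr_seq)
        return self.head(hidden_seq)  # (batch, frames, anim_dim)

# Example usage with random features standing in for tracked expressions.
model = ExpressionToAnimationLSTM()
features = torch.randn(1, 30, 64)      # 30 video frames of expression features
anim_params = model(features)          # would drive a person-specific head model
print(anim_params.shape)               # torch.Size([1, 30, 128])
```

In such a setup, one animation head (or output mapping) would typically be trained per target identity, while the expression features come from a shared, multi-person facial performance capture front end.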

