NPGA: Neural Parametric Gaussian Avatars (2405.19331v2)
Abstract: The creation of high-fidelity, digital versions of human heads is an important stepping stone in the process of further integrating virtual components into our everyday lives. Constructing such avatars is a challenging research problem, due to a high demand for photo-realism and real-time rendering performance. In this work, we propose Neural Parametric Gaussian Avatars (NPGA), a data-driven approach to create high-fidelity, controllable avatars from multi-view video recordings. We build our method around 3D Gaussian splatting for its highly efficient rendering and to inherit the topological flexibility of point clouds. In contrast to previous work, we condition our avatars' dynamics on the rich expression space of neural parametric head models (NPHM), instead of mesh-based 3DMMs. To this end, we distill the backward deformation field of our underlying NPHM into forward deformations which are compatible with rasterization-based rendering. All remaining fine-scale, expression-dependent details are learned from the multi-view videos. For increased representational capacity of our avatars, we propose per-Gaussian latent features that condition each primitive's dynamic behavior. To regularize this increased dynamic expressivity, we propose Laplacian terms on the latent features and predicted dynamics. We evaluate our method on the public NeRSemble dataset, demonstrating that NPGA significantly outperforms the previous state-of-the-art avatars on the self-reenactment task by 2.6 dB PSNR. Furthermore, we demonstrate accurate animation capabilities from real-world monocular videos.
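To make the idea of a Laplacian term on per-Gaussian latent features more concrete, the following is a minimal sketch in Python with NumPy. It is not the formulation used in the paper, which the abstract does not spell out. It assumes a uniform graph Laplacian over the k nearest neighbors of each Gaussian center, and the names `knn_indices`, `laplacian_smoothness`, and the parameter `k` are hypothetical and chosen only for illustration.

```python
import numpy as np


def knn_indices(points: np.ndarray, k: int) -> np.ndarray:
    """Return, for every point, the indices of its k nearest neighbors.

    Brute-force O(N^2) search; assumes 1 <= k < N. Illustrative only.
    """
    # Pairwise squared Euclidean distances, shape (N, N).
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff * diff, axis=-1)
    # A point must not count as its own neighbor.
    np.fill_diagonal(dist2, np.inf)
    # After partitioning, the first k columns hold the k smallest distances
    # (in no particular order), which is all we need here.
    return np.argpartition(dist2, k - 1, axis=1)[:, :k]


def laplacian_smoothness(points: np.ndarray, features: np.ndarray, k: int = 4) -> float:
    """Mean squared difference between each feature and its neighborhood mean.

    This is an assumed uniform-Laplacian penalty: it is zero when every
    Gaussian carries the same feature as the average of its k neighbors.
    """
    idx = knn_indices(points, k)                # (N, k)
    neighbor_mean = features[idx].mean(axis=1)  # (N, D)
    residual = features - neighbor_mean         # (N, D)
    return float(np.mean(np.sum(residual ** 2, axis=-1)))
```

In a training loop, a term of this kind would typically be added to the photometric loss with a small weight, and the same function could be applied to predicted per-Gaussian offsets instead of latent features. Both choices are assumptions of this sketch rather than details taken from the abstract.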