
NPGA: Neural Parametric Gaussian Avatars (2405.19331v2)

Published 29 May 2024 in cs.CV, cs.AI, and cs.GR

Abstract: The creation of high-fidelity, digital versions of human heads is an important stepping stone in the process of further integrating virtual components into our everyday lives. Constructing such avatars is a challenging research problem, due to a high demand for photo-realism and real-time rendering performance. In this work, we propose Neural Parametric Gaussian Avatars (NPGA), a data-driven approach to create high-fidelity, controllable avatars from multi-view video recordings. We build our method around 3D Gaussian splatting for its highly efficient rendering and to inherit the topological flexibility of point clouds. In contrast to previous work, we condition our avatars' dynamics on the rich expression space of neural parametric head models (NPHM), instead of mesh-based 3DMMs. To this end, we distill the backward deformation field of our underlying NPHM into forward deformations which are compatible with rasterization-based rendering. All remaining fine-scale, expression-dependent details are learned from the multi-view videos. For increased representational capacity of our avatars, we propose per-Gaussian latent features that condition each primitive's dynamic behavior. To regularize this increased dynamic expressivity, we propose Laplacian terms on the latent features and predicted dynamics. We evaluate our method on the public NeRSemble dataset, demonstrating that NPGA significantly outperforms the previous state-of-the-art avatars on the self-reenactment task by 2.6 PSNR. Furthermore, we demonstrate accurate animation capabilities from real-world monocular videos.
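The abstract mentions Laplacian terms on the per-Gaussian latent features. One common form of such a regularizer penalizes, for every primitive, the squared difference between its feature and the mean feature of its neighbors (the uniform graph Laplacian of the features). The sketch below assumes a k-nearest-neighbor graph over canonical Gaussian positions with uniform weights; the neighborhood definition, the weighting, and the helper names `knn_indices` and `laplacian_penalty` are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def knn_indices(points: List[Vec], k: int) -> List[List[int]]:
    """Brute-force k nearest neighbors of each point (the point itself excluded)."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def laplacian_penalty(features: List[Vec], neighbors: List[List[int]]) -> float:
    """Mean squared norm of (f_i - average of f_j over the neighbors j of i)."""
    total = 0.0
    for i, nbrs in enumerate(neighbors):
        dim = len(features[i])
        mean = [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        total += sum((features[i][d] - mean[d]) ** 2 for d in range(dim))
    return total / len(features)


# Four hypothetical Gaussians on a line; the third carries an outlier feature.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
features = [(0.0,), (0.0,), (5.0,), (0.0,)]
nbrs = knn_indices(positions, k=2)
print(laplacian_penalty(features, nbrs))  # → 10.9375
```

Spatially smooth features give a penalty near zero, while a feature that disagrees with its neighbors is penalized, which is the kind of regularization the abstract describes for the increased dynamic expressivity.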


Summary

  • The paper introduces NPGA, which creates realistic, controllable digital head avatars by combining neural parametric head models with 3D Gaussian splatting.
  • It uses a canonical Gaussian point cloud with two MLP modules that capture coarse and fine expression-dependent dynamics.
  • Evaluation on the NeRSemble dataset shows significant improvements in PSNR, SSIM, and LPIPS over previous state-of-the-art avatar methods.

NPGA: Neural Parametric Gaussian Avatars

In the paper "NPGA: Neural Parametric Gaussian Avatars," the authors present a method for creating high-fidelity, controllable digital head avatars from multi-view video recordings. Such avatars are relevant to applications including AR/VR, teleconferencing, and digital media.

Creating realistic avatars is difficult because they must be photo-realistic and render in real time. NPGA addresses both requirements: it uses 3D Gaussian splatting for efficient rendering and conditions the avatar's dynamics on the expression space of neural parametric head models (NPHM). This departs from prior work built on mesh-based 3D morphable models (3DMMs), whose expression spaces are limited by their linear nature. The richer NPHM expression space lets NPGA capture more nuanced dynamic behavior.
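3D Gaussian splatting renders efficiently because each pixel's color is a front-to-back alpha blend of depth-sorted, projected Gaussians, C = Σ_i c_i α_i Π_{j<i} (1 - α_j), where α_i is the Gaussian's opacity scaled by its 2D falloff at the pixel. The sketch below composites a single grayscale pixel. The `Splat` fields, the function name, and the thresholds (alpha clamped to 0.99, contributions below 1/255 skipped, early stop once transmittance falls under 1e-4) are assumptions chosen to resemble the reference 3DGS rasterizer; this is not NPGA code.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Splat:
    mean: Tuple[float, float]            # projected 2D center (pixels)
    inv_cov: Tuple[float, float, float]  # (a, b, c) of inverse 2D covariance [[a, b], [b, c]]
    opacity: float                       # learned opacity in [0, 1]
    color: float                         # grayscale color, for brevity
    depth: float                         # view-space depth used for sorting


def composite_pixel(px: float, py: float, splats: List[Splat]) -> float:
    """Front-to-back alpha compositing of depth-sorted 2D Gaussians at one pixel."""
    color, transmittance = 0.0, 1.0
    for s in sorted(splats, key=lambda sp: sp.depth):  # nearest splat first
        dx, dy = px - s.mean[0], py - s.mean[1]
        a, b, c = s.inv_cov
        # Gaussian falloff exp(-0.5 * d^T Sigma^-1 d) scaled by the opacity.
        power = -0.5 * (a * dx * dx + 2.0 * b * dx * dy + c * dy * dy)
        alpha = min(0.99, s.opacity * math.exp(power))
        if alpha < 1.0 / 255.0:
            continue  # negligible contribution at this pixel
        color += transmittance * alpha * s.color
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:
            break  # pixel is effectively opaque; later splats are hidden
    return color


front = Splat((0.0, 0.0), (1.0, 0.0, 1.0), opacity=0.5, color=1.0, depth=1.0)
back = Splat((0.0, 0.0), (1.0, 0.0, 1.0), opacity=1.0, color=0.2, depth=2.0)
print(composite_pixel(0.0, 0.0, [back, front]))
```

Because the splats are sorted by depth inside the function, the order of the input list does not matter; the front splat contributes 0.5 and the back splat adds 0.5 × 0.99 × 0.2 through the remaining transmittance.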

Methodology

The method is built around a canonical Gaussian point cloud in which every primitive carries a learned latent feature. These per-primitive features condition each Gaussian's dynamic behavior and increase the avatar's representational capacity. The dynamics module consists of two multi-layer perceptrons (MLPs): the network F models the coarse, prior-based deformation, while the network G captures fine-scale details beyond this prior.
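A minimal sketch of how the two networks might be composed, assuming both F and G predict additive 3D position offsets for each Gaussian: F sees only the canonical position and the expression code, while G additionally sees the Gaussian's latent feature. The layer sizes, the tanh activation, the absence of biases, and the names `TinyMLP` and `deform` are hypothetical; the paper's networks may also predict other attributes (for example rotation or scale), which this sketch omits.

```python
import math
import random
from typing import List

Vector = List[float]


class TinyMLP:
    """Two-layer perceptron without biases and with tanh hidden units (illustrative only)."""

    def __init__(self, in_dim: int, hidden: int, out_dim: int, seed: int) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.3) for _ in range(in_dim)] for _ in range(hidden)]
        self.w2 = [[rng.gauss(0.0, 0.3) for _ in range(hidden)] for _ in range(out_dim)]

    def __call__(self, x: Vector) -> Vector:
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]


EXPR_DIM, FEAT_DIM, HIDDEN = 4, 8, 16  # hypothetical sizes

# F: coarse, prior-driven offset from canonical position + expression code.
coarse_F = TinyMLP(3 + EXPR_DIM, HIDDEN, 3, seed=0)
# G: fine residual that additionally sees the per-Gaussian latent feature.
fine_G = TinyMLP(3 + FEAT_DIM + EXPR_DIM, HIDDEN, 3, seed=1)


def deform(x_canon: Vector, latent: Vector, expr: Vector) -> Vector:
    """Deformed position = canonical + coarse offset (F) + fine offset (G)."""
    coarse = coarse_F(x_canon + expr)  # list concatenation forms the input vector
    fine = fine_G(x_canon + latent + expr)
    return [x + c + f for x, c, f in zip(x_canon, coarse, fine)]


# One Gaussian with a hypothetical latent feature and expression code.
print(deform([0.1, -0.2, 0.3], [0.5] * FEAT_DIM, [0.1] * EXPR_DIM))
```

With G's output layer set to zero, `deform` reduces to the coarse, prior-driven motion alone, which mirrors the split between prior-based deformation and learned fine detail.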

NPHM defines a backward deformation, which maps points from deformed (posed) space back to canonical space. Rasterization-based rendering needs the opposite direction, because each Gaussian must be moved explicitly to its deformed position before it is splatted. The authors bridge this gap with cycle-consistency distillation: the network F is optimized to act as the inverse of the NPHM backward deformation, so that the forward facial dynamics remain aligned with the neural parametric model.
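The distillation objective can be illustrated with a one-dimensional toy problem. Assume a known backward map B(y) = 0.5·y - 1 standing in for the NPHM backward deformation (which in reality is a neural network conditioned on the expression code) and a forward map F(x) = a·x + c standing in for the MLP F. Minimizing the cycle error (B(F(x)) - x)² over canonical samples drives F toward the inverse of B, here F(x) = 2x + 2. The names `backward_B` and `distill_forward`, the affine forms, and the plain gradient-descent loop are hypothetical simplifications, not the authors' training setup.

```python
from typing import List, Tuple


def backward_B(y: float) -> float:
    """Hypothetical backward deformation: deformed space -> canonical space."""
    return 0.5 * y - 1.0


def distill_forward(samples: List[float], steps: int = 200, lr: float = 1.0) -> Tuple[float, float]:
    """Fit F(x) = a*x + c so that B(F(x)) ≈ x for canonical samples x."""
    a, c = 0.0, 0.0
    n = len(samples)
    for _ in range(steps):
        grad_a = grad_c = 0.0
        for x in samples:
            residual = backward_B(a * x + c) - x  # cycle error in canonical space
            # Chain rule for residual**2, using dB/dy = 0.5 for this toy B.
            grad_a += 2.0 * residual * 0.5 * x
            grad_c += 2.0 * residual * 0.5
        a -= lr * grad_a / n
        c -= lr * grad_c / n
    return a, c


a, c = distill_forward([-1.0, -0.5, 0.0, 0.5, 1.0])
print(round(a, 4), round(c, 4))  # → 2.0 2.0
```

Only the backward map is queried during training; the learned forward map is what a rasterizer would use to place each Gaussian.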

Implementation and Evaluation

The authors evaluate their approach on the public NeRSemble dataset. On the self-reenactment task, NPGA outperforms the previous state-of-the-art avatars, including GaussianAvatars and Gaussian Head Avatar, by approximately 2.6 dB in PSNR, with notable gains in SSIM and LPIPS as well. NPGA also performs robustly in cross-reenactment and can be animated from expressions tracked in real-world monocular RGB videos.
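PSNR is defined as 10·log10(MAX²/MSE), so a fixed gain in dB corresponds to a multiplicative reduction of the mean squared error. The sketch below, with hypothetical helper names `psnr` and `mse_ratio_for_gain`, computes PSNR for a single image given as a flat list of values in [0, 1] and converts a 2.6 dB gain into an MSE ratio of roughly 0.55, i.e. about 45% lower error. Because the reported figure is an average over many frames, this conversion is only a rough per-image reading.

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for one image given as flat pixel lists."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_ratio_for_gain(delta_db: float) -> float:
    """Factor by which the MSE shrinks for a PSNR gain of delta_db at equal max_val."""
    return 10.0 ** (-delta_db / 10.0)


print(round(psnr([0.0, 0.5], [0.1, 0.4]), 2))  # → 20.0 (every pixel off by 0.1)
print(round(mse_ratio_for_gain(2.6), 3))       # → 0.55
```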

Results

The evaluation shows that NPGA produces avatars with higher fidelity and more nuanced dynamic expressions, outperforming the baselines both qualitatively and quantitatively. For instance, on novel view synthesis NPGA reaches an average PSNR of 37.68 dB, compared with 33.92 dB for GHA (Gaussian Head Avatar) and 33.42 dB for MVP (Mixture of Volumetric Primitives). These gains are attributed to the per-primitive latent features and the cycle-consistency distillation.

Implications and Future Work

By conditioning avatars on a neural parametric head model, NPGA provides a more expressive and controllable framework for avatar animation. This could benefit immersive applications such as gaming, virtual environments, and telepresence.

Looking ahead, the authors suggest extending the underlying 3DMMs to more complete descriptions that include the neck and torso, which are currently poorly represented. There is also potential for training on large-scale multi-view datasets to further improve the fidelity and generalization of the neural models used for avatar creation.

In summary, "NPGA: Neural Parametric Gaussian Avatars" combines efficient Gaussian-splatting rendering with an expressive neural parametric head model to create high-fidelity digital avatars with improved dynamic expressivity and visual realism, outperforming previous state-of-the-art methods on the NeRSemble self-reenactment benchmark.
