HeadGaS: Real-Time Animatable Head Avatars via 3D Gaussian Splatting (2312.02902v2)

Published 5 Dec 2023 in cs.CV

Abstract: 3D head animation has seen major quality and runtime improvements over the last few years, particularly empowered by the advances in differentiable rendering and neural radiance fields. Real-time rendering is a highly desirable goal for real-world applications. We propose HeadGaS, a model that uses 3D Gaussian Splats (3DGS) for 3D head reconstruction and animation. In this paper we introduce a hybrid model that extends the explicit 3DGS representation with a base of learnable latent features, which can be linearly blended with low-dimensional parameters from parametric head models to obtain expression-dependent color and opacity values. We demonstrate that HeadGaS delivers state-of-the-art results in real-time inference frame rates, surpassing baselines by up to 2dB, while accelerating rendering speed by over x10.


Summary

  • The paper introduces HeadGaS, the first method using 3D Gaussian splatting for real-time, controllable 3D head avatar reconstruction and animation.
  • It blends learned features with traditional parametric head models to dynamically render expression-dependent colors and opacities at speeds surpassing 100 fps.
  • Extensive evaluations demonstrate up to 2dB image quality improvements and significant speedups over neural radiance field baselines, enabling practical VR and AR applications.

Understanding HeadGaS: Animating 3D Head Avatars in Real-Time with Gaussian Splatting

Creating realistic and controllable 3D head avatars has significant applications in virtual reality (VR), augmented reality (AR), teleconferencing, and gaming. Achieving photorealism while retaining expressive control has been a long-standing challenge in computer graphics and vision research. A recent approach called HeadGaS (Head Gaussian Splatting) marks a notable stride forward.

HeadGaS builds on 3D Gaussian Splats (3DGS), an efficient explicit spatial representation that enables rapid rendering. It is the first work to apply 3DGS to both the reconstruction and animation of 3D head avatars, allowing them to be rendered and driven in real time.
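To make the representation concrete, here is a minimal, hypothetical PyTorch sketch of the per-primitive parameters a 3DGS point cloud typically carries. The class name `GaussianCloud` and the exact parameterization are illustrative assumptions, not code from the paper.

```python
import torch
import torch.nn as nn

class GaussianCloud(nn.Module):
    """Illustrative sketch of a 3DGS primitive set (not the authors' code).

    Each Gaussian carries a 3D mean, a rotation quaternion, per-axis
    scales, an opacity, and a color; all are optimized by gradient
    descent through a differentiable rasterizer.
    """
    def __init__(self, num_points: int):
        super().__init__()
        self.means = nn.Parameter(torch.randn(num_points, 3) * 0.1)    # centers in 3D space
        self.quats = nn.Parameter(torch.randn(num_points, 4))          # rotations (normalize before use)
        self.log_scales = nn.Parameter(torch.zeros(num_points, 3))     # anisotropic extents, log space
        self.opacity_logits = nn.Parameter(torch.zeros(num_points, 1)) # alpha before sigmoid
        self.colors = nn.Parameter(torch.rand(num_points, 3))          # RGB (full 3DGS uses SH coefficients)
```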

Core Principles of HeadGaS

At its essence, HeadGaS is a hybrid model that blends features learned from a given dataset with low-dimensional parameters from traditional parametric morphable head models (such as FLAME and FaceWarehouse). These parameters drive the expression-dependent color and opacity of the avatar. The result is that HeadGaS produces accurate and controllable avatars, surpassing existing methods in both rendering speed (over 100 frames per second) and visual quality.

The secret sauce lies in HeadGaS's feature blending method. By incorporating a learnable latent feature base within each Gaussian primitive, these features are dynamically weighted by expression vectors, leading to frame-specific rendering of avatars with varying expressions. These per-frame features then pass through a multi-layer perceptron (MLP) to output the final color and opacity for rendering. This process is both flexible and efficient, making it compatible with any 3D morphable model.
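The blending step can be sketched in a few lines. The snippet below is a hypothetical PyTorch illustration of the idea just described; the class name `ExpressionBlender`, the MLP width, and the tensor shapes are assumptions made for clarity, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class ExpressionBlender(nn.Module):
    """Hypothetical sketch of HeadGaS-style expression-driven blending."""

    def __init__(self, num_points: int, basis_size: int, feat_dim: int = 32):
        super().__init__()
        # Learnable latent feature base: a bank of basis_size features per Gaussian.
        self.feature_base = nn.Parameter(
            torch.randn(num_points, basis_size, feat_dim) * 0.01
        )
        # Small MLP mapping a blended feature to RGB + opacity (4 values).
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 4),
        )

    def forward(self, expression: torch.Tensor):
        # expression: (basis_size,) low-dimensional parameters, e.g. FLAME
        # expression coefficients. Linear blend: f_i = sum_b e_b * F[i, b].
        blended = torch.einsum("b,nbf->nf", expression, self.feature_base)
        out = self.mlp(blended)
        rgb = torch.sigmoid(out[:, :3])      # expression-dependent color per Gaussian
        opacity = torch.sigmoid(out[:, 3:])  # expression-dependent opacity per Gaussian
        return rgb, opacity

# Example: blend features for 10k Gaussians driven by a 10-dim expression code.
blender = ExpressionBlender(num_points=10_000, basis_size=10)
rgb, opacity = blender(torch.randn(10))
```

Because the expression vector enters only through a linear blend followed by a small MLP, swapping in a different morphable model amounts to changing the length and source of that vector, which is what makes the scheme model-agnostic.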

Real-Time Performance and Quality

HeadGaS stands out not only for its high-quality visual output but also for its exceptional rendering speed. In the paper's experiments it reaches up to 200 fps at 512×512 resolution, at least a tenfold speedup over neural radiance field (NeRF) based baselines and a major win for real-time applications.

Extensive Evaluation and Practical Applications

In thorough comparisons against several baselines, HeadGaS consistently demonstrated superior results, with up to a 2dB improvement in image quality metrics and significant reductions in rendering time. Its applications are broad, ranging from synthesizing novel views of the same person to cross-subject expression transfer and beyond.

Broad Implications and Potential

The implications of HeadGaS are broad and transformative for the digital world. Its ability to efficiently generate authentic and expressive human avatars holds promise for interactive digital experiences, while also extending the boundaries of visualization technologies. The HeadGaS model is a testament to the power of combining advanced neural techniques with efficient spatial representations, setting a new standard in the realistic animation of digital human avatars.
