MonoGaussianAvatar: Monocular Gaussian Point-based Head Avatar (2312.04558v1)
Abstract: The ability to animate photo-realistic head avatars reconstructed from monocular portrait video sequences represents a crucial step in bridging the gap between the virtual and real worlds. Recent head avatar techniques, including explicit 3D morphable meshes (3DMM), point clouds, and neural implicit representations, have been explored for this ongoing research. However, 3DMM-based methods are constrained by their fixed topologies, point-based approaches suffer from a heavy training burden due to the extensive quantity of points involved, and neural implicit methods are limited in deformation flexibility and rendering efficiency. In response to these challenges, we propose MonoGaussianAvatar (Monocular Gaussian Point-based Head Avatar), a novel approach that harnesses a 3D Gaussian point representation coupled with a Gaussian deformation field to learn explicit head avatars from monocular portrait videos. We define our head avatars with Gaussian points characterized by adaptable shapes, enabling flexible topology. These points are driven by a Gaussian deformation field to match the target pose and expression of a person, facilitating efficient deformation. Additionally, the Gaussian points have controllable shape, size, color, and opacity, which, combined with Gaussian splatting, allow for efficient training and rendering. Experiments demonstrate the superior performance of our method, which achieves state-of-the-art results compared with previous methods.
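The abstract implies a particular data layout: a canonical set of Gaussian points carrying per-point splatting attributes (position, shape, color, opacity) plus a deformation field conditioned on pose and expression codes. Below is a minimal PyTorch sketch of that structure, not the authors' implementation; the module names, attribute parameterizations, conditioning dimension, and MLP architecture are all assumptions for illustration, and the splatting rasterizer itself is omitted.

```python
# Minimal sketch (not the authors' code) of a Gaussian point-based avatar:
# learnable canonical points with splatting attributes, deformed by an MLP
# conditioned on pose/expression codes. Dimensions are assumed for illustration.
import torch
import torch.nn as nn


class GaussianPoints(nn.Module):
    """Canonical Gaussian point cloud with per-point splatting attributes."""

    def __init__(self, num_points: int = 10_000):
        super().__init__()
        self.xyz = nn.Parameter(torch.randn(num_points, 3) * 0.1)  # positions
        self.scale = nn.Parameter(torch.zeros(num_points, 3))      # log-scales (anisotropic shape/size)
        self.rotation = nn.Parameter(torch.zeros(num_points, 4))   # quaternions
        self.color = nn.Parameter(torch.zeros(num_points, 3))      # RGB (or SH coefficients)
        self.opacity = nn.Parameter(torch.zeros(num_points, 1))    # pre-sigmoid opacity


class GaussianDeformationField(nn.Module):
    """MLP that offsets canonical point positions given target pose/expression codes."""

    def __init__(self, cond_dim: int = 56, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),  # per-point position offset
        )

    def forward(self, xyz: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        cond = cond.expand(xyz.shape[0], -1)  # broadcast the codes to every point
        return xyz + self.mlp(torch.cat([xyz, cond], dim=-1))


# Usage: deform canonical points to a target expression, then hand all attributes
# to a Gaussian splatting rasterizer (not shown here).
points = GaussianPoints()
deform = GaussianDeformationField()
codes = torch.randn(1, 56)                 # pose + expression vector (assumed size)
deformed_xyz = deform(points.xyz, codes)
print(deformed_xyz.shape)                  # torch.Size([10000, 3])
```

In this reading, the explicit per-point attributes are what make training and rendering efficient, while the deformation MLP supplies the pose- and expression-dependent motion that a fixed-topology mesh cannot.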