PSAvatar: A Point-based Shape Model for Real-Time Head Avatar Animation with 3D Gaussian Splatting (2401.12900v5)
Abstract: Despite much progress, achieving real-time high-fidelity head avatar animation remains difficult, and existing methods have to trade off between speed and quality. 3DMM-based methods often fail to model non-facial structures such as eyeglasses and hairstyles, while neural implicit models suffer from deformation inflexibility and rendering inefficiency. Although 3D Gaussians have been demonstrated to possess promising capability for geometry representation and radiance field reconstruction, applying them to head avatar creation remains a major challenge because it is difficult for 3D Gaussians to model the head shape variations caused by changing poses and expressions. In this paper, we introduce PSAvatar, a novel framework for animatable head avatar creation that utilizes discrete geometric primitives to create a parametric morphable shape model and employs 3D Gaussians for fine detail representation and high-fidelity rendering. The parametric morphable shape model is a Point-based Morphable Shape Model (PMSM) which uses points instead of meshes for 3D representation to achieve enhanced representation flexibility. The PMSM first converts the FLAME mesh to points by sampling on the surface as well as off the mesh, enabling the reconstruction of not only surface-like structures but also complex geometries such as eyeglasses and hairstyles. By aligning these points with the head shape in an analysis-by-synthesis manner, the PMSM makes it possible to utilize 3D Gaussians for fine detail representation and appearance modeling, thus enabling the creation of high-fidelity avatars. We show that PSAvatar can reconstruct high-fidelity head avatars of a variety of subjects and that the avatars can be animated in real time ($\ge$ 25 fps at a resolution of 512 $\times$ 512).
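To make the PMSM construction described in the abstract more concrete, below is a minimal NumPy sketch of one way to sample such a point set: points drawn uniformly on a FLAME-style triangle mesh surface, plus a subset displaced off the surface along face normals so that off-mesh structures (hair, eyeglasses) can also be covered. The function name, parameter values, and the normal-offset scheme are illustrative assumptions, not the paper's exact procedure; in the full method each point would additionally carry 3D Gaussian attributes (opacity, anisotropic covariance, spherical-harmonic color) for splatting-based rendering.

```python
import numpy as np

def sample_pmsm_points(vertices, faces, n_surface=20000, n_offset=10000,
                       max_offset=0.02, seed=0):
    """Illustrative PMSM-style sampling (an assumption, not the paper's exact
    procedure): points on the mesh surface plus points displaced off the
    surface along face normals, to cover hair, eyeglasses, etc."""
    rng = np.random.default_rng(seed)
    n_total = n_surface + n_offset

    # Area-weighted face selection so sampling is uniform over the surface.
    tris = vertices[faces]                                    # (F, 3, 3)
    cross = np.cross(tris[:, 1] - tris[:, 0], tris[:, 2] - tris[:, 0])
    areas = 0.5 * np.linalg.norm(cross, axis=1)
    face_idx = rng.choice(len(faces), size=n_total, p=areas / areas.sum())

    # Uniform barycentric coordinates inside each chosen triangle.
    u, v = rng.random(n_total), rng.random(n_total)
    flip = u + v > 1.0
    u[flip], v[flip] = 1.0 - u[flip], 1.0 - v[flip]
    bary = np.stack([1.0 - u - v, u, v], axis=1)              # (N, 3)
    points = (bary[:, :, None] * tris[face_idx]).sum(axis=1)  # (N, 3)

    # Push the "off-mesh" subset away from the surface along face normals.
    normals = cross / np.linalg.norm(cross, axis=1, keepdims=True)
    offsets = rng.uniform(-max_offset, max_offset, size=(n_offset, 1))
    points[n_surface:] += offsets * normals[face_idx[n_surface:]]

    # Face indices + barycentric coords let the points (and any Gaussians
    # attached to them) follow FLAME's pose/expression-driven deformations.
    return points, face_idx, bary
```

In this sketch, keeping `face_idx` and `bary` is what would allow the sampled points, and the Gaussians attached to them, to track FLAME's pose- and expression-driven shape changes, which is the property the abstract relies on for real-time animation.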
- A morphable model for the synthesis of 3d faces. In Proceedings of the 26th annual conference on Computer graphics and interactive techniques, pages 187–194, 1999.
- Facewarehouse: A 3d facial expression database for visual computing. IEEE Transactions on Visualization and Computer Graphics, 20(3):413–425, 2013.
- Monogaussianavatar: Monocular gaussian point-based head avatar. arXiv preprint arXiv:2312.04558, 2023.
- Accurate 3d face reconstruction with weakly-supervised learning: From single image to image set. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, 2019.
- Towards high fidelity monocular face reconstruction with rich reflectance using self-supervised learning and ray tracing. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12819–12829, 2021.
- Bakedavatar: Baking neural fields for real-time head avatar synthesis. ACM Transactions on Graphics (TOG), 42(6):1–17, 2023.
- 3d morphable face models - past, present and future. ACM Transactions on Graphics, pages 1–38, 2020.
- Learning an animatable detailed 3D face model from in-the-wild images. 2021.
- K-planes: Explicit radiance fields in space, time, and appearance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12479–12488, 2023.
- Dynamic neural radiance fields for monocular 4d facial avatar reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8649–8658, 2021.
- Dynamic view synthesis from dynamic monocular video. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5712–5721, 2021.
- Reconstructing personalized semantic facial nerf models from monocular video. ACM Transactions on Graphics (TOG), 41(6):1–12, 2022.
- Ganfit: Generative adversarial network fitting for high fidelity 3d face reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1155–1164, 2019.
- Morphable face models - an open framework. In 2018 13th IEEE International Conference on Automatic Face and Gesture Recognition, pages 75–82, 2018.
- Neural head avatars from monocular rgb videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18653–18664, 2022.
- Neural lumigraph rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4287–4297, 2021.
- 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics (TOG), 42(4):1–14, 2023.
- Realistic one-shot mesh-based head avatars. In European Conference on Computer Vision, pages 345–362, 2022.
- Hugs: Human gaussian splats. arXiv preprint arXiv:2311.17910, 2023.
- Gart: Gaussian articulated template models. arXiv preprint arXiv:2311.16099, 2023.
- Learning a model of facial shape and expression from 4d scans. ACM Transactions on Graphics (TOG), 36(6):1–17, 2017.
- Nerf: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020.
- Sparse zonal harmonic factorization for efficient sh rotation. ACM Transactions on Graphics (TOG), 31(3):1–9, 2012.
- Face reconstruction from skull shapes and physical attributes. In Proceedings of the Deutsche Arbeitsgemeinschaft für Mustererkennung Symposium, pages 232–241, 2009.
- Gaussianavatars: Photorealistic head avatars with rigged 3d gaussians. arXiv preprint arXiv:2312.02069, 2023.
- H3d-net: Few-shot high-fidelity 3d head reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5620–5629, 2021.
- The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018.
- U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, pages 234–241, 2015.
- Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- A-nerf: Surface-free human 3d pose refinement via neural rendering. In Advances in Neural Information Processing Systems, 2021.
- Self-supervised multi-level face model learning for monocular reconstruction at over 250 hz. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2549–2559, 2018.
- One-shot free-view neural talking-head synthesis for video conferencing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10039–10049, 2021.
- Prior-guided multi-view 3d head reconstruction. IEEE Transactions on Multimedia, 24:4028–4040, 2021.
- Learning compositional radiance fields of dynamic human heads. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5704–5713, 2021.
- Point-nerf: Point-based neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5438–5448, 2022.
- Avatarmav: Fast 3d head avatar reconstruction using motion-aware neural voxels. In ACM SIGGRAPH 2023 Conference Proceedings, pages 1–10, 2023.
- Multiview neural surface reconstruction by disentangling geometry and appearance. Advances in Neural Information Processing Systems, 33:2492–2502, 2020.
- Animatable 3d gaussians for high-fidelity synthesis of human motions. arXiv preprint arXiv:2311.13404, 2023.
- Havatar: High-fidelity head avatar via facial model conditioned neural radiance field. ACM Transactions on Graphics (TOG), 43(1):1–16, 2023.
- Imavatar: Implicit morphable head avatars from videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13545–13555, 2022.
- Pointavatar: Deformable point-based head avatars from videos. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 21057–21067, 2023.
- Instant volumetric head avatars. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4574–4584, 2023.
- Zhongyuan Zhao
- Zhenyu Bao
- Qing Li
- Guoping Qiu
- Kanglin Liu