
PSAvatar: A Point-based Shape Model for Real-Time Head Avatar Animation with 3D Gaussian Splatting (2401.12900v5)

Published 23 Jan 2024 in cs.GR and cs.CV

Abstract: Despite much progress, achieving real-time high-fidelity head avatar animation is still difficult and existing methods have to trade-off between speed and quality. 3DMM based methods often fail to model non-facial structures such as eyeglasses and hairstyles, while neural implicit models suffer from deformation inflexibility and rendering inefficiency. Although 3D Gaussian has been demonstrated to possess promising capability for geometry representation and radiance field reconstruction, applying 3D Gaussian in head avatar creation remains a major challenge since it is difficult for 3D Gaussian to model the head shape variations caused by changing poses and expressions. In this paper, we introduce PSAvatar, a novel framework for animatable head avatar creation that utilizes discrete geometric primitive to create a parametric morphable shape model and employs 3D Gaussian for fine detail representation and high fidelity rendering. The parametric morphable shape model is a Point-based Morphable Shape Model (PMSM) which uses points instead of meshes for 3D representation to achieve enhanced representation flexibility. The PMSM first converts the FLAME mesh to points by sampling on the surfaces as well as off the meshes to enable the reconstruction of not only surface-like structures but also complex geometries such as eyeglasses and hairstyles. By aligning these points with the head shape in an analysis-by-synthesis manner, the PMSM makes it possible to utilize 3D Gaussian for fine detail representation and appearance modeling, thus enabling the creation of high-fidelity avatars. We show that PSAvatar can reconstruct high-fidelity head avatars of a variety of subjects and the avatars can be animated in real-time ($\ge$ 25 fps at a resolution of 512 $\times$ 512 ).

References (42)
  1. A morphable model for the synthesis of 3d faces. In Proceedings of the 26th annual conference on Computer graphics and interactive techniques, pages 187–194, 1999.
  2. Facewarehouse: A 3d facial expression database for visual computing. IEEE Transactions on Visualization and Computer Graphics, 20(3):413–425, 2013.
  3. Monogaussianavatar: Monocular gaussian point-based head avatar. arXiv preprint arXiv:2312.04558, 2023.
  4. Accurate 3d face reconstruction with weakly-supervised learning: From single image to image set. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, 2019.
  5. Towards high fidelity monocular face reconstruction with rich reflectance using self-supervised learning and ray tracing. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12819–12829, 2021.
  6. Bakedavatar: Baking neural fields for real-time head avatar synthesis. ACM Transactions on Graphics (TOG), 42(6):1–17, 2023.
  7. 3d morphable face models - past, present and future. In ACM Transactions on Graphics, pages 1–38, 2020.
  8. Learning an animatable detailed 3D face model from in-the-wild images. 2021.
  9. K-planes: Explicit radiance fields in space, time, and appearance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12479–12488, 2023.
  10. Dynamic neural radiance fields for monocular 4d facial avatar reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8649–8658, 2021.
  11. Dynamic view synthesis from dynamic monocular video. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5712–5721, 2021.
  12. Reconstructing personalized semantic facial nerf models from monocular video. ACM Transactions on Graphics (TOG), 41(6):1–12, 2022.
  13. Ganfit: Generative adversarial network fitting for high fidelity 3d face reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1155–1164, 2019.
  14. Morphable face models - an open framework. In 2018 13th IEEE International Conference on Automatic Face and Gesture Recognition, pages 75–82, 2018.
  15. Neural head avatars from monocular rgb videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18653–18664, 2022.
  16. Neural lumigraph rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4287–4297, 2021.
  17. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics (TOG), 42(4):1–14, 2023.
  18. Realistic one-shot mesh-based head avatars. In European Conference on Computer Vision, pages 345–362, 2022.
  19. Hugs: Human gaussian splats. arXiv preprint arXiv:2311.17910, 2023.
  20. Gart: Gaussian articulated template models. arXiv preprint arXiv:2311.16099, 2023.
  21. Learning a model of facial shape and expression from 4d scans. ACM Transactions on Graphics (TOG), 36(6):1–17, 2017.
  22. Nerf: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020.
  23. Sparse zonal harmonic factorization for efficient sh rotation. ACM Transactions on Graphics (TOG), 31(3):1–9, 2012.
  24. Face reconstruction from skull shapes and physical attributes. In Proceedings of the Deutsche Arbeitsgemeinschaft für Mustererkennung Symposium, pages 232–241, 2009.
  25. Gaussianavatars: Photorealistic head avatars with rigged 3d gaussians. arXiv preprint arXiv:2312.02069, 2023.
  26. H3d-net: Few-shot high-fidelity 3d head reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5620–5629, 2021.
  27. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018.
  28. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, pages 234–241, 2015.
  29. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  30. A-nerf: Surface-free human 3d pose refinement via neural rendering. In Advances in Neural Information Processing Systems, 2021.
  31. Self-supervised multi-level face model learning for monocular reconstruction at over 250 hz. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2549–2559, 2018.
  32. One-shot free-view neural talking-head synthesis for video conferencing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10039–10049, 2021a.
  33. Prior-guided multi-view 3d head reconstruction. IEEE Transactions on Multimedia, 24:4028–4040, 2021b.
  34. Learning compositional radiance fields of dynamic human heads. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5704–5713, 2021c.
  35. Point-nerf: Point-based neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5438–5448, 2022.
  36. Avatarmav: Fast 3d head avatar reconstruction using motion-aware neural voxels. In ACM SIGGRAPH 2023 Conference Proceedings, pages 1–10, 2023.
  37. Multiview neural surface reconstruction by disentangling geometry and appearance. Advances in Neural Information Processing Systems, 33:2492–2502, 2020.
  38. Animatable 3d gaussians for high-fidelity synthesis of human motions. arXiv preprint arXiv:2311.13404, 2023.
  39. Havatar: High-fidelity head avatar via facial model conditioned neural radiance field. ACM Transactions on Graphics (TOG), 43(1):1–16, 2023.
  40. Imavatar: Implicit morphable head avatars from videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13545–13555, 2022.
  41. Pointavatar: Deformable point-based head avatars from videos. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 21057–21067, 2023.
  42. Instant volumetric head avatars. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4574–4584, 2023.
Authors (5)
  1. Zhongyuan Zhao (29 papers)
  2. Zhenyu Bao (8 papers)
  3. Qing Li (430 papers)
  4. Guoping Qiu (61 papers)
  5. Kanglin Liu (16 papers)
Citations (7)

Summary

  • The paper introduces a novel point-based morphable shape model combined with 3D Gaussian splatting to accurately capture complex details like hair and eyeglasses.
  • It achieves real-time, high-quality rendering at ≥25 fps at 512×512 resolution, outperforming current methods in both photorealism and geometric consistency.
  • The framework shows strong potential for applications in gaming, VR, and film, and paves the way for further research in dynamic avatar animation.

An Overview of PSAvatar: A Point-based Morphable Shape Model for Real-Time Head Avatar Animation with 3D Gaussian Splatting

The paper presents PSAvatar, an approach to creating high-fidelity, real-time animatable head avatars from monocular portrait videos. The authors address the limitations of conventional 3D Morphable Models (3DMMs) and neural implicit representations by integrating a Point-based Morphable Shape Model (PMSM) with 3D Gaussian splatting. The combination of these techniques allows efficient rendering while retaining the flexibility to represent intricate details such as hairstyles and accessories like eyeglasses.
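To make the 3D Gaussian side of this pairing concrete, the sketch below shows the per-primitive parameters typically optimized in 3D Gaussian splatting: a mean, an anisotropic covariance factored as rotation times scale, an opacity, and a color. This mirrors the general technique rather than the paper's exact implementation; the class name and field layout are illustrative, and full 3DGS stores color as spherical harmonics rather than a single RGB triple.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Gaussian3D:
    mean: np.ndarray       # (3,) center in world space
    quat: np.ndarray       # (4,) quaternion (w, x, y, z) encoding rotation R
    log_scale: np.ndarray  # (3,) per-axis scale, stored in log space
    opacity: float         # in (0, 1) after a sigmoid in practice
    color: np.ndarray      # (3,) RGB (full 3DGS uses spherical harmonics)

    def covariance(self):
        """Sigma = R S S^T R^T, positive semi-definite by construction."""
        w, x, y, z = self.quat / np.linalg.norm(self.quat)
        R = np.array([
            [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
            [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
            [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
        ])
        S = np.diag(np.exp(self.log_scale))
        return R @ S @ S.T @ R.T

# Example: an axis-aligned Gaussian (identity rotation)
g = Gaussian3D(mean=np.zeros(3),
               quat=np.array([1., 0., 0., 0.]),
               log_scale=np.log(np.array([0.1, 0.2, 0.05])),
               opacity=0.8,
               color=np.array([0.7, 0.6, 0.5]))
cov = g.covariance()
```

Factoring the covariance through a rotation and log-scales is what keeps optimization stable: any parameter setting yields a valid (symmetric, non-negative) covariance.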

Technical Contributions

The paper introduces several key technical elements:

  • Morphable Shape Model: The development of the Point-based Morphable Shape Model (PMSM) serves as a robust alternative to mesh-based techniques. By converting the FLAME mesh to points, PSAvatar achieves greater representation flexibility, enabling the modeling of complex structures like hair strands and eyeglasses, which traditional 3DMMs often fail to accurately capture.
  • 3D Representation Using Gaussians: The integration of 3D Gaussian splatting with the PMSM is a pivotal innovation. This approach harnesses the flexibility and scale invariance of 3D Gaussians, which enhances the capability for fine detail representation, particularly crucial for modeling volumetric structures. The Gaussian splatting technique ensures efficient rendering, addressing the computational challenges faced by neural implicit methods.
  • Real-Time High-Fidelity Rendering: PSAvatar is capable of reconstructing detailed and photorealistic head avatars at a rate of 25 fps at a resolution of 512x512, utilizing an Nvidia RTX 3090. This is achieved through a combination of the aforementioned models and a U-net based enhancement network, which further refines the output quality.
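The on/off-surface sampling idea behind the PMSM can be sketched as follows. This is a minimal illustration of the general recipe, not the paper's code: points are drawn uniformly on the triangles of a template mesh (area-weighted, with the square-root barycentric trick), then additional samples are pushed off the surface along normals so the point set can cover volumes the base mesh cannot, such as hair or eyeglasses. The function names and the toy single-triangle mesh are assumptions for illustration.

```python
import numpy as np

def sample_points_on_mesh(vertices, faces, n_points, rng):
    """Sample points uniformly on triangle surfaces via barycentric coords."""
    tris = vertices[faces]                                  # (F, 3, 3)
    e1, e2 = tris[:, 1] - tris[:, 0], tris[:, 2] - tris[:, 0]
    areas = 0.5 * np.linalg.norm(np.cross(e1, e2), axis=1)
    idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())
    # Square-root trick gives uniform barycentric coordinates
    u = np.sqrt(rng.random(n_points))
    v = rng.random(n_points)
    b = np.stack([1 - u, u * (1 - v), u * v], axis=1)       # (N, 3)
    return np.einsum('nk,nkd->nd', b, tris[idx])

def offset_off_surface(points, normals, max_offset, rng):
    """Push surface samples along normals to cover off-mesh structures."""
    t = rng.uniform(-max_offset, max_offset, size=(len(points), 1))
    return points + t * normals

rng = np.random.default_rng(0)
# Toy single-triangle "mesh" standing in for a FLAME-like template
verts = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
faces = np.array([[0, 1, 2]])
pts = sample_points_on_mesh(verts, faces, 1000, rng)
normals = np.tile([0., 0., 1.], (len(pts), 1))
cloud = offset_off_surface(pts, normals, max_offset=0.05, rng=rng)
```

Because the point set is decoupled from mesh connectivity, the same deformation driving the FLAME template can be applied to both the on-surface and off-surface samples, which is what makes the representation morphable.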

Performance Analysis

The authors provide compelling numerical results demonstrating PSAvatar's superiority over existing state-of-the-art methods like INSTA, IMAvatar, and PointAvatar. Quantitative metrics such as PSNR, SSIM, and LPIPS indicate that PSAvatar achieves higher fidelity in both geometric consistency and visual realism. The results are particularly notable in scenarios involving complex head dynamics and fine details, lending credibility to its utility in practical applications.
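Of the three metrics cited, PSNR is the simplest to state exactly; a minimal reference computation is shown below (SSIM and LPIPS require their own reference implementations, and the 512×512 toy images here are only for illustration).

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to target."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float('inf')
    return 10.0 * np.log10(max_val ** 2 / mse)

a = np.zeros((512, 512, 3))
b = np.full_like(a, 0.1)   # uniform error of 0.1 -> MSE = 0.01
print(psnr(a, b))          # 20.0 dB
```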

Implications and Future Directions

The development of PSAvatar holds significant practical implications for industries such as gaming, virtual reality, and film, which demand both real-time rendering capabilities and high-level detail for character avatars. Theoretically, the paper suggests a promising direction for future research in geometric representation, combining explicit modeling techniques with 3D Gaussian fields to enhance fidelity and efficiency.

Future developments may focus on further optimization of the computational demands associated with Gaussian splatting and the extension of these techniques to full-body avatars or dynamic environments. Additionally, exploring the integration of PSAvatar with machine learning frameworks for automated enhancement and adaptability could further broaden its applicability.

Conclusion

The PSAvatar framework marks a substantial stride forward in the domain of real-time animatable head avatar creation. By leveraging point-based morphable shapes and 3D Gaussian splatting, it circumvents the limitations of previous models, offering a new paradigm for high-fidelity and computation-efficient avatar generation. The paper provides a foundation for ongoing innovation in creating immersive and interactive virtual experiences.