
PSAvatar: A Point-based Shape Model for Real-Time Head Avatar Animation with 3D Gaussian Splatting (2401.12900v5)

Published 23 Jan 2024 in cs.GR and cs.CV

Abstract: Despite much progress, achieving real-time high-fidelity head avatar animation is still difficult and existing methods have to trade-off between speed and quality. 3DMM based methods often fail to model non-facial structures such as eyeglasses and hairstyles, while neural implicit models suffer from deformation inflexibility and rendering inefficiency. Although 3D Gaussian has been demonstrated to possess promising capability for geometry representation and radiance field reconstruction, applying 3D Gaussian in head avatar creation remains a major challenge since it is difficult for 3D Gaussian to model the head shape variations caused by changing poses and expressions. In this paper, we introduce PSAvatar, a novel framework for animatable head avatar creation that utilizes discrete geometric primitive to create a parametric morphable shape model and employs 3D Gaussian for fine detail representation and high fidelity rendering. The parametric morphable shape model is a Point-based Morphable Shape Model (PMSM) which uses points instead of meshes for 3D representation to achieve enhanced representation flexibility. The PMSM first converts the FLAME mesh to points by sampling on the surfaces as well as off the meshes to enable the reconstruction of not only surface-like structures but also complex geometries such as eyeglasses and hairstyles. By aligning these points with the head shape in an analysis-by-synthesis manner, the PMSM makes it possible to utilize 3D Gaussian for fine detail representation and appearance modeling, thus enabling the creation of high-fidelity avatars. We show that PSAvatar can reconstruct high-fidelity head avatars of a variety of subjects and the avatars can be animated in real-time ($\ge$ 25 fps at a resolution of 512 $\times$ 512 ).

References (42)
  1. A morphable model for the synthesis of 3d faces. In Proceedings of the 26th annual conference on Computer graphics and interactive techniques, pages 187–194, 1999.
  2. Facewarehouse: A 3d facial expression database for visual computing. IEEE Transactions on Visualization and Computer Graphics, 20(3):413–425, 2013.
  3. Monogaussianavatar: Monocular gaussian point-based head avatar. arXiv preprint arXiv:2312.04558, 2023.
  4. Accurate 3d face reconstruction with weakly-supervised learning: From single image to image set. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, 2019.
  5. Towards high fidelity monocular face reconstruction with rich reflectance using self-supervised learning and ray tracing. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12819–12829, 2021.
  6. Bakedavatar: Baking neural fields for real-time head avatar synthesis. ACM Transactions on Graphics (TOG), 42(6):1–17, 2023.
  7. 3d morphable face models - past, present and future. In ACM Transactions on Graphics, pages 1–38, 2020.
  8. Learning an animatable detailed 3D face model from in-the-wild images. 2021.
  9. K-planes: Explicit radiance fields in space, time, and appearance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12479–12488, 2023.
  10. Dynamic neural radiance fields for monocular 4d facial avatar reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8649–8658, 2021.
  11. Dynamic view synthesis from dynamic monocular video. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5712–5721, 2021.
  12. Reconstructing personalized semantic facial nerf models from monocular video. ACM Transactions on Graphics (TOG), 41(6):1–12, 2022.
  13. Ganfit: Generative adversarial network fitting for high fidelity 3d face reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1155–1164, 2019.
  14. Morphable face models - an open framework. In 2018 13th IEEE International Conference on Automatic Face and Gesture Recognition, pages 75–82, 2018.
  15. Neural head avatars from monocular rgb videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18653–18664, 2022.
  16. Neural lumigraph rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4287–4297, 2021.
  17. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics (TOG), 42(4):1–14, 2023.
  18. Realistic one-shot mesh-based head avatars. In European Conference on Computer Vision, pages 345–362, 2022.
  19. Hugs: Human gaussian splats. arXiv preprint arXiv:2311.17910, 2023.
  20. Gart: Gaussian articulated template models. arXiv preprint arXiv:2311.16099, 2023.
  21. Learning a model of facial shape and expression from 4d scans. ACM Transactions on Graphics (TOG), 36(6):1–17, 2017.
  22. Nerf: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020.
  23. Sparse zonal harmonic factorization for efficient sh rotation. ACM Transactions on Graphics (TOG), 31(3):1–9, 2012.
  24. Face reconstruction from skull shapes and physical attributes. In Proceedings of the Deutsche Arbeitsgemeinschaft für Mustererkennung Symposium, pages 232–241, 2009.
  25. Gaussianavatars: Photorealistic head avatars with rigged 3d gaussians. arXiv preprint arXiv:2312.02069, 2023.
  26. H3d-net: Few-shot high-fidelity 3d head reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5620–5629, 2021.
  27. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018.
  28. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, pages 234–241, 2015.
  29. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  30. A-nerf: Surface-free human 3d pose refinement via neural rendering. In Advances in Neural Information Processing Systems, 2021.
  31. Self-supervised multi-level face model learning for monocular reconstruction at over 250 hz. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2549–2559, 2018.
  32. One-shot free-view neural talking-head synthesis for video conferencing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10039–10049, 2021a.
  33. Prior-guided multi-view 3d head reconstruction. IEEE Transactions on Multimedia, 24:4028–4040, 2021b.
  34. Learning compositional radiance fields of dynamic human heads. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5704–5713, 2021c.
  35. Point-nerf: Point-based neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5438–5448, 2022.
  36. Avatarmav: Fast 3d head avatar reconstruction using motion-aware neural voxels. In ACM SIGGRAPH 2023 Conference Proceedings, pages 1–10, 2023.
  37. Multiview neural surface reconstruction by disentangling geometry and appearance. Advances in Neural Information Processing Systems, 33:2492–2502, 2020.
  38. Animatable 3d gaussians for high-fidelity synthesis of human motions. arXiv preprint arXiv:2311.13404, 2023.
  39. Havatar: High-fidelity head avatar via facial model conditioned neural radiance field. ACM Transactions on Graphics (TOG), 43(1):1–16, 2023.
  40. Imavatar: Implicit morphable head avatars from videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13545–13555, 2022.
  41. Pointavatar: Deformable point-based head avatars from videos. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 21057–21067, 2023.
  42. Instant volumetric head avatars. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4574–4584, 2023.
Authors (5)
  1. Zhongyuan Zhao (29 papers)
  2. Zhenyu Bao (8 papers)
  3. Qing Li (430 papers)
  4. Guoping Qiu (61 papers)
  5. Kanglin Liu (16 papers)
Citations (7)

Summary

  • The paper introduces a novel point-based morphable shape model combined with 3D Gaussian splatting to accurately capture complex details like hair and eyeglasses.
  • It achieves real-time, high-quality rendering at ≥25 fps at 512×512 resolution, outperforming current methods in both photorealism and geometric consistency.
  • The framework shows strong potential for applications in gaming, VR, and film, and paves the way for further research in dynamic avatar animation.

An Overview of PSAvatar: A Point-based Morphable Shape Model for Real-Time Head Avatar Animation with 3D Gaussian Splatting

The paper presents PSAvatar, an approach to creating high-fidelity, real-time animatable head avatars from monocular portrait videos. The authors address the limitations of conventional 3D Morphable Models (3DMMs) and neural implicit representations by integrating a Point-based Morphable Shape Model (PMSM) with 3D Gaussian splatting. The combination of these techniques allows efficient rendering while retaining the flexibility to represent intricate details such as hairstyles and accessories like eyeglasses.
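To make the 3D Gaussian side of this pairing concrete, the sketch below shows the per-primitive parameters typically optimized in 3D Gaussian splatting: a mean, an anisotropic covariance factored as rotation times scale, an opacity, and a color. This mirrors the general technique rather than the paper's exact implementation; the class name and field layout are illustrative, and full 3DGS stores color as spherical harmonics rather than a single RGB triple.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Gaussian3D:
    mean: np.ndarray       # (3,) center in world space
    quat: np.ndarray       # (4,) quaternion (w, x, y, z) encoding rotation R
    log_scale: np.ndarray  # (3,) per-axis scale, stored in log space
    opacity: float         # in (0, 1) after a sigmoid in practice
    color: np.ndarray      # (3,) RGB (full 3DGS uses spherical harmonics)

    def covariance(self):
        """Sigma = R S S^T R^T, positive semi-definite by construction."""
        w, x, y, z = self.quat / np.linalg.norm(self.quat)
        R = np.array([
            [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
            [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
            [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
        ])
        S = np.diag(np.exp(self.log_scale))
        return R @ S @ S.T @ R.T

# Example: an axis-aligned Gaussian (identity rotation)
g = Gaussian3D(mean=np.zeros(3),
               quat=np.array([1., 0., 0., 0.]),
               log_scale=np.log(np.array([0.1, 0.2, 0.05])),
               opacity=0.8,
               color=np.array([0.7, 0.6, 0.5]))
cov = g.covariance()
```

Factoring the covariance through a rotation and log-scales is what keeps optimization stable: any parameter setting yields a valid (symmetric, non-negative) covariance.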

Technical Contributions

The paper introduces several key technical elements:

  • Morphable Shape Model: The development of the Point-based Morphable Shape Model (PMSM) serves as a robust alternative to mesh-based techniques. By converting the FLAME mesh to points, PSAvatar achieves greater representation flexibility, enabling the modeling of complex structures like hair strands and eyeglasses, which traditional 3DMMs often fail to accurately capture.
  • 3D Representation Using Gaussians: The integration of 3D Gaussian splatting with the PMSM is a pivotal innovation. This approach harnesses the flexibility and scale invariance of 3D Gaussians, which enhances the capability for fine detail representation, particularly crucial for modeling volumetric structures. The Gaussian splatting technique ensures efficient rendering, addressing the computational challenges faced by neural implicit methods.
  • Real-Time High-Fidelity Rendering: PSAvatar is capable of reconstructing detailed and photorealistic head avatars at a rate of 25 fps at a resolution of 512x512, utilizing an Nvidia RTX 3090. This is achieved through a combination of the aforementioned models and a U-net based enhancement network, which further refines the output quality.
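The on/off-surface sampling idea behind the PMSM can be sketched as follows. This is a minimal illustration of the general recipe, not the paper's code: points are drawn uniformly on the triangles of a template mesh (area-weighted, with the square-root barycentric trick), then additional samples are pushed off the surface along normals so the point set can cover volumes the base mesh cannot, such as hair or eyeglasses. The function names and the toy single-triangle mesh are assumptions for illustration.

```python
import numpy as np

def sample_points_on_mesh(vertices, faces, n_points, rng):
    """Sample points uniformly on triangle surfaces via barycentric coords."""
    tris = vertices[faces]                                  # (F, 3, 3)
    e1, e2 = tris[:, 1] - tris[:, 0], tris[:, 2] - tris[:, 0]
    areas = 0.5 * np.linalg.norm(np.cross(e1, e2), axis=1)
    idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())
    # Square-root trick gives uniform barycentric coordinates
    u = np.sqrt(rng.random(n_points))
    v = rng.random(n_points)
    b = np.stack([1 - u, u * (1 - v), u * v], axis=1)       # (N, 3)
    return np.einsum('nk,nkd->nd', b, tris[idx])

def offset_off_surface(points, normals, max_offset, rng):
    """Push surface samples along normals to cover off-mesh structures."""
    t = rng.uniform(-max_offset, max_offset, size=(len(points), 1))
    return points + t * normals

rng = np.random.default_rng(0)
# Toy single-triangle "mesh" standing in for a FLAME-like template
verts = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
faces = np.array([[0, 1, 2]])
pts = sample_points_on_mesh(verts, faces, 1000, rng)
normals = np.tile([0., 0., 1.], (len(pts), 1))
cloud = offset_off_surface(pts, normals, max_offset=0.05, rng=rng)
```

Because the point set is decoupled from mesh connectivity, the same deformation driving the FLAME template can be applied to both the on-surface and off-surface samples, which is what makes the representation morphable.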

Performance Analysis

The authors provide compelling numerical results demonstrating PSAvatar's superiority over existing state-of-the-art methods like INSTA, IMAvatar, and PointAvatar. Quantitative metrics such as PSNR, SSIM, and LPIPS indicate that PSAvatar achieves higher fidelity in both geometric consistency and visual realism. The results are particularly notable in scenarios involving complex head dynamics and fine details, lending credibility to its utility in practical applications.
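Of the three metrics cited, PSNR is the simplest to state exactly; a minimal reference computation is shown below (SSIM and LPIPS require their own reference implementations, and the 512×512 toy images here are only for illustration).

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to target."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float('inf')
    return 10.0 * np.log10(max_val ** 2 / mse)

a = np.zeros((512, 512, 3))
b = np.full_like(a, 0.1)   # uniform error of 0.1 -> MSE = 0.01
print(psnr(a, b))          # 20.0 dB
```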

Implications and Future Directions

The development of PSAvatar holds significant practical implications for industries such as gaming, virtual reality, and film, which demand both real-time rendering capabilities and high-level detail for character avatars. Theoretically, the paper suggests a promising direction for future research in geometric representation, combining explicit modeling techniques with 3D Gaussian fields to enhance fidelity and efficiency.

Future developments may focus on further optimization of the computational demands associated with Gaussian splatting and the extension of these techniques to full-body avatars or dynamic environments. Additionally, exploring the integration of PSAvatar with machine learning frameworks for automated enhancement and adaptability could further broaden its applicability.

Conclusion

The PSAvatar framework marks a substantial stride forward in the domain of real-time animatable head avatar creation. By leveraging point-based morphable shapes and 3D Gaussian splatting, it circumvents the limitations of previous models, offering a new paradigm for high-fidelity and computation-efficient avatar generation. The paper provides a foundation for ongoing innovation in creating immersive and interactive virtual experiences.