HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting (2402.06149v2)

Published 9 Feb 2024 in cs.CV

Abstract: Creating digital avatars from textual prompts has long been a desirable yet challenging task. Despite the promising results achieved with 2D diffusion priors, current methods struggle to create high-quality and consistent animated avatars efficiently. Previous animatable head models like FLAME have difficulty in accurately representing detailed texture and geometry. Additionally, high-quality 3D static representations face challenges in semantically driving with dynamic priors. In this paper, we introduce HeadStudio, a novel framework that utilizes 3D Gaussian splatting to generate realistic and animatable avatars from text prompts. Firstly, we associate 3D Gaussians with an animatable head prior model, facilitating semantic animation on high-quality 3D representations. To ensure consistent animation, we further enhance the optimization from initialization, distillation, and regularization to jointly learn the shape, texture, and animation. Extensive experiments demonstrate the efficacy of HeadStudio in generating animatable avatars from textual prompts, exhibiting appealing appearances. The avatars are capable of rendering high-quality real-time ($\geq 40$ fps) novel views at a resolution of 1024. Moreover, these avatars can be smoothly driven by real-world speech and video. We hope that HeadStudio can enhance digital avatar creation and gain popularity in the community. Code is at: https://github.com/ZhenglinZhou/HeadStudio.

Summary

The paper introduces HeadStudio, a novel framework that combines 3D Gaussian Splatting with FLAME-based methods to generate animatable head avatars from text prompts.
The paper employs FLAME-based 3D Gaussian Splatting and score distillation sampling for accurate facial deformations and semantic fidelity in real-time animations.
The paper demonstrates that the generated avatars render in real time (at least 40 FPS) at a resolution of 1024 while remaining animatable.

Leveraging 3D Gaussian Splatting for Realistic and Animatable Head Avatars from Text Prompts

Introduction to HeadStudio

Generating high-quality, animatable head avatars directly from text prompts is a difficult problem. Recent work has shifted towards text-based generation, which is more convenient and generalizes better than traditional image-based approaches. A recurring issue, however, is the trade-off between static quality and the ability to animate: detailed static 3D representations are hard to drive, while animatable head models struggle to capture fine texture and geometry. In response, the paper introduces HeadStudio, a framework that produces realistic and animatable avatars using 3D Gaussian Splatting (3DGS), with the FLAME statistical head model providing both semantic deformation and guidance for score distillation.

Technical Foundation and Innovations of HeadStudio

HeadStudio stands at the intersection of 3D Gaussian Splatting and FLAME-based methodologies. The approach consists of two pivotal components:

FLAME-based 3D Gaussian Splatting (F-3DGS): This technique rigs 3D Gaussian points to a FLAME mesh, so that the Gaussians follow the mesh as it deforms. Because FLAME exposes explicit control over head shape, pose, and expression, driving the mesh also drives the Gaussians, and facial movements and expressions carry over to the rendered avatar.
FLAME-based Score Distillation Sampling (F-SDS): Leveraging a fine-grained FLAME-based control signal derived from the MediaPipe facial landmark map, F-SDS guides the distillation process. This ensures a high degree of semantic fidelity, enabling the generated avatars to perform realistic animations, driven by real-world speech and video inputs.
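The rigging idea behind F-3DGS can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes each Gaussian stores its mean, rotation, and scale in the local frame of one mesh triangle, and that the triangle's frame (centroid, orientation, size) carries the Gaussian into world space whenever the mesh deforms. The function names `triangle_frame` and `rig_gaussian`, and the choice of mean edge length as the size factor, are hypothetical and chosen only for illustration.

```python
import numpy as np


def triangle_frame(v0, v1, v2):
    """Return (centroid, rotation, size) describing one triangle's local frame.

    Assumption: the frame is built from the first edge and the face normal,
    and the size factor is the mean length of two edges (hypothetical choice).
    """
    centroid = (v0 + v1 + v2) / 3.0
    e1 = v1 - v0
    e2 = v2 - v0
    a = e1 / np.linalg.norm(e1)          # first axis: along edge v0 -> v1
    n = np.cross(e1, e2)
    n = n / np.linalg.norm(n)            # third axis: face normal
    b = np.cross(n, a)                   # second axis: completes a right-handed frame
    rotation = np.stack([a, b, n], axis=1)  # columns are the frame axes
    size = 0.5 * (np.linalg.norm(e1) + np.linalg.norm(e2))
    return centroid, rotation, size


def rig_gaussian(local_mean, local_rot, local_scale, triangle):
    """Map a Gaussian from triangle-local space to world space."""
    centroid, rotation, size = triangle_frame(*triangle)
    world_mean = size * rotation @ local_mean + centroid
    world_rot = rotation @ local_rot
    world_scale = size * local_scale
    return world_mean, world_rot, world_scale


# One Gaussian hovering slightly above its parent triangle.
local_mean = np.array([0.0, 0.0, 0.1])
local_rot = np.eye(3)
local_scale = np.array([0.05, 0.05, 0.01])

# The same triangle before and after a deformation (here: scaled and moved).
rest_tri = (np.array([0.0, 0.0, 0.0]),
            np.array([1.0, 0.0, 0.0]),
            np.array([0.0, 1.0, 0.0]))
posed_tri = (np.array([1.0, 1.0, 1.0]),
             np.array([3.0, 1.0, 1.0]),
             np.array([1.0, 3.0, 1.0]))

rest_mean, _, rest_scale = rig_gaussian(local_mean, local_rot, local_scale, rest_tri)
posed_mean, _, posed_scale = rig_gaussian(local_mean, local_rot, local_scale, posed_tri)
```

Only the local parameters would be optimized in such a setup; the world-space Gaussians are recomputed from the current mesh each frame, which is what lets a change in the FLAME expression or pose parameters move the splats.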

The evaluation reports that the resulting avatars render novel views in real time (at least 40 frames per second) at a resolution of 1024, and that they can be driven by real-world speech and video.
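For the F-SDS component described above, it may help to recall the general form of the score distillation gradient. The expression below is the standard formulation from the score distillation literature, extended with a control input; it is a sketch, not an equation quoted from the paper. All symbols are assumptions introduced here: $\theta$ denotes the avatar parameters, $x$ the rendered image, $x_t$ its noised version at timestep $t$, $y$ the text prompt, $c$ the FLAME-derived landmark control map, $\hat{\epsilon}_\phi$ the diffusion model's noise prediction, and $w(t)$ a timestep weighting.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, c,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right]
```

Conditioning the noise prediction on $c$ as well as $y$ is what ties the guidance to the current FLAME pose and expression, so the 2D prior and the animated 3D head agree on where facial features should be.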

Practical Implications and Future Directions

HeadStudio not only broadens the scope of digital avatar creation but also introduces a novel methodological approach that could be extended to other domains. The integration of FLAME into both 3D representation and score distillation reflects a nuanced understanding of the underlying statistical model, paving the way for more sophisticated avatar manipulation and control. Furthermore, the ability to generate avatars that can be dynamically controlled in real-time opens new possibilities for applications in virtual and augmented reality, gaming, and online communication platforms.

The success of HeadStudio motivates further exploration of combining 3D representation techniques with statistical models to obtain more detailed and expressive avatars. Future work may focus on increasing the diversity of avatars, exploring other control signals for animation, and refining the balance between static fidelity and dynamic expression.

Conclusion

HeadStudio represents a notable stride towards resolving the long-standing challenge of generating high-fidelity, animatable head avatars from text prompts. By harnessing the power of 3D Gaussian Splatting and the FLAME model, it establishes a new benchmark for realism and animation capability in digital avatars. As the field of generative AI continues to evolve, approaches like HeadStudio underscore the potential for innovative cross-disciplinary applications, heralding a new era of digital representation.
