HeadArtist: Text-conditioned 3D Head Generation with Self Score Distillation (2312.07539v2)

Published 12 Dec 2023 in cs.CV

Abstract: This work presents HeadArtist for 3D head generation from text descriptions. With a landmark-guided ControlNet serving as the generative prior, we come up with an efficient pipeline that optimizes a parameterized 3D head model under the supervision of the prior distillation itself. We call such a process self score distillation (SSD). In detail, given a sampled camera pose, we first render an image and its corresponding landmarks from the head model, and add some particular level of noise onto the image. The noisy image, landmarks, and text condition are then fed into the frozen ControlNet twice for noise prediction. Two different classifier-free guidance (CFG) weights are applied during these two predictions, and the prediction difference offers a direction on how the rendered image can better match the text of interest. Experimental results suggest that our approach delivers high-quality 3D head sculptures with adequate geometry and photorealistic appearance, significantly outperforming state-of-the-art methods. We also show that the same pipeline well supports editing the generated heads, including both geometry deformation and appearance change.

Summary

The paper demonstrates a novel self score distillation technique that iteratively refines 3D head models from text for improved photo-realism.
It generates geometry and texture in two separate stages, starting from a deformable grid mesh initialized with an existing 3D head model and refining it under the guidance of a frozen, landmark-guided ControlNet.
The method outperforms existing approaches, enabling dynamic editing of facial expressions and aging in AR/VR and gaming applications.

Introduction

Text-to-3D generation technology allows users to create three-dimensional models from textual descriptions. This area of research is particularly relevant for applications in augmented reality, virtual reality, and gaming. Converting text descriptions into 3D objects is challenging, particularly when generating human heads due to the complexity of facial features and the demand for photo-realistic output.

Self Score Distillation

The technique introduced in this paper, "self score distillation" (SSD), uses a frozen, landmark-guided ControlNet as the generative prior and optimizes a parameterized 3D head model under the supervision of that prior's own predictions. For a sampled camera pose, the system renders an image and its corresponding landmarks from the head model and adds noise to the image. The noisy image, the landmarks, and the text condition are then fed into the ControlNet twice, each time with a different classifier-free guidance (CFG) weight. The difference between the two noise predictions gives a direction for updating the 3D model so that its renderings better match the text description.
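The two-prediction step can be illustrated with a small NumPy sketch. This is not the authors' implementation: `toy_denoiser` is a hypothetical stand-in for the frozen ControlNet, the guidance weights and noise level are arbitrary, and the sign convention (high-weight prediction minus low-weight prediction) is an assumption. It also assumes both predictions share the same noisy input and conditions, in which case the difference reduces algebraically to (w_high - w_low) times the gap between the conditional and unconditional predictions; the paper's exact formulation may differ.

```python
import numpy as np

def add_noise(image, noise, alpha_bar_t):
    """Standard DDPM-style forward noising at cumulative signal level alpha_bar_t."""
    return np.sqrt(alpha_bar_t) * image + np.sqrt(1.0 - alpha_bar_t) * noise

def cfg_prediction(eps_uncond, eps_cond, weight):
    """Classifier-free guidance: extrapolate from the unconditional prediction."""
    return eps_uncond + weight * (eps_cond - eps_uncond)

def toy_denoiser(noisy_image, condition_strength):
    """Hypothetical stand-in for the frozen ControlNet (NOT the real model)."""
    return np.tanh(noisy_image) * (1.0 + condition_strength)

def ssd_direction(rendered, noise, alpha_bar_t, w_high, w_low):
    """Difference of two CFG-weighted noise predictions on the same noisy render."""
    noisy = add_noise(rendered, noise, alpha_bar_t)
    eps_uncond = toy_denoiser(noisy, 0.0)   # assumed: empty text condition
    eps_cond = toy_denoiser(noisy, 0.5)     # assumed: text + landmark condition
    pred_high = cfg_prediction(eps_uncond, eps_cond, w_high)
    pred_low = cfg_prediction(eps_uncond, eps_cond, w_low)
    # Assumed sign convention: the update direction is high minus low.
    return pred_high - pred_low

rng = np.random.default_rng(0)
rendered = rng.uniform(-1.0, 1.0, size=(3, 8, 8))   # placeholder for a rendered view
noise = rng.standard_normal(rendered.shape)
direction = ssd_direction(rendered, noise, alpha_bar_t=0.5, w_high=7.5, w_low=1.0)
print(direction.shape)
```

In a real pipeline the rendered image would come from a differentiable renderer, and a direction of this kind would be back-propagated to the head model's parameters; the toy above only shows how the two guidance weights combine.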

3D Head Generation and Editing

The process is split into two major stages: generating the geometry of the head, then its texture. The geometry is modeled as a deformable grid mesh initialized from a pre-existing 3D head model. SSD then optimizes the head parameters under the guidance of sampled camera poses and rendered landmarks, improving the fidelity of both the geometry and the appearance of the 3D model.

Furthermore, the same pipeline supports editing the generated head. A text description acts as an instruction that modifies the geometry or the texture of the head, for example changing the facial expression or aging the subject.

Evaluation and Comparison

The experimental results suggest that the method outperforms existing state-of-the-art approaches, producing detailed 3D head sculptures with adequate geometry and photo-realistic appearance. It is also described as not being constrained by the limited diversity of available datasets, and its edits preserve the character's identity.

Conclusion

In summary, the paper presents an approach to generating and editing 3D heads from text that surpasses current methods in realism and detail. The technique, built around self score distillation, could benefit character creation and interaction in augmented and virtual environments.
