ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation (2305.16213v2)

Published 25 May 2023 in cs.LG and cs.CV

Abstract: Score distillation sampling (SDS) has shown great promise in text-to-3D generation by distilling pretrained large-scale text-to-image diffusion models, but suffers from over-saturation, over-smoothing, and low-diversity problems. In this work, we propose to model the 3D parameter as a random variable instead of a constant as in SDS and present variational score distillation (VSD), a principled particle-based variational framework to explain and address the aforementioned issues in text-to-3D generation. We show that SDS is a special case of VSD and leads to poor samples with both small and large CFG weights. In comparison, VSD works well with various CFG weights as ancestral sampling from diffusion models and simultaneously improves the diversity and sample quality with a common CFG weight (i.e., $7.5$). We further present various improvements in the design space for text-to-3D such as distillation time schedule and density initialization, which are orthogonal to the distillation algorithm yet not well explored. Our overall approach, dubbed ProlificDreamer, can generate high-fidelity NeRF at high rendering resolution (i.e., $512\times512$) with rich structure and complex effects (e.g., smoke and drops). Further, initialized from NeRF, meshes fine-tuned by VSD are meticulously detailed and photo-realistic. Project page and code: https://ml.cs.tsinghua.edu.cn/prolificdreamer/
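The CFG (classifier-free guidance) weight mentioned in the abstract controls how strongly the text prompt steers each noise prediction of the diffusion model. A minimal sketch of the standard guidance combination, assuming the common convention in which a weight of 1 returns the plain conditional prediction; the function name `cfg_noise_prediction` and the toy values are hypothetical:

```python
import numpy as np

def cfg_noise_prediction(eps_uncond, eps_cond, guidance_weight):
    """Classifier-free guidance: move from the unconditional toward (and past) the conditional prediction."""
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    eps_cond = np.asarray(eps_cond, dtype=float)
    # weight 0 -> unconditional, weight 1 -> conditional, weight > 1 -> extrapolation
    return eps_uncond + guidance_weight * (eps_cond - eps_uncond)

# Toy noise predictions for a 2-pixel "image" (illustrative values only).
eps_uncond = np.array([0.1, -0.2])
eps_cond = np.array([0.3, 0.0])
for w in (1.0, 7.5, 100.0):
    print(w, cfg_noise_prediction(eps_uncond, eps_cond, w))
```

Under this convention, a weight of 7.5 scales the conditional-unconditional difference by 7.5, while the much larger weights typically paired with SDS push the prediction far beyond the conditional one, which is one way to see where over-saturated colors can come from.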

Overview of ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation

The paper introduces ProlificDreamer, a text-to-3D generation approach that addresses three shortcomings of Score Distillation Sampling (SDS): over-saturation, over-smoothing, and low diversity. Its core is Variational Score Distillation (VSD), a framework that models the 3D parameter as a random variable rather than a constant, improving both sample quality and diversity.

Key Contributions

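Variational Score Distillation (VSD): Whereas SDS treats the 3D parameter as a single constant, VSD treats it as a random variable and optimizes a distribution over it, represented by a set of particles, so that the distribution of rendered images matches the distribution defined by the pretrained text-to-image diffusion model at every noise level. In each update, the Gaussian noise term in the SDS gradient is replaced by the noise prediction of a model fine-tuned (via LoRA) on renderings of the current particles. This mitigates over-saturation and over-smoothing. SDS is recovered as the special case in which the variational distribution collapses to a single point (a Dirac delta), i.e., single-particle optimization.
Improved Sample Quality and Diversity: VSD is solved with particle-based variational inference, which improves both the quality and the diversity of text-to-3D generation. It works well with a common Classifier-Free Guidance (CFG) weight of 7.5, whereas SDS typically relies on a large CFG weight (e.g., 100 in DreamFusion) and yields poor samples with both small and large weights.
Design Space Exploration: Independently of the distillation algorithm, the authors study the text-to-3D design space: high rendering resolution (512×512), an annealed distillation time schedule, and scene (density) initialization. Together these produce high-fidelity Neural Radiance Fields (NeRF) and, after VSD fine-tuning of meshes initialized from the NeRF, detailed textured meshes.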
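The annealed time schedule can be sketched as a timestep sampler that draws from a wide range of diffusion times early in optimization and from a narrower, lower-noise range later. The switch step, the interval bounds, and the 1000-step discretization below are illustrative assumptions rather than values confirmed by this summary, and `annealed_timestep` is a hypothetical helper:

```python
import numpy as np

def annealed_timestep(step, rng, switch_step=5000, t_min=0.02,
                      t_max_early=0.98, t_max_late=0.50, num_train_timesteps=1000):
    """Sample a discrete diffusion timestep for a given optimization step.

    All defaults are illustrative assumptions: early steps draw t ~ U(t_min, t_max_early),
    later steps draw from the narrower, lower-noise range U(t_min, t_max_late).
    """
    t_max = t_max_early if step < switch_step else t_max_late
    t = rng.uniform(t_min, t_max)  # continuous time in [t_min, t_max)
    return int(t * num_train_timesteps)  # index into an assumed 1000-step noise schedule

rng = np.random.default_rng(0)
early = [annealed_timestep(s, rng) for s in range(3)]
late = [annealed_timestep(10_000 + s, rng) for s in range(3)]
print(early, late)
```

The intuition behind this kind of schedule is that high-noise timesteps mainly shape coarse structure, so dropping them later lets the optimization concentrate on fine detail.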

Numerical Results

The paper reports that VSD outperforms SDS in both 2D and 3D experiments, producing more realistic samples that remain well aligned with the text prompts at a standard CFG weight (7.5).
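To make the contrast between SDS and VSD concrete, here is a toy 1-D sketch. It assumes an identity renderer (so each particle is the sample itself and the renderer Jacobian is the identity), a single fixed noise level, and a Gaussian "pretrained model" N(2, 1) whose noise prediction is known in closed form. SDS subtracts the injected noise; VSD subtracts the prediction of a model of the particles' own distribution. In ProlificDreamer that second model is a LoRA-fine-tuned diffusion model; here it is swapped for a Gaussian refitted to the particles at every step. All names and constants are hypothetical.

```python
import numpy as np

def gaussian_eps(mean, var, alpha_t, sigma_t):
    """Closed-form noise prediction for data ~ N(mean, var), noised as x_t = alpha_t*x0 + sigma_t*eps."""
    def predict(x_t, t):
        # eps_hat = -sigma_t * grad log p_t(x_t), with p_t = N(alpha_t*mean, alpha_t**2*var + sigma_t**2)
        return sigma_t * (x_t - alpha_t * mean) / (alpha_t**2 * var + sigma_t**2)
    return predict

def add_noise(x0, alpha_t, sigma_t, rng):
    """Forward diffusion at one noise level; returns the noisy sample and the injected noise."""
    eps = rng.standard_normal(x0.shape)
    return alpha_t * x0 + sigma_t * eps, eps

def sds_grad(x0, t, eps_pretrain, alpha_t, sigma_t, weight, rng):
    # SDS: pretrained prediction minus the injected Gaussian noise.
    x_t, eps = add_noise(x0, alpha_t, sigma_t, rng)
    return weight * (eps_pretrain(x_t, t) - eps)

def vsd_grad(x0, t, eps_pretrain, eps_phi, alpha_t, sigma_t, weight, rng):
    # VSD: pretrained prediction minus the prediction of a model of the particles' distribution.
    x_t, _ = add_noise(x0, alpha_t, sigma_t, rng)
    return weight * (eps_pretrain(x_t, t) - eps_phi(x_t, t))

# Toy setup (illustrative assumptions): target N(2, 1), one fixed noise level, weight 1.
rng = np.random.default_rng(0)
alpha_t, sigma_t, lr, steps = 0.8, 0.6, 0.05, 2000
target = gaussian_eps(2.0, 1.0, alpha_t, sigma_t)

sds_particles = rng.normal(0.0, 0.1, size=256)
particles = rng.normal(0.0, 0.1, size=256)
for _ in range(steps):
    sds_particles -= lr * sds_grad(sds_particles, None, target, alpha_t, sigma_t, 1.0, rng)
    # Stand-in for the LoRA model: a Gaussian refitted to the current VSD particles.
    eps_phi = gaussian_eps(particles.mean(), particles.var(), alpha_t, sigma_t)
    particles -= lr * vsd_grad(particles, None, target, eps_phi, alpha_t, sigma_t, 1.0, rng)

print(f"SDS: mean={sds_particles.mean():.2f}, var={sds_particles.var():.3f}")
print(f"VSD: mean={particles.mean():.2f}, var={particles.var():.3f}")
```

In this setup the SDS particles collapse toward the target mode at 2, keeping only a small spread caused by Monte Carlo noise, while the VSD particles spread out until their variance matches the target's, a toy analogue of the mode-seeking versus diversity contrast described above.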

Implications and Future Directions

VSD is relevant to virtual and augmented reality, animation, and gaming, where it could automate the production of high-quality, complex 3D content from text descriptions. The approach may also motivate further work on 3D representations and rendering techniques.

Future work might scale VSD to larger particle sets, which could further increase the diversity of generated 3D models at a higher computational cost. Integration with more advanced 3D representations and adaptive camera-sampling strategies are other promising directions.

Overall, ProlificDreamer provides a principled framework for text-to-3D generation that addresses the main failure modes of SDS and improves on prior methods in both fidelity and diversity.

Authors (7)

Zhengyi Wang
Cheng Lu
Yikai Wang
Fan Bao
Chongxuan Li
Hang Su
Jun Zhu

Citations (634)
