DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
The paper introduces DreamCraft3D, a hierarchical method for generating high-fidelity, geometrically coherent 3D objects. Addressing core challenges in 3D content creation, DreamCraft3D aims to maintain multi-view consistency while producing intricate detail. The approach uses a 2D reference image to guide geometry sculpting and texture enhancement, and employs a view-conditioned diffusion model to ensure coherence across viewpoints.
Key Components
DreamCraft3D involves several stages, each engineered to enhance 3D object generation:
- Geometry Sculpting:
  - Uses score distillation sampling (SDS) with 2D diffusion models, prioritizing multi-view consistency of the geometry over photorealistic fidelity at this stage.
  - Incorporates a view-conditioned diffusion model, Zero-1-to-3, for improved viewpoint awareness and consistency across renderings.
  - Employs strategies such as progressive view training and diffusion timestep annealing to refine the 3D geometry gradually.
- Texture Boosting:
  - Introduces Bootstrapped Score Distillation (BSD) to substantially enhance texture quality.
  - Fine-tunes a personalized diffusion model with DreamBooth on multi-view renderings of the scene itself, yielding 3D-aware, view-consistent guidance.
  - Alternates optimization between the diffusion prior and the 3D scene representation, so that improvements in one reinforce the other.
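The stages above rest on the SDS gradient and on timestep annealing. The following is a minimal, self-contained sketch of both ideas, not the paper's implementation: the noise schedule, the weighting `w(t)`, and the stand-in denoiser `eps_hat_fn` are all illustrative assumptions, since a real system would use a pretrained diffusion model and a differentiable renderer.

```python
import numpy as np

def sds_gradient(x, eps_hat_fn, t, rng, w=lambda t: 1.0 - t):
    """Simplified SDS gradient w.r.t. a rendered image x.

    eps_hat_fn stands in for a frozen diffusion model's noise
    prediction; the exponential noise schedule is an assumption
    made here for illustration, not the paper's schedule.
    """
    eps = rng.standard_normal(x.shape)            # sampled noise
    alpha_bar = np.exp(-5.0 * t)                  # toy noise schedule
    x_t = np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * eps
    eps_hat = eps_hat_fn(x_t, t)                  # model's noise estimate
    return w(t) * (eps_hat - eps)                 # SDS: weighted residual

def annealed_timestep(step, total_steps, t_max=0.98, t_min=0.02, rng=None):
    """Timestep annealing: sample t from a range whose upper bound
    shrinks as optimization proceeds, so later iterations focus on
    low-noise (fine-detail) timesteps."""
    rng = rng or np.random.default_rng()
    hi = t_max - (t_max - t_min) * (step / total_steps)
    return rng.uniform(t_min, max(hi, t_min))

# Illustrative use: a denoiser stub that simply echoes its input.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
t = annealed_timestep(step=100, total_steps=1000, rng=rng)
g = sds_gradient(x, eps_hat_fn=lambda x_t, t: x_t, t=t, rng=rng)
```

In BSD, the same gradient machinery is reused, but the frozen prior is periodically replaced by a DreamBooth-personalized model fine-tuned on the current renderings, which is what closes the bootstrapping loop.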
Experimental Evaluation
The approach is evaluated against strong baselines, including DreamFusion, Magic3D, and Make-It-3D. Quantitative results show DreamCraft3D's advantage on CLIP score and perceptual loss, indicating that it generates more realistic and view-consistent 3D assets.
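The CLIP score used in such comparisons is, at its core, the cosine similarity between an image embedding and a text embedding. The sketch below shows that computation on pre-computed embedding vectors; in a real evaluation the embeddings would come from a pretrained CLIP encoder, which is omitted here to keep the example self-contained.

```python
import numpy as np

def clip_score(image_embedding, text_embedding):
    """Cosine similarity between an image embedding and a text
    embedding (both assumed already encoded, e.g. by CLIP)."""
    img = np.asarray(image_embedding, dtype=float)
    txt = np.asarray(text_embedding, dtype=float)
    img = img / np.linalg.norm(img)
    txt = txt / np.linalg.norm(txt)
    return float(img @ txt)

def mean_clip_score(view_embeddings, text_embedding):
    """Average the score over renderings from multiple viewpoints,
    as multi-view 3D evaluations typically do."""
    return float(np.mean([clip_score(v, text_embedding)
                          for v in view_embeddings]))
```

Averaging over many rendered viewpoints, rather than scoring a single view, is what makes the metric sensitive to view consistency and not just to the quality of one lucky rendering.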
Implications and Future Directions
The implications of DreamCraft3D are multifaceted. Practically, it sets a new benchmark in 3D content generation, potentially benefiting industries like gaming, virtual reality, and film. Theoretically, it underscores the efficacy of a hierarchical framework coupled with advanced diffusion models in overcoming limitations of prior 3D generation methods.
Future developments may explore enhancements in material and lighting separation, addressing depth ambiguity challenges, and expanding the diversity of generated 3D models. Additionally, refining the diffusion models for broader applications in other domains could be a promising direction.
DreamCraft3D represents a substantial stride in leveraging 2D models for intricate 3D creation, offering a robust framework for further advancements in artificial intelligence-driven content generation.