MicroDreamer: Efficient 3D Generation in $\sim$20 Seconds by Score-based Iterative Reconstruction (2404.19525v3)

Published 30 Apr 2024 in cs.CV

Abstract: Optimization-based approaches, such as score distillation sampling (SDS), show promise in zero-shot 3D generation but suffer from low efficiency, primarily due to the high number of function evaluations (NFEs) required for each sample and the limitation of optimization confined to latent space. This paper introduces score-based iterative reconstruction (SIR), an efficient and general algorithm mimicking a differentiable 3D reconstruction process to reduce the NFEs and enable optimization in pixel space. Given a single set of images sampled from a multi-view score-based diffusion model, SIR repeatedly optimizes 3D parameters, unlike the single-step optimization in SDS. With other improvements in training, we present an efficient approach called MicroDreamer that generally applies to various 3D representations and 3D generation tasks. In particular, MicroDreamer is 5-20 times faster than SDS in generating a neural radiance field while retaining comparable performance, and it takes about 20 seconds to create meshes from 3D Gaussian splatting on a single A100 GPU, halving the time of DreamGaussian, the fastest optimization-based baseline, with a performance advantage that is significant relative to the measurement standard deviation. Our code is available at https://github.com/ML-GSAI/MicroDreamer.

References (1)

Stability AI: Stable Zero123 (2023), https://stability.ai/news/stable-zero123-3d-generation

Authors (7)

Luxi Chen
Zhengyi Wang
Chongxuan Li
Tingting Gao
Hang Su
Jun Zhu
Zihan Zhou

Summary

Analysis of "MicroDreamer: Zero-shot 3D Generation in $\sim$ 20 Seconds by Score-based Iterative Reconstruction"

The presented paper, "MicroDreamer: Zero-shot 3D Generation in $\sim$ 20 Seconds by Score-based Iterative Reconstruction," introduces a novel algorithm, score-based iterative reconstruction (SIR), for zero-shot 3D generation. The work makes significant progress on the computational inefficiency of existing optimization-based approaches to 3D generation, most notably score distillation sampling (SDS).

The central idea is to reduce the large number of function evaluations (NFEs) that previous methods require. SIR draws a single set of images from a multi-view diffusion model and then optimizes the 3D parameters repeatedly against those images, mimicking a classical 3D reconstruction process; SDS, by contrast, takes a single optimization step per diffusion evaluation. Because each costly set of diffusion samples is reused for many cheap parameter updates, the total number of NFEs drops substantially.
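To make the NFE argument concrete, here is a minimal toy sketch. Everything in it is hypothetical: the "3D parameters" are a 3-vector, `render` is an elementwise projection, and `ToyMultiViewDenoiser` is a deterministic stand-in that pulls the rendered views toward a fixed `TARGET` instead of running a real multi-view diffusion sampler (a real SIR step would first add noise at level `t`). The SDS-like loop spends one NFE per parameter update, loosely mirroring an SDS step toward the model's denoised estimate; the SIR-like loop samples one set of refined images and reuses it for `k_steps` updates. The annealed schedule, step counts, and learning rate are illustrative assumptions, not the paper's settings.

```python
from typing import List

Vec = List[float]

# Hypothetical toy setup: the "3D parameters" are a 3-vector, each camera view
# scales them elementwise, and the "data" the stand-in model knows is TARGET.
TARGET: Vec = [1.0, -2.0, 0.5]
VIEWS: List[Vec] = [[1.0, 0.5, 1.0], [0.5, 1.0, 1.0], [1.0, 1.0, 0.5]]


def render(theta: Vec, view: Vec) -> Vec:
    """Differentiable stand-in renderer (elementwise projection)."""
    return [w * p for w, p in zip(view, theta)]


class ToyMultiViewDenoiser:
    """Hypothetical stand-in for a multi-view diffusion model. One call
    processes all views jointly and counts as one function evaluation (NFE)."""

    def __init__(self) -> None:
        self.nfe = 0

    def step(self, images: List[Vec], t: float) -> List[Vec]:
        self.nfe += 1
        # Deterministic toy update: move each view a fraction t toward the
        # rendering of TARGET (a real sampler would noise, then denoise).
        return [[x + t * (y - x) for x, y in zip(img, render(TARGET, v))]
                for img, v in zip(images, VIEWS)]


def fit(theta: Vec, refs: List[Vec], lr: float, steps: int) -> Vec:
    """Gradient descent on the reconstruction loss sum_c ||render(theta, c) - refs[c]||^2."""
    theta = list(theta)
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for view, ref in zip(VIEWS, refs):
            for i, (w, r) in enumerate(zip(view, ref)):
                grad[i] += 2.0 * w * (w * theta[i] - r)
        theta = [p - lr * g for p, g in zip(theta, grad)]
    return theta


def sds_like(theta: Vec, model: ToyMultiViewDenoiser, n_updates: int,
             t: float = 0.3, lr: float = 0.1) -> Vec:
    """One NFE per parameter update, as in SDS."""
    for _ in range(n_updates):
        refs = model.step([render(theta, v) for v in VIEWS], t)
        theta = fit(theta, refs, lr, steps=1)
    return theta


def sir_like(theta: Vec, model: ToyMultiViewDenoiser, n_iters: int = 5,
             sampler_steps: int = 2, k_steps: int = 20, lr: float = 0.1):
    """Sample one set of refined images, then reuse it for k_steps updates."""
    updates = 0
    for it in range(n_iters):
        t = 0.5 * (1 - it / n_iters) + 0.1  # hypothetical annealed level: 0.6 -> 0.2
        images = [render(theta, v) for v in VIEWS]
        for _ in range(sampler_steps):
            images = model.step(images, t)
        theta = fit(theta, images, lr, steps=k_steps)
        updates += k_steps
    return theta, updates


def sq_error(theta: Vec) -> float:
    return sum((p - q) ** 2 for p, q in zip(theta, TARGET))


sir_model, sds_model = ToyMultiViewDenoiser(), ToyMultiViewDenoiser()
theta_sir, sir_updates = sir_like([0.0, 0.0, 0.0], sir_model)
# Give the SDS-like loop exactly the same NFE budget as the SIR-like loop.
theta_sds = sds_like([0.0, 0.0, 0.0], sds_model, n_updates=sir_model.nfe)
print(sir_model.nfe, sir_updates, sq_error(theta_sir), sq_error(theta_sds))
```

With these settings both loops spend 10 NFEs, but the SIR-like loop performs 100 parameter updates and ends far closer to `TARGET` than the SDS-like loop does on the same budget. The design choice being illustrated is the reuse of expensive samples for many cheap renderer updates.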

MicroDreamer, built on SIR, creates meshes from 3D Gaussian splatting in approximately 20 seconds on a single NVIDIA A100 GPU, roughly half the time of DreamGaussian, the fastest optimization-based baseline. It is also 5-20 times faster than SDS in generating neural radiance fields (NeRF) while maintaining comparable performance.

Key Contributions and Results

MicroDreamer's primary contribution is to bridge diffusion models and iterative reconstruction for efficient 3D generation. Because it relies on a pretrained multi-view diffusion model and needs no additional 3D training data, the method sidesteps the scarcity of 3D data. Several aspects distinguish MicroDreamer:

Score-based Iterative Reconstruction (SIR): The SIR algorithm uses a multi-view score-based diffusion model to produce refined images and iteratively refines the 3D parameters by minimizing a reconstruction loss against them. This significantly reduces NFEs compared to standard SDS, enabling faster generation while retaining comparable quality.
Optimization in Pixel Space: Because SIR produces explicit images from the diffusion sampling process, the outputs of a latent diffusion model (LDM) can be mapped back to pixel space and the 3D parameters optimized there directly, avoiding the optimization confined to latent space that SDS requires with LDM-based priors.
Adaptability Across Tasks and Representations: The framework applies to both NeRF and 3D Gaussian splatting (3DGS) across text-to-3D and image-to-3D tasks, demonstrating versatility in various 3D generation scenarios.
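One way to read the pixel-space point is as a cost argument: an SDS loss defined on latents must re-encode every rendered image at every update (and, in a real LDM, backpropagate through the encoder), whereas SIR's refined latents can be decoded once and then matched in pixel space for many steps. The sketch below illustrates only that bookkeeping. `CountingCodec` is a hypothetical linear stand-in for a VAE that merely counts calls, the rendered image is optimized directly, and `z_ref` plays the role of refined latents from the sampler; none of these names come from the MicroDreamer code.

```python
from typing import List

Vec = List[float]


class CountingCodec:
    """Hypothetical linear stand-in for an LDM's VAE; it only counts calls."""

    def __init__(self) -> None:
        self.encoder_calls = 0
        self.decoder_calls = 0

    def encode(self, x: Vec) -> Vec:
        self.encoder_calls += 1
        return [0.5 * v for v in x]  # toy encoder with Jacobian 0.5 * I

    def decode(self, z: Vec) -> Vec:
        self.decoder_calls += 1
        return [2.0 * v for v in z]  # exact inverse of the toy encoder


def latent_space_fit(x: Vec, z_ref: Vec, codec: CountingCodec,
                     steps: int, lr: float) -> Vec:
    """Minimise ||E(x) - z_ref||^2: every update re-encodes the rendering."""
    for _ in range(steps):
        z = codec.encode(x)
        # Chain rule through the linear encoder: dL/dx = 2 * 0.5 * (z - z_ref).
        x = [xi - lr * 2.0 * 0.5 * (zi - ri) for xi, zi, ri in zip(x, z, z_ref)]
    return x


def pixel_space_fit(x: Vec, z_ref: Vec, codec: CountingCodec,
                    steps: int, lr: float) -> Vec:
    """Decode the refined latents once, then minimise ||x - x_ref||^2 in pixel space."""
    x_ref = codec.decode(z_ref)
    for _ in range(steps):
        x = [xi - lr * 2.0 * (xi - ri) for xi, ri in zip(x, x_ref)]
    return x


z_ref = [0.5, -1.0, 0.25]  # hypothetical refined latents from the sampler
lat, pix = CountingCodec(), CountingCodec()
x_lat = latent_space_fit([0.0] * 3, z_ref, lat, steps=200, lr=0.2)
x_pix = pixel_space_fit([0.0] * 3, z_ref, pix, steps=200, lr=0.2)
print(lat.encoder_calls, pix.encoder_calls, pix.decoder_calls)
```

Both fits reach the same image here because the toy encoder is exactly invertible; the difference is that the latent-space fit calls the encoder 200 times, while the pixel-space fit calls the decoder once and never touches the encoder.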

MicroDreamer was evaluated with several base diffusion models, including MVDream, Stable Zero123, and ImageDream. The experiments show substantial efficiency gains with competitive quality; compared with DreamGaussian, the fastest optimization-based baseline, MicroDreamer halves the generation time while achieving better quality.

Theoretical and Practical Implications

Theoretically, the paper challenges the conventional reliance on large NFE budgets in 3D generation. It rethinks the role of iterative refinement in the zero-shot setting, offering a route toward more computationally efficient methods that still exploit the ability of diffusion models to represent complex structures.

Practically, MicroDreamer is promising for applications that need rapid 3D generation, such as virtual reality, gaming, and design prototyping. Reducing generation time to about 20 seconds on a single A100 GPU makes high-fidelity 3D content creation more accessible and scalable.

Future Directions

Future work could integrate consistency models or explore stochastic samplers with fewer steps to improve efficiency further. Improving the fidelity and multi-view consistency of the base diffusion models' outputs could also raise the quality of the generated 3D objects.
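Why fewer-step samplers help can be seen from a simple NFE count. Assume, for illustration, that each SIR iteration runs a sampler with $S$ steps, that each step is a single multi-view call (one NFE), and that the resulting images are reused for $K$ parameter updates, while SDS spends one NFE per update. The symbols $N_{\mathrm{iter}}$, $N_{\mathrm{upd}}$, $S$, and $K$ are introduced here and are not taken from the paper.

```latex
\begin{aligned}
\mathrm{NFE}_{\mathrm{SDS}} &= N_{\mathrm{upd}}
  && \text{for } N_{\mathrm{upd}} \text{ parameter updates}, \\
\mathrm{NFE}_{\mathrm{SIR}} &= N_{\mathrm{iter}}\, S
  && \text{for } N_{\mathrm{iter}}\, K \text{ parameter updates}, \\
\frac{\mathrm{NFE}_{\mathrm{SIR}}}{N_{\mathrm{iter}}\, K} &= \frac{S}{K}
  && \text{NFEs per update.}
\end{aligned}
```

Under these assumptions a fewer-step sampler lowers $S$ directly, and a consistency model that refines images in a single step would reach the limit $S = 1$, leaving only $N_{\mathrm{iter}}$ NFEs in total.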

MicroDreamer is a notable advance in efficient 3D generation, striking a better balance between computational cost and output quality. As multi-view diffusion models mature, frameworks like MicroDreamer stand to inherit their improvements and extend what is feasible in zero-shot 3D content creation.

GitHub

GitHub - ML-GSAI/MicroDreamer: Official implementation of "MicroDreamer: Zero-shot 3D Generation in ~20 Seconds by Score-based Iterative Reconstruction". (119 stars)

Tweets

https://twitter.com/_akhaliq/status/1785528148588691557