GeoDream: Disentangling 2D and Geometric Priors for High-Fidelity and Consistent 3D Generation (2311.17971v2)

Published 29 Nov 2023 in cs.CV

Abstract: Text-to-3D generation by distilling pretrained large-scale text-to-image diffusion models has shown great promise but still suffers from inconsistent 3D geometric structures (Janus problems) and severe artifacts. The aforementioned problems mainly stem from 2D diffusion models lacking 3D awareness during the lifting. In this work, we present GeoDream, a novel method that incorporates explicit generalized 3D priors with 2D diffusion priors to enhance the capability of obtaining unambiguous 3D consistent geometric structures without sacrificing diversity or fidelity. Specifically, we first utilize a multi-view diffusion model to generate posed images and then construct a cost volume from the predicted images, which serves as a native 3D geometric prior, ensuring spatial consistency in 3D space. Subsequently, we further propose to harness the 3D geometric prior to unlock the great potential of 3D awareness in 2D diffusion priors via a disentangled design. Notably, disentangling the 2D and 3D priors allows us to refine the 3D geometric prior further. We show that the refined 3D geometric prior aids the 3D-aware capability of 2D diffusion priors, which in turn provides superior guidance for the refinement of the 3D geometric prior. Our numerical and visual comparisons demonstrate that GeoDream generates more 3D-consistent textured meshes with high-resolution realistic renderings (i.e., 1024 $\times$ 1024) and adheres more closely to semantic coherence.
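To make the cost-volume prior concrete, the sketch below builds a standard variance-based plane-sweep cost volume from posed feature maps, the generic MVS-style construction the abstract alludes to. This is a minimal PyTorch illustration under our own assumptions (4×4 projection matrices combining intrinsics and extrinsics, variance aggregation across views), not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def homo_warp(src_feat, src_proj, ref_proj_inv, depth_values):
    """Warp one source view's features onto the reference view's depth planes.

    src_feat:     (C, H, W) feature map of a posed source view
    src_proj:     (4, 4) world -> source-pixel projection matrix
    ref_proj_inv: (4, 4) inverse projection of the reference view
    depth_values: (D,) depth hypotheses for the plane sweep
    """
    C, H, W = src_feat.shape
    D = depth_values.shape[0]
    # Homogeneous pixel grid of the reference view.
    y, x = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([x, y, torch.ones_like(x)], dim=0).reshape(3, -1)  # (3, H*W)
    # Compose ref-pixel -> world -> source-pixel into one transform.
    transform = src_proj @ ref_proj_inv
    R, t = transform[:3, :3], transform[:3, 3:]
    pts = (R @ pix).unsqueeze(0) * depth_values.view(D, 1, 1) + t        # (D, 3, H*W)
    grid = pts[:, :2] / pts[:, 2:].clamp(min=1e-6)   # perspective divide (sketch:
                                                     # no handling of points behind camera)
    gx = 2 * grid[:, 0] / (W - 1) - 1                # normalize to [-1, 1] for grid_sample
    gy = 2 * grid[:, 1] / (H - 1) - 1
    grid = torch.stack([gx, gy], dim=-1).view(D, H, W, 2)
    warped = F.grid_sample(src_feat.unsqueeze(0).expand(D, -1, -1, -1),
                           grid, align_corners=True)                     # (D, C, H, W)
    return warped.permute(1, 0, 2, 3)                                    # (C, D, H, W)

def variance_cost_volume(src_feats, src_projs, ref_proj_inv, depth_values):
    """Aggregate per-view warped features by variance, a common encoding of
    multi-view photometric consistency: low variance = geometrically plausible."""
    warped = torch.stack([homo_warp(f, p, ref_proj_inv, depth_values)
                          for f, p in zip(src_feats, src_projs)])        # (V, C, D, H, W)
    return warped.var(dim=0)                                             # (C, D, H, W)
```

In GeoDream, a volume of this kind, built from the multi-view diffusion model's posed outputs, seeds the implicit geometry that the second stage refines; the specific network heads and aggregation are the paper's and are not shown here.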

GeoDream: Disentangling 2D and Geometric Priors for High-Fidelity and Consistent 3D Generation

The paper introduces GeoDream, a method designed to improve the fidelity and consistency of 3D generation driven by text-to-image diffusion models. Recent advances in diffusion models have demonstrated their efficacy in text-to-image synthesis, yet lifting these models to 3D remains challenging due to issues such as the Janus problem (where the object's canonical front, e.g., a face, is repeated across multiple viewpoints) and severe geometric artifacts. The paper addresses these issues by integrating explicit 3D priors with 2D diffusion priors, improving 3D consistency without sacrificing diversity or fidelity.

GeoDream's methodology involves two main stages. First, a multi-view diffusion model generates posed images, from which a cost volume is constructed to serve as a native 3D geometric prior; this prior enforces spatial consistency in 3D space and provides cues for refining the geometry. Second, the 3D prior is used to enhance the 3D awareness of the 2D diffusion prior through a disentangled design: by separating the 2D and 3D priors, the system can refine each with guidance from the other, yielding semantically coherent and richly detailed 3D objects (a minimal sketch of this loop follows below).
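The interplay between the two priors can be pictured as a score-distillation loop in which the cost-volume-seeded geometry and a separate appearance field are optimized jointly under 2D diffusion guidance. The following is a minimal sketch, assuming hypothetical callables `render_fn` (a differentiable renderer) and `sds_grad_fn` (the score-distillation gradient supplied by the 2D diffusion prior); none of these names come from the paper's code.

```python
import torch

def disentangled_refine_step(geometry, appearance, render_fn, sds_grad_fn,
                             text_emb, camera, optimizer):
    """One optimization step sketching the disentangled design.

    `geometry` is an implicit surface seeded by the cost-volume prior and
    `appearance` a separately parameterized texture/color field; keeping them
    as distinct parameter groups is the 'disentangled' part of the sketch.
    """
    image = render_fn(geometry, appearance, camera)    # differentiable render
    with torch.no_grad():
        grad = sds_grad_fn(image, text_emb, camera)    # guidance from the 2D prior
    # Surrogate objective whose gradient w.r.t. the image equals `grad`;
    # backprop refines both the geometry and the appearance field, so better
    # geometry yields better-posed 2D guidance and vice versa.
    loss = (image * grad).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The detached-gradient surrogate loss is the standard DreamFusion-style score-distillation trick; the point illustrated here is only that geometry and appearance live in separate parameter groups, so refining one sharpens the guidance available to the other.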

Numerical evaluations and visual comparisons demonstrate GeoDream's advantages, showing that it generates high-resolution (1024 × 1024) textured meshes with stronger semantic coherence than prior methods. Practically, this points to applications in the gaming and media industries, where consistency and fidelity are critical.

In conclusion, GeoDream capitalizes on the integration of explicitly defined 3D priors with 2D diffusion models to resolve prevalent issues in the domain of text-to-3D generation. By refining the 3D priors and employing a disentangled approach, this methodology offers a robust pathway toward generating highly detailed and consistent 3D assets. It serves as a significant development in the field, potentially guiding future explorations into refining AI models for complex 3D generation tasks. As advancements in these models continue, this approach might pave the way for innovative applications across various sectors, broadening the horizon for AI-driven 3D content creation.

Authors (6)
  1. Baorui Ma (18 papers)
  2. Haoge Deng (5 papers)
  3. Junsheng Zhou (28 papers)
  4. Yu-Shen Liu (79 papers)
  5. Tiejun Huang (130 papers)
  6. Xinlong Wang (56 papers)
Citations (19)