GeoDream: Disentangling 2D and Geometric Priors for High-Fidelity and Consistent 3D Generation
The paper introduces GeoDream, a method designed to improve the fidelity and consistency of 3D generation using text-to-image diffusion models. Recent advances in diffusion models have demonstrated their efficacy in text-to-image synthesis, yet applying these models to 3D generation remains challenging due to issues such as the Janus problem, in which an object's front-facing features (e.g., a face) are duplicated across multiple viewpoints, and artifacts in geometric structure. The paper addresses these issues by integrating explicit 3D priors with 2D diffusion models, improving 3D consistency without sacrificing diversity or fidelity.
The methodology of GeoDream involves two main stages. First, a multi-view diffusion model generates posed images, from which cost volumes are constructed to serve as native 3D geometric priors. These priors enforce spatial consistency in 3D space and provide cues for refining 3D structure. Second, the 3D priors are used to enhance the 3D awareness of 2D diffusion models through a disentangled design. By keeping the 2D and 3D priors separate, the system improves its ability to generate semantically coherent and richly detailed 3D objects.
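The paper does not give implementation details for the cost-volume construction, but the standard recipe from multi-view stereo is to warp per-view features onto a set of fronto-parallel depth planes and score each voxel by the variance of the warped features across views (low variance means the views agree, i.e., a likely surface). A minimal sketch of that variance aggregation, assuming the homography warping has already been done and using an illustrative function name:

```python
import numpy as np

def variance_cost_volume(warped_feats):
    """Aggregate multi-view features into a cost volume by variance.

    warped_feats: array of shape (V, D, C, H, W) — V views, D depth
    hypotheses, C feature channels, H x W spatial grid. Each view's
    features are assumed to have been homography-warped onto every
    depth plane beforehand.
    """
    mean = warped_feats.mean(axis=0)                 # (D, C, H, W)
    var = ((warped_feats - mean) ** 2).mean(axis=0)  # per-voxel variance
    return var.mean(axis=1)                          # (D, H, W) cost volume

# Toy example: 4 views, 8 depth planes, 16-channel features on an 8x8 grid.
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8, 16, 8, 8))
cost = variance_cost_volume(feats)
print(cost.shape)  # (8, 8, 8): one matching cost per depth hypothesis per pixel
```

When all views project identical features onto a depth plane, the variance (and thus the cost) at that plane drops to zero, which is how the volume encodes geometric agreement; GeoDream refines such a prior jointly with the 2D diffusion guidance rather than using it as-is.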
Numerical evaluations and visual comparisons demonstrate the superiority of GeoDream, showcasing its ability to generate high-resolution (1024 × 1024) textured meshes with strong semantic coherence. These results suggest substantial potential applications in the gaming and media industries, where consistency and fidelity are critical.
In conclusion, GeoDream integrates explicit 3D priors with 2D diffusion models to resolve prevalent issues in text-to-3D generation. By refining the 3D priors and keeping them disentangled from the 2D guidance, the method offers a robust pathway toward highly detailed and consistent 3D assets. It marks a significant development in the field and may guide future work on AI models for complex 3D generation, broadening the horizon for AI-driven 3D content creation across a range of sectors.