DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion (2403.17237v1)

Published 25 Mar 2024 in cs.CV, cs.AI, and cs.GR

Abstract: We present DreamPolisher, a novel Gaussian Splatting based method with geometric guidance, tailored to learn cross-view consistency and intricate detail from textual descriptions. While recent progress on text-to-3D generation methods has been promising, prevailing methods often fail to ensure view-consistency and textural richness. This problem becomes particularly noticeable for methods that work with text input alone. To address this, we propose a two-stage Gaussian Splatting based approach that enforces geometric consistency among views. Initially, a coarse 3D generation undergoes refinement via geometric optimization. Subsequently, we use a ControlNet driven refiner coupled with the geometric consistency term to improve both texture fidelity and overall consistency of the generated 3D asset. Empirical evaluations across diverse textual prompts spanning various object categories demonstrate the efficacy of DreamPolisher in generating consistent and realistic 3D objects, aligning closely with the semantics of the textual instructions.


DreamPolisher: Enhancing Text-to-3D Generation with Geometric Diffusion

Introduction

Generative models have advanced rapidly, particularly models that convert textual descriptions into visual content. Among these, text-to-3D generation is a promising frontier, with potential applications in virtual and augmented reality, game development, and beyond. However, current models often struggle to produce view-consistent and texturally rich 3D objects when working exclusively from text input. DreamPolisher addresses these limitations by using Gaussian Splatting with geometric guidance to refine an initial coarse 3D generation into a high-quality, view-consistent asset.

Text-to-3D Generation Challenges

Predominant methods in text-to-3D generation fall into two main categories: direct text-to-3D and text-to-image-to-3D approaches. Both have demonstrated capabilities to varying degrees, yet each has significant drawbacks. Direct text-to-3D methods, while efficient, often lack texture detail and are prone to inconsistencies across views. Conversely, text-to-image-to-3D techniques can produce more detailed outputs but are hampered by lengthy training times and high computational demands. DreamPolisher aims to fill this gap, balancing efficiency and output quality through a two-stage Gaussian Splatting approach that adds geometric optimization and a ControlNet-driven refiner.

DreamPolisher Overview

DreamPolisher operates in two stages, a coarse optimization phase followed by an appearance refinement phase:

Stage 1 (Coarse Optimization) generates a preliminary, view-consistent 3D object from the text prompt using a pre-trained text-to-point diffusion model. This stage lays the foundation for geometric consistency across views.
Stage 2 (Appearance Refinement) improves texture fidelity and geometric consistency. A ControlNet-driven refiner works in tandem with a geometric consistency loss to raise the visual quality of the 3D asset.
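
The two-stage schedule above can be illustrated with a toy optimizer. This is a minimal sketch, not DreamPolisher's implementation: the parameter vector, the quadratic stand-in losses, and the names `optimize`, `quadratic_grad`, and `two_stage` are all hypothetical. In the actual method the parameters would describe 3D Gaussians and the gradients would come from diffusion-based guidance.

```python
def optimize(params, grad_fn, steps, lr):
    """Plain gradient descent; returns a new list and leaves the input untouched."""
    current = list(params)
    for _ in range(steps):
        grads = grad_fn(current)
        current = [p - lr * g for p, g in zip(current, grads)]
    return current


def quadratic_grad(target):
    """Gradient of sum((p - t)^2), a hypothetical stand-in for a guidance loss."""
    def grad(params):
        return [2.0 * (p - t) for p, t in zip(params, target)]
    return grad


def two_stage(init, coarse_grad, refine_grad, coarse_steps=50, refine_steps=50, lr=0.1):
    """Stage 1 fits a coarse result; stage 2 starts from it and refines."""
    coarse = optimize(init, coarse_grad, coarse_steps, lr)
    refined = optimize(coarse, refine_grad, refine_steps, lr)
    return coarse, refined


# Toy targets (assumed): stage 1 finds the rough shape, stage 2 adjusts one detail.
coarse, refined = two_stage(
    init=[0.0, 0.0],
    coarse_grad=quadratic_grad([1.0, 1.0]),
    refine_grad=quadratic_grad([1.0, 2.0]),
)
```

The point of the sketch is the hand-off: the second stage is initialized from the first stage's output rather than from scratch, so refinement only has to correct what the coarse pass left unresolved.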

The methodology places a strong emphasis on maintaining geometric consistency while refining textures and details, distinguishing DreamPolisher from existing text-to-3D generation approaches.
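
One way to picture a geometric consistency term is as a penalty on disagreement between matched points in two rendered views, added to the refiner's loss. The summary does not give the actual formulation, so the sketch below is only an assumption: `consistency_loss`, `stage2_objective`, the `weight` value, and the premise that keypoints are already matched and mapped into a common image frame are all hypothetical.

```python
from math import fsum


def consistency_loss(points_a, points_b):
    """Mean squared distance between matched 2D points from two views.

    Assumes points_a[i] and points_b[i] are the same surface point, already
    expressed in a common image frame (a simplification for illustration).
    """
    if len(points_a) != len(points_b):
        raise ValueError("matched point lists must have equal length")
    if not points_a:
        return 0.0
    squared = [
        (xa - xb) ** 2 + (ya - yb) ** 2
        for (xa, ya), (xb, yb) in zip(points_a, points_b)
    ]
    return fsum(squared) / len(squared)


def stage2_objective(refiner_loss, points_a, points_b, weight=0.1):
    """Refiner loss plus a weighted consistency penalty (weight is assumed)."""
    return refiner_loss + weight * consistency_loss(points_a, points_b)


# Two matched points; the second one disagrees by 2 pixels vertically.
view_a = [(0.0, 0.0), (1.0, 1.0)]
view_b = [(0.0, 0.0), (1.0, 3.0)]
penalty = consistency_loss(view_a, view_b)
total = stage2_objective(1.0, view_a, view_b, weight=0.5)
```

The penalty is zero when the views agree and grows with the disagreement, so the weighted sum lets texture refinement proceed while discouraging changes that break cross-view agreement.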

Experimental Evaluation

DreamPolisher was evaluated on diverse text prompts spanning a wide range of object categories, where it is reported to generate realistic, view-consistent 3D objects that align closely with the prompts. The evaluations also highlight the method's efficiency: it is reported to improve on existing text-to-3D and text-to-image-to-3D approaches in both consistency and detail while keeping computation time reasonable.

Conclusions and Future Directions

DreamPolisher represents a significant step forward in text-to-3D generation, showing that geometric guidance is effective for producing high-quality, view-consistent 3D objects from textual descriptions. The method also opens new avenues for research and application. Future work might further improve efficiency and generative quality, potentially by incorporating more advanced diffusion models or refinement techniques to expand the range and complexity of 3D objects that can be generated.

Challenges and Limitations

While DreamPolisher marks a leap towards more realistic text-to-3D generation, it is not without its challenges. The reliance on textual descriptions alone, without accommodating image-based inputs for guidance, can sometimes result in inaccuracies or inconsistencies, particularly with objects that possess intricate details or require precise geometric replication. Addressing these challenges through model improvements or incorporating multimodal inputs represents a potential area for further research.

In summary, DreamPolisher introduces a compelling approach to text-to-3D generation, combining Gaussian Splatting with geometric guidance to improve both quality and efficiency. As generative AI continues to evolve, DreamPolisher's contributions offer valuable insights and lay the groundwork for future advances in the field.