Text-to-3D using Gaussian Splatting
The paper presents Gsgen, a text-to-3D generation method built on 3D Gaussian Splatting. The authors address the limitations of previous methods in generating accurate, high-fidelity 3D objects: inaccurate geometry and limited detail, which they attribute to the absence of explicit 3D priors and of an appropriate 3D representation.
Methodology
The core innovation of Gsgen is the use of 3D Gaussian Splatting, a state-of-the-art explicit representation into which 3D priors can be incorporated directly. The approach is divided into two distinct stages: geometry optimization and appearance refinement.
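To make the representation concrete before describing the two stages, here is a minimal sketch of the learnable parameter set behind a Gaussian Splatting scene, written in PyTorch. The class name, field names, and initialization values are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GaussianCloud(nn.Module):
    """A set of N 3D Gaussians: center, anisotropic scale, rotation,
    opacity, and color per point. Names and initialization are illustrative."""

    def __init__(self, num_points: int):
        super().__init__()
        self.means = nn.Parameter(torch.randn(num_points, 3) * 0.5)        # centers
        self.log_scales = nn.Parameter(torch.full((num_points, 3), -3.0))  # log per-axis extent
        self.quats = nn.Parameter(                                         # identity rotations
            torch.tensor([[1.0, 0.0, 0.0, 0.0]]).repeat(num_points, 1))
        self.logit_opacities = nn.Parameter(torch.zeros(num_points, 1))
        self.colors = nn.Parameter(torch.rand(num_points, 3))

    def scales(self) -> torch.Tensor:
        return self.log_scales.exp()                 # exp keeps scales positive

    def opacities(self) -> torch.Tensor:
        return torch.sigmoid(self.logit_opacities)   # sigmoid keeps opacity in (0, 1)
```

In Gsgen, the centers would be initialized from a point cloud prior (e.g., a Point-E sample) rather than random noise; both stages below optimize exactly these parameters.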
- Geometry Optimization: This stage establishes a coarse representation guided by 3D geometry priors in addition to the conventional 2D Score Distillation Sampling (SDS) loss. Joint guidance from 2D and 3D diffusion models, incorporating point cloud diffusion priors from models such as Point-E, keeps the rough shape sensible and view-consistent (see the joint-guidance sketch after this list).
- Appearance Refinement: This stage enriches detail through iterative optimization, enhancing fidelity without compromising the established geometry. A compactness-based densification technique supplements the traditional densification criterion, which relies on view-space gradients, enabling continuous and faithful appearance, particularly for high-frequency components (a densification sketch also follows below).
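The joint guidance in the geometry stage can be pictured as two score-distillation terms sharing the same Gaussian parameters: a 2D SDS term on rendered views and a 3D term on the Gaussian centers treated as a point cloud. The sketch below is a hedged illustration; `image_diffusion`, `point_diffusion`, and their `add_noise` / `predict_noise` methods are assumed wrappers around pretrained diffusion models (the latter Point-E-like), not a real library API, and `lambda_3d` is an assumed weighting.

```python
import torch

def joint_sds_loss(render, means, image_diffusion, point_diffusion,
                   text_emb, lambda_3d=0.1):
    """Surrogate loss whose gradient matches joint 2D + 3D score distillation.
    `render` is a differentiably rendered view; `means` are Gaussian centers."""
    # 2D SDS on the rendered image.
    t = torch.randint(20, 980, (1,), device=render.device)
    noise = torch.randn_like(render)
    noisy = image_diffusion.add_noise(render, noise, t)
    eps_pred = image_diffusion.predict_noise(noisy, t, text_emb)
    grad_2d = eps_pred - noise           # SDS residual (no backprop through the U-Net)

    # 3D SDS on the centers, treated as a point cloud.
    t3 = torch.randint(20, 980, (1,), device=means.device)
    noise3 = torch.randn_like(means)
    noisy3 = point_diffusion.add_noise(means, noise3, t3)
    eps3 = point_diffusion.predict_noise(noisy3, t3, text_emb)
    grad_3d = eps3 - noise3

    # Detaching the residuals makes d(loss)/d(params) equal the SDS gradients.
    return (grad_2d.detach() * render).sum() + lambda_3d * (grad_3d.detach() * means).sum()
```

The 3D term pulls the coarse point arrangement toward shapes the point cloud diffusion model considers plausible, which is what anchors the geometry against view-inconsistent solutions.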
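For the refinement stage, compactness-based densification can be sketched as a rule that inserts a new Gaussian wherever two neighbors leave an unfilled gap between them, complementing the view-space-gradient criterion. The nearest-neighbor scheme, the isotropic radius proxy, and the `k_gap` threshold below are illustrative assumptions, not the paper's exact procedure.

```python
import torch

def compactness_densify(means, scales, k_gap=1.0):
    """Insert midpoints between Gaussians whose radii do not span the
    distance separating them. Brute-force O(N^2) distances for clarity;
    a KNN structure would be used at scale."""
    dists = torch.cdist(means, means)            # (N, N) pairwise distances
    dists.fill_diagonal_(float("inf"))
    nn_dist, nn_idx = dists.min(dim=1)           # nearest neighbor per Gaussian

    radii = scales.max(dim=1).values             # crude isotropic radius proxy
    gap = nn_dist - (radii + radii[nn_idx])      # space the pair fails to cover
    mask = gap > k_gap * radii                   # pairs leaving a visible hole

    midpoints = 0.5 * (means[mask] + means[nn_idx[mask]])
    return torch.cat([means, midpoints], dim=0)  # densified centers
```

A full implementation would also assign scales, colors, and opacities to the new Gaussians, plausibly copied from their parents; only the centers are shown here for brevity.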
Results
The authors provide extensive empirical evaluations demonstrating the effectiveness of Gsgen. In particular, the approach improves the generation of assets with intricate details such as feathers, textured surfaces, and animal fur, which previous methods have struggled to capture. The two-stage optimization and the explicit 3D priors carried by Gaussian Splatting are pivotal in mitigating the Janus problem, in which a generated object repeats front-facing features across viewpoints and collapses into inconsistent geometry.
Implications and Future Directions
The implications of this research are significant for fields requiring high-quality 3D representations from textual inputs, such as virtual reality, gaming, and digital content creation. The introduction of explicit 3D priors and Gaussian Splatting has the potential to redefine how complex 3D scenes are generated from text.
Future work could explore integrating more advanced LLMs to strengthen the semantic understanding and generalizability of text-to-3D generation. Refining the densification process and exploring alternative representations could further improve fidelity and efficiency.
Conclusion
The paper makes a substantial contribution to the text-to-3D generation domain, addressing key limitations in fidelity and geometric accuracy. Gsgen’s methodological framework provides a robust foundation for future advances in generating high-quality 3D content from textual descriptions.