RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D (2311.16918v2)

Published 28 Nov 2023 in cs.CV and cs.AI

Abstract: Lifting 2D diffusion for 3D generation is a challenging problem due to the lack of geometric prior and the complex entanglement of materials and lighting in natural images. Existing methods have shown promise by first creating the geometry through score-distillation sampling (SDS) applied to rendered surface normals, followed by appearance modeling. However, relying on a 2D RGB diffusion model to optimize surface normals is suboptimal due to the distribution discrepancy between natural images and normal maps, leading to instability in optimization. In this paper, recognizing that normal and depth information effectively describe scene geometry and can be automatically estimated from images, we propose to learn a generalizable Normal-Depth diffusion model for 3D generation. We achieve this by training on the large-scale LAION dataset together with generalizable image-to-depth and normal prior models. In an attempt to alleviate the mixed illumination effects in the generated materials, we introduce an albedo diffusion model to impose data-driven constraints on the albedo component. Our experiments show that when integrated into existing text-to-3D pipelines, our models significantly enhance the detail richness, achieving state-of-the-art results. Our project page is https://aigc3d.github.io/richdreamer/.

Introduction to 3D Generation from Text

The field of AI-powered image generation has grown rapidly, driven by advances in generative models and large-scale training datasets. Transforming text descriptions into 3D models, however, remains a challenge. Recent text-to-3D systems have made progress, demonstrating impressive zero-shot generation by optimizing neural radiance fields. Despite this, producing detailed, rich 3D models that are both geometrically and materially accurate remains difficult.

Overcoming the Challenges of 3D Generation

Traditional methods approach 3D content generation by creating the geometry first and then the texture. However, directly applying 2D diffusion models, which excel at generating natural images, is less effective for optimizing 3D geometry because of the distribution gap between natural images and rendered normal maps. To address this, the paper proposes a Normal-Depth diffusion model for 3D generation, which yields significant improvements in detail richness. A sketch of the score-distillation mechanism this pipeline builds on follows below.
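
To make the optimization loop concrete, here is a minimal sketch of score-distillation sampling (SDS), the mechanism the abstract refers to for lifting a 2D diffusion prior into 3D. The callable `diffusion_model`, the timestep range, and the weighting are illustrative assumptions, not the paper's actual code.

```python
import torch

def sds_loss(diffusion_model, rendered, text_emb, alphas_cumprod):
    """One score-distillation step: nudge a rendered view toward the
    2D diffusion prior. `diffusion_model(noisy, t, text_emb)` is a
    placeholder for a frozen noise-prediction U-Net.
    """
    b = rendered.shape[0]
    device = rendered.device

    # Sample a random diffusion timestep and noise the rendering.
    t = torch.randint(20, 980, (b,), device=device)
    noise = torch.randn_like(rendered)
    a_t = alphas_cumprod[t].view(b, 1, 1, 1)
    noisy = a_t.sqrt() * rendered + (1.0 - a_t).sqrt() * noise

    # Predict the injected noise with the frozen 2D model.
    with torch.no_grad():
        eps_pred = diffusion_model(noisy, t, text_emb)

    # SDS gradient w(t) * (eps_pred - noise); it flows back through the
    # differentiable renderer into the 3D representation, but never
    # through the diffusion U-Net itself.
    grad = (1.0 - a_t) * (eps_pred - noise)

    # Surrogate loss whose gradient w.r.t. `rendered` equals `grad`.
    return (grad.detach() * rendered).sum()
```

Calling `backward()` on this loss updates whatever produced `rendered` (e.g. a NeRF or mesh renderer); the paper's insight is that such supervision is more stable when the diffusion model is trained on normal-depth renderings rather than RGB images.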

Details of the Normal-Depth Diffusion Model

The Normal-Depth diffusion model is innovative in that it captures the joint distribution of normal maps and depth, both of which are crucial for describing the shape and structure of a scene. By training on a large dataset of image-caption pairs, with depth and normals estimated by generalizable prior models, and then fine-tuning on synthetic datasets, the model covers a wide variety of real-world scenes while maintaining generalization. Coupled with an albedo diffusion model, this approach helps separate material reflectance from illumination effects, leading to more accurate appearance modeling for generated 3D objects.
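
As a rough illustration of the data side, the sketch below assembles the kind of 4-channel normal-depth target such a model could be trained to generate, using off-the-shelf monocular depth and normal estimators as priors. The estimator callables and the exact normalization are hypothetical placeholders; the paper's actual preprocessing may differ.

```python
import torch
import torch.nn.functional as F

def make_normal_depth_target(image, depth_estimator, normal_estimator):
    """Build a joint normal-depth training target from an RGB image.

    `depth_estimator` / `normal_estimator` stand in for pretrained
    monocular priors (relative depth and per-pixel surface normals).
    """
    with torch.no_grad():
        depth = depth_estimator(image)      # (B, 1, H, W), relative depth
        normals = normal_estimator(image)   # (B, 3, H, W), approx. unit vectors

    # Normalize relative depth per image to [0, 1] so scales are
    # comparable across in-the-wild training images.
    d_min = depth.amin(dim=(2, 3), keepdim=True)
    d_max = depth.amax(dim=(2, 3), keepdim=True)
    depth = (depth - d_min) / (d_max - d_min + 1e-6)

    # Re-normalize normals to unit length and map from [-1, 1] to [0, 1].
    normals = F.normalize(normals, dim=1) * 0.5 + 0.5

    # Concatenate into the joint normal-depth target.
    return torch.cat([normals, depth], dim=1)   # (B, 4, H, W)
```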

Experimental Results and Contributions

When integrated into existing text-to-3D pipelines, the new models significantly enhance the fidelity of generated 3D content. The experimental evaluation against other state-of-the-art methods shows superior results in terms of geometry and texture details. Additional user studies further confirm that the approach yields visually appealing models that align closely with the text prompts. The key contributions of the paper include the development of the Normal-Depth diffusion model and the albedo diffusion model, which bring marked advancements in the text-to-3D domain.

In conclusion, this research represents a substantial step forward in generative 3D modeling from textual descriptions, offering a well-rounded solution to a previously constrained problem area. The approach facilitates the creation of more detailed, accurate 3D models, unlocking new potential applications and improvements for fields like virtual reality, game development, and beyond. Future work, as outlined by the paper, may focus on expanding these techniques to more complex scenarios, such as text-to-scene generation and improved regularization for material properties.

Authors (10)
  1. Lingteng Qiu
  2. Guanying Chen
  3. Xiaodong Gu
  4. Qi Zuo
  5. Mutian Xu
  6. Yushuang Wu
  7. Weihao Yuan
  8. Zilong Dong
  9. Liefeng Bo
  10. Xiaoguang Han