DreamBooth3D: Subject-Driven Text-to-3D Generation (2303.13508v2)

Published 23 Mar 2023 in cs.CV, cs.AI, and cs.GR

Abstract: We present DreamBooth3D, an approach to personalize text-to-3D generative models from as few as 3-6 casually captured images of a subject. Our approach combines recent advances in personalizing text-to-image models (DreamBooth) with text-to-3D generation (DreamFusion). We find that naively combining these methods fails to yield satisfactory subject-specific 3D assets due to personalized text-to-image models overfitting to the input viewpoints of the subject. We overcome this through a 3-stage optimization strategy where we jointly leverage the 3D consistency of neural radiance fields together with the personalization capability of text-to-image models. Our method can produce high-quality, subject-specific 3D assets with text-driven modifications such as novel poses, colors and attributes that are not seen in any of the input images of the subject.

DreamBooth3D: Subject-Driven Text-to-3D Generation

This paper introduces DreamBooth3D, a method for subject-driven text-to-3D generation that produces detailed, subject-specific 3D assets from as few as 3-6 casually captured images of a subject, with an input text prompt dictating the context or modifications of the generated asset.

The core innovation in DreamBooth3D is its three-stage optimization process, which combines the personalization capability of DreamBooth with the 3D generation capacity of DreamFusion. The authors show that naively chaining DreamBooth's text-to-image personalization into DreamFusion's text-to-3D optimization fails, because the personalized model overfits to the viewpoints present in the few input images. The multi-stage optimization strategy below addresses this shortcoming.

Stage one partially finetunes a DreamBooth model, stopping early enough that the model captures the essential subject characteristics while remaining compatible with diverse viewpoints. A NeRF is then optimized against this partially finetuned model via score distillation, yielding a preliminary 3D asset with coherent geometry but little fine subject detail. A minimal sketch of the score distillation update follows.
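The sketch below illustrates the score distillation sampling (SDS) update that DreamFusion-style NeRF optimization relies on. The tiny conv net is a stand-in for the (partially finetuned) DreamBooth denoiser, and the schedule and omitted timestep weighting are simplifications for illustration, not the paper's implementation.

```python
# Minimal SDS sketch: noise a rendering, predict the noise, and use the
# residual (eps_hat - eps) as a gradient on the rendered pixels.
import torch

torch.manual_seed(0)

# Stand-in for a text-conditioned noise predictor eps_phi(x_t, t); a real
# system would use the partially finetuned DreamBooth diffusion model.
toy_denoiser = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1), torch.nn.SiLU(),
    torch.nn.Conv2d(16, 3, 3, padding=1),
)

# Standard DDPM-style linear beta schedule.
betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = (1.0 - betas).cumprod(dim=0)

def sds_residual(rendered):
    """Noise the rendering at a random timestep, predict the noise, and
    return (eps_hat - eps); the timestep weighting w(t) is omitted here."""
    t = torch.randint(0, len(alphas_cumprod), (1,))
    a_t = alphas_cumprod[t].view(1, 1, 1, 1)
    eps = torch.randn_like(rendered)
    x_t = a_t.sqrt() * rendered + (1.0 - a_t).sqrt() * eps
    with torch.no_grad():
        eps_hat = toy_denoiser(x_t)
    return eps_hat - eps

# `rendered` stands in for a differentiable NeRF rendering; SDS backpropagates
# the residual through the renderer into the NeRF weights.
rendered = torch.rand(1, 3, 64, 64, requires_grad=True)
g = sds_residual(rendered.detach())
(rendered * g).sum().backward()          # injects g as the gradient on pixels
print(torch.allclose(rendered.grad, g))  # True: grad equals the SDS residual
```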

The second stage generates pseudo multi-view images by pushing renderings of the initial NeRF, sampled from many cameras, through an Img2Img translation with a fully trained DreamBooth model. The translation approximately preserves each render's viewpoint while injecting subject-specific detail, enriching the available viewpoint data for further refinement; a sketch follows.
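A hedged sketch of this step using the diffusers Img2Img pipeline is shown below. The checkpoint path, image file name, and the "sks" identifier token are placeholders; the paper uses its own fully finetuned DreamBooth weights and its own translation settings.

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "path/to/fully-finetuned-dreambooth",  # placeholder checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# An initial-NeRF rendering from one sampled camera (placeholder file name).
nerf_render = Image.open("initial_nerf_view_042.png").convert("RGB")

# A moderate strength keeps the pose and rough geometry of the render while
# letting the personalized model repaint subject-specific detail.
pseudo_view = pipe(
    prompt="a photo of sks dog",  # placeholder subject prompt
    image=nerf_render,
    strength=0.5,
    guidance_scale=7.5,
).images[0]
pseudo_view.save("pseudo_view_042.png")
```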

Finally, the third stage finetunes the DreamBooth model further on the pseudo multi-view images and uses this multi-view-aware model to optimize the final NeRF. Because the multi-view DreamBooth no longer favors the input viewpoints, this stage reduces viewpoint overfitting and greatly improves how faithfully the resulting 3D asset captures the subject's identity. A sketch of a plausible final objective follows.
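The sketch below assumes the final NeRF optimization couples an SDS term (from the multi-view DreamBooth model) with a pixel reconstruction term on the pseudo views at their known camera poses; the weighting and function names are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def stage3_loss(render_pred, pseudo_view, sds_residual, lambda_rec=1.0):
    """render_pred: NeRF render at a pseudo-view camera (requires grad).
    pseudo_view: the Img2Img pseudo image for that camera.
    sds_residual: the detached (eps_hat - eps) term from the SDS sketch."""
    # Surrogate whose gradient w.r.t. render_pred equals the SDS residual.
    sds_term = (sds_residual.detach() * render_pred).sum()
    # Anchors the final NeRF to the subject detail in the pseudo views.
    rec_term = F.mse_loss(render_pred, pseudo_view)
    return sds_term + lambda_rec * rec_term

# Toy usage with random tensors standing in for renders and pseudo views.
render_pred = torch.rand(1, 3, 64, 64, requires_grad=True)
pseudo_view = torch.rand(1, 3, 64, 64)
loss = stage3_loss(render_pred, pseudo_view, torch.randn(1, 3, 64, 64))
loss.backward()
```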

Experiments on a dataset of 30 subjects demonstrate DreamBooth3D's ability to generate realistic and contextually accurate 3D assets. The approach outperforms alternatives such as Latent-NeRF and a naive DreamBooth+DreamFusion combination both quantitatively, with substantial improvements in CLIP R-Precision, and qualitatively. A sketch of the CLIP R-Precision check follows.
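The sketch below shows how a single CLIP R-Precision check works: a rendered view of the generated asset should rank its true prompt first among a candidate prompt pool under CLIP image-text similarity. The prompts, image path, and pool size are placeholders, not the paper's evaluation set.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompts = [
    "a photo of sks dog wearing sunglasses",  # true prompt (index 0)
    "a photo of a red vase",                  # distractor
    "a photo of a wooden chair",              # distractor
]
render = Image.open("final_render.png").convert("RGB")  # placeholder path

inputs = processor(text=prompts, images=render,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # (1, num_prompts) similarities
hit = int(logits.argmax(dim=-1).item() == 0)   # 1 if the true prompt wins
print(f"R-Precision contribution from this view: {hit}")
# The reported metric averages this top-1 hit rate over many views/subjects.
```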

Moreover, DreamBooth3D opens pathways to practical applications in 3D asset creation, including color customization, accessorization, and pose modification, which can streamline workflows in industries such as gaming and virtual reality, where personalized, dynamic 3D content is paramount. Despite these strengths, the method struggles with thin structures and with subjects whose input images provide insufficient view variation.

The theoretical implications are equally promising: DreamBooth3D suggests a viable framework for extending text-to-3D methods to sparse input data. Future work could explore higher-resolution inputs and further refinement techniques to mitigate the current limitations, improving the realism and geometric fidelity of the generated 3D assets.

In summary, DreamBooth3D presents a compelling advancement in personalized 3D asset generation through effective integration of text-to-image and text-to-3D technologies, promising significant implications for the future landscape of AI-driven graphics and visualization.

Authors (12)
  1. Amit Raj
  2. Srinivas Kaza
  3. Ben Poole
  4. Michael Niemeyer
  5. Nataniel Ruiz
  6. Ben Mildenhall
  7. Shiran Zada
  8. Kfir Aberman
  9. Michael Rubinstein
  10. Jonathan Barron
  11. Yuanzhen Li
  12. Varun Jampani
Citations (184)