
Debiasing Scores and Prompts of 2D Diffusion for View-consistent Text-to-3D Generation (2303.15413v5)

Published 27 Mar 2023 in cs.CV, cs.CL, cs.GR, and cs.LG

Abstract: Existing score-distilling text-to-3D generation techniques, despite their considerable promise, often encounter the view inconsistency problem. One of the most notable issues is the Janus problem, where the most canonical view of an object (e.g., face or head) appears in other views. In this work, we explore existing frameworks for score-distilling text-to-3D generation and identify the main cause of the view inconsistency problem: the embedded bias of 2D diffusion models. Based on these findings, we propose two approaches to debias the score-distillation frameworks for view-consistent text-to-3D generation. Our first approach, called score debiasing, involves truncating the score estimated by 2D diffusion models and gradually increasing the truncation value throughout the optimization process. Our second approach, called prompt debiasing, identifies conflicting words between user prompts and view prompts using an LLM, and adjusts the discrepancy between view prompts and the viewing direction of an object. Our experimental results show that our methods improve the realism of the generated 3D objects by significantly reducing artifacts, and achieve a good trade-off between faithfulness to the 2D diffusion models and 3D consistency with little overhead. Our project page is available at https://susunghong.github.io/Debiased-Score-Distillation-Sampling/.
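The score-debiasing step lends itself to a short sketch. The snippet below is a minimal, hypothetical illustration of clipping the estimated score with a truncation value that grows over the optimization; the linear schedule, the default bounds `c_min`/`c_max`, and the function name `debias_score` are assumptions for illustration, not the authors' implementation.

```python
import torch

def debias_score(score: torch.Tensor, step: int, total_steps: int,
                 c_min: float = 0.5, c_max: float = 2.0) -> torch.Tensor:
    """Clamp each element of the estimated 2D diffusion score to [-c, c],
    where the truncation value c grows (here: linearly, an assumption)
    from c_min to c_max over the course of optimization."""
    c = c_min + (c_max - c_min) * (step / max(total_steps - 1, 1))
    return score.clamp(min=-c, max=c)

# Hypothetical usage inside a score-distillation loop:
#   grad = weight * (eps_pred - eps)                 # raw SDS gradient term
#   grad = debias_score(grad, step, total_steps)     # score debiasing
#   loss = (grad.detach() * rendered).sum()          # backprop to the 3D model
```

Early in optimization the small truncation value suppresses large, view-biased score components (which drive Janus-style artifacts), while the growing bound gradually restores faithfulness to the 2D diffusion prior.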

Authors (3)
  1. Susung Hong (12 papers)
  2. Donghoon Ahn (7 papers)
  3. Seungryong Kim (103 papers)
Citations (15)
