
DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation (2306.12422v2)

Published 21 Jun 2023 in cs.CV, cs.GR, and cs.LG

Abstract: Text-to-image diffusion models pre-trained on billions of image-text pairs have recently enabled 3D content creation by optimizing a randomly initialized differentiable 3D representation with score distillation. However, the optimization process suffers from slow convergence, and the resulting 3D models often exhibit two limitations: (a) quality concerns such as missing attributes and distorted shape and texture; (b) extremely low diversity compared to text-guided image synthesis. In this paper, we show that the conflict between the 3D optimization process and uniform timestep sampling in score distillation is the main reason for these limitations. To resolve this conflict, we propose prioritizing timestep sampling with monotonically non-increasing functions, which aligns the 3D optimization process with the sampling process of the diffusion model. Extensive experiments show that our simple redesign significantly improves 3D content creation, with faster convergence, better quality, and higher diversity.
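
The abstract's key change, replacing uniform random timestep sampling in score distillation with a monotonically non-increasing schedule over optimization iterations, can be illustrated with a minimal sketch. The linear decay and the names used below (non_increasing_timestep_schedule, t_max, t_min) are illustrative assumptions for this sketch, not the paper's actual prioritized schedule.

```python
import numpy as np

def non_increasing_timestep_schedule(num_iters, t_max=980, t_min=20):
    """Sketch only: map optimization iteration i to a diffusion timestep t
    that decreases monotonically, so early iterations use large (noisy)
    timesteps for coarse structure and later iterations use small timesteps
    for fine detail. The paper derives a prioritized (non-uniform) schedule;
    the linear decay here is just the simplest non-increasing choice."""
    iters = np.arange(num_iters)
    ts = t_max - (t_max - t_min) * iters / max(num_iters - 1, 1)
    return np.round(ts).astype(int)

# Example: in a score-distillation loop, use the deterministic schedule
# in place of uniform random timestep sampling.
schedule = non_increasing_timestep_schedule(10000)
# for i, t in enumerate(schedule):
#     ...render a view, add noise at timestep t, compute the distillation gradient...
```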
