
3D-CLFusion: Fast Text-to-3D Rendering with Contrastive Latent Diffusion (2303.11938v2)

Published 21 Mar 2023 in cs.CV

Abstract: We tackle the task of text-to-3D creation with pre-trained latent-based NeRFs (NeRFs that generate 3D objects given an input latent code). Recent works such as DreamFusion and Magic3D have shown great success in generating 3D content using NeRFs and text prompts, but the current approach of optimizing a NeRF for every text prompt is 1) extremely time-consuming and 2) often yields low-resolution outputs. To address these challenges, we propose 3D-CLFusion, a novel method that leverages pre-trained latent-based NeRFs to perform fast 3D content creation in less than a minute. In particular, we introduce a latent diffusion prior network that learns the w latent from input CLIP text/image embeddings. This pipeline produces the w latent without further optimization at inference time, and the pre-trained NeRF can then perform multi-view, high-resolution 3D synthesis from that latent. The novelty of our model lies in introducing contrastive learning while training the diffusion prior, which enables the generation of a valid view-invariant latent code. Experiments demonstrate the effectiveness of the proposed view-invariant diffusion process for fast text-to-3D creation, e.g., 100 times faster than DreamFusion. Our model can serve as a plug-and-play tool for text-to-3D with pre-trained NeRFs.
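The abstract compresses the whole pipeline into a few sentences, so a rough sketch may help fix ideas: a small denoising prior maps a CLIP embedding to a w latent, a DDPM-style loss trains it, and an InfoNCE-style term pulls together latents predicted from different views of the same object. This is a minimal PyTorch sketch under stated assumptions, not the authors' implementation; the prior architecture, noise schedule, sampler, and the exact form of the contrastive term are all guesses, and names like `LatentPrior` and `sample_w` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentPrior(nn.Module):
    """Denoising network: predicts the clean w latent from a noisy one,
    conditioned on a CLIP embedding (hypothetical architecture)."""
    def __init__(self, w_dim=512, clip_dim=512, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(w_dim + clip_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, w_dim),
        )

    def forward(self, w_noisy, clip_emb, t):
        t_feat = t.float().unsqueeze(-1) / 1000.0  # timestep as a scalar feature
        return self.net(torch.cat([w_noisy, clip_emb, t_feat], dim=-1))

def alpha_bar(t, T=1000):
    """Toy cosine noise schedule (an assumption, not the paper's)."""
    return torch.cos(t.float() / T * torch.pi / 2) ** 2

def diffusion_prior_loss(prior, w_clean, clip_emb, T=1000):
    """DDPM-style objective on the w latent: noise it, predict it back."""
    t = torch.randint(0, T, (w_clean.size(0),), device=w_clean.device)
    ab = alpha_bar(t, T).unsqueeze(-1)
    noise = torch.randn_like(w_clean)
    w_noisy = ab.sqrt() * w_clean + (1 - ab).sqrt() * noise
    return F.mse_loss(prior(w_noisy, clip_emb, t), w_clean)

def view_invariant_contrastive_loss(w_a, w_b, tau=0.07):
    """InfoNCE-style term: latents predicted from CLIP embeddings of two
    renders of the same object are positives; other objects in the batch
    are negatives. One plausible reading of the paper's contrastive idea."""
    a = F.normalize(w_a, dim=-1)
    b = F.normalize(w_b, dim=-1)
    logits = a @ b.t() / tau                      # (B, B) similarities
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)

@torch.no_grad()
def sample_w(prior, clip_emb, steps=50, T=1000, w_dim=512):
    """Toy x0-prediction sampler: predict the clean latent, re-noise it to
    the next smaller timestep. A crude stand-in for full DDPM/DDIM sampling."""
    device = clip_emb.device
    w = torch.randn(clip_emb.size(0), w_dim, device=device)
    ts = torch.linspace(T - 1, 0, steps, device=device).long()
    for i, t in enumerate(ts):
        tb = torch.full((clip_emb.size(0),), int(t), device=device)
        w0 = prior(w, clip_emb, tb)               # predicted clean latent
        if i + 1 < len(ts):
            ab = alpha_bar(ts[i + 1], T)
            w = ab.sqrt() * w0 + (1 - ab).sqrt() * torch.randn_like(w0)
        else:
            w = w0
    return w  # hand to the frozen latent-based NeRF for multi-view rendering
```

At inference, `sample_w` would run once per prompt and the resulting latent would be fed to the frozen latent-based NeRF for multi-view rendering; sampling in latent space rather than optimizing a NeRF per prompt is what the abstract credits for the roughly 100x speedup over DreamFusion.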

References (36)
  1. CLIP2StyleGAN: Unsupervised extraction of StyleGAN edit directions. In ACM SIGGRAPH 2022 Conference Proceedings, pages 1–9, 2022.
  2. Efficient geometry-aware 3D generative adversarial networks. In CVPR, pages 16123–16133, 2022.
  3. π-GAN: Periodic implicit generative adversarial networks for 3D-aware image synthesis. In CVPR, pages 5799–5809, 2021.
  4. SofGAN: A portrait image generator with dynamic styling. ACM Transactions on Graphics (TOG), 41(1):1–26, 2022.
  5. Sem2NeRF: Converting single-view semantic masks to neural radiance fields. In ECCV, 2022.
  6. StarGAN v2: Diverse image synthesis for multiple domains. In CVPR, 2020.
  7. StyleGAN-NADA: CLIP-guided domain adaptation of image generators. ACM Transactions on Graphics (TOG), 41(4):1–13, 2022.
  8. StyleNeRF: A style-based 3D-aware generator for high-resolution image synthesis. In ICLR, 2022.
  9. Denoising diffusion probabilistic models. NeurIPS, 33:6840–6851, 2020.
  10. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598, 2022.
  11. Zero-shot text-guided object generation with dream fields. In CVPR, pages 867–876, 2022.
  12. Ray tracing volume densities. ACM SIGGRAPH Computer Graphics, 18(3):165–174, 1984.
  13. A style-based generator architecture for generative adversarial networks. In CVPR, pages 4401–4410, 2019.
  14. Analyzing and improving the image quality of StyleGAN. In CVPR, 2020.
  15. StyleMC: Multi-channel based fast text-guided image generation and manipulation. In WACV, pages 895–904, 2022.
  16. 3D-aware encoding for style-based neural radiance fields. arXiv preprint arXiv:2211.06583, 2022.
  17. Magic3D: High-resolution text-to-3D content creation. arXiv preprint arXiv:2211.10440, 2022.
  18. Causal transformer for estimating counterfactual outcomes. arXiv preprint arXiv:2204.07258, 2022.
  19. NeRF: Representing scenes as neural radiance fields for view synthesis. In ECCV, pages 405–421. Springer, 2020.
  20. GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741, 2021.
  21. GIRAFFE: Representing scenes as compositional generative neural feature fields. In CVPR, pages 11453–11464, 2021.
  22. StyleSDF: High-resolution 3D-consistent image and geometry generation. In CVPR, pages 13503–13513, 2022.
  23. A shading-guided generative implicit model for shape-accurate 3D-aware image synthesis. NeurIPS, 34:20002–20013, 2021.
  24. StyleCLIP: Text-driven manipulation of StyleGAN imagery. In ICCV, pages 2085–2094, 2021.
  25. clip2latent: Text driven sampling of a pre-trained StyleGAN using denoising diffusion and CLIP. In BMVC, 2022.
  26. DreamFusion: Text-to-3D using 2D diffusion. arXiv preprint arXiv:2209.14988, 2022.
  27. Learning transferable visual models from natural language supervision. In ICML, pages 8748–8763. PMLR, 2021.
  28. Hierarchical text-conditional image generation with CLIP latents. arXiv preprint arXiv:2204.06125, 2022.
  29. High-resolution image synthesis with latent diffusion models. In CVPR, 2022.
  30. Photorealistic text-to-image diffusion models with deep language understanding. arXiv preprint arXiv:2205.11487, 2022.
  31. GRAF: Generative radiance fields for 3D-aware image synthesis. NeurIPS, 33:20154–20166, 2020.
  32. Interpreting the latent space of GANs for semantic face editing. In CVPR, pages 9243–9252, 2020.
  33. 3D-aware image synthesis via learning structural and textural representations. In CVPR, pages 18430–18439, 2022.
  34. A large-scale car dataset for fine-grained categorization and verification. In CVPR, pages 3973–3981, 2015.
  35. Generative multiplane images: Making a 2D GAN 3D-aware. In ECCV, 2022.
  36. LAFITE: Towards language-free training for text-to-image generation. arXiv preprint arXiv:2111.13792, 2021.
Authors (9)
  1. Yu-Jhe Li (23 papers)
  2. Tao Xu (133 papers)
  3. Ji Hou (25 papers)
  4. Bichen Wu (52 papers)
  5. Xiaoliang Dai (44 papers)
  6. Albert Pumarola (31 papers)
  7. Peizhao Zhang (40 papers)
  8. Peter Vajda (52 papers)
  9. Kris Kitani (96 papers)
Citations (6)
