
Optimized View and Geometry Distillation from Multi-view Diffuser (2312.06198v3)

Published 11 Dec 2023 in cs.CV

Abstract: Generating multi-view images from a single input view using image-conditioned diffusion models is a recent advancement and has shown considerable potential. However, issues such as the lack of consistency in synthesized views and over-smoothing in extracted geometry persist. Previous methods integrate multi-view consistency modules or impose additional supervision to enhance view consistency, while compromising on the flexibility of camera positioning and limiting the versatility of view synthesis. In this study, we consider the radiance field optimized during geometry extraction as a more rigid consistency prior, compared to the volume and ray aggregation used in previous works. We further identify and rectify a critical bias in the traditional radiance field optimization process through score distillation from a multi-view diffuser. We introduce an Unbiased Score Distillation (USD) that utilizes unconditioned noises from a 2D diffusion model, greatly refining the radiance field fidelity. We leverage the rendered views from the optimized radiance field as the basis and develop a two-step specialization process of a 2D diffusion model, which is adept at conducting object-specific denoising and generating high-quality multi-view images. Finally, we recover faithful geometry and texture directly from the refined multi-view images. Empirical evaluations demonstrate that our optimized geometry and view distillation technique generates results comparable to state-of-the-art models trained on extensive datasets, all while maintaining freedom in camera positioning. Please see our project page at https://youjiazhang.github.io/USD/.
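The core contrast the abstract draws, between standard Score Distillation Sampling (SDS) and the proposed Unbiased Score Distillation (USD), can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: `predict_noise` is a hypothetical stand-in for a pretrained diffusion model's noise predictor, and the forward noising is simplified to a single additive step.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_noise(x_t, t, cond):
    """Hypothetical stand-in for a 2D diffusion model's noise prediction.
    A real implementation would call a pretrained denoiser here."""
    bias = 0.0 if cond is None else 0.1
    return 0.9 * x_t + bias

def sds_gradient(x, t, cond, w=1.0):
    """Standard SDS: the gradient compares the conditioned noise
    prediction against the *random* noise that was added."""
    eps = rng.standard_normal(x.shape)
    x_t = x + eps  # simplified forward noising for illustration
    return w * (predict_noise(x_t, t, cond) - eps)

def usd_gradient(x, t, cond, w=1.0):
    """USD, as described in the abstract: replace the random-noise
    term with the model's *unconditioned* prediction, removing the
    bias introduced by differencing against raw noise."""
    eps = rng.standard_normal(x.shape)
    x_t = x + eps
    return w * (predict_noise(x_t, t, cond) - predict_noise(x_t, t, None))

# Example: gradients for one rendered view (random stand-in data).
x = rng.standard_normal((4, 4))
g_sds = sds_gradient(x, t=500, cond="view")
g_usd = usd_gradient(x, t=500, cond="view")
```

In the SDS variant the residual still contains the random noise sample, which the paper identifies as a source of bias in radiance field optimization; the USD variant differences two model predictions instead, so the stochastic noise term cancels.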

Authors (5)
  1. Youjia Zhang (16 papers)
  2. Junqing Yu (24 papers)
  3. Zikai Song (17 papers)
  4. Wei Yang (349 papers)
  5. Yawei Luo (40 papers)

GitHub

  1. USD