MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation (2404.03656v1)

Published 4 Apr 2024 in cs.CV

Abstract: We present MVD-Fusion: a method for single-view 3D inference via generative modeling of multi-view-consistent RGB-D images. While recent methods pursuing 3D inference advocate learning novel-view generative models, these generations are not 3D-consistent and require a distillation process to generate a 3D output. We instead cast the task of 3D inference as directly generating mutually-consistent multiple views and build on the insight that additionally inferring depth can provide a mechanism for enforcing this consistency. Specifically, we train a denoising diffusion model to generate multi-view RGB-D images given a single RGB input image and leverage the (intermediate noisy) depth estimates to obtain reprojection-based conditioning to maintain multi-view consistency. We train our model using the large-scale synthetic Objaverse dataset as well as the real-world CO3D dataset comprising generic camera viewpoints. We demonstrate that our approach can yield more accurate synthesis compared to recent state-of-the-art methods, including distillation-based 3D inference and prior multi-view generation methods. We also evaluate the geometry induced by our multi-view depth prediction and find that it yields a more accurate representation than other direct 3D inference approaches.
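
The core mechanism described above, using intermediate (noisy) depth estimates to reproject content across views so that the diffusion model is conditioned on mutually consistent evidence, can be illustrated with a short sketch. The snippet below is a minimal, hypothetical illustration and not the authors' implementation: the function name, tensor shapes, and camera conventions are assumptions, and the warp is a standard depth-based backward projection implemented with PyTorch's grid_sample.

```python
# Minimal sketch (assumptions, not the authors' code): warp another view's RGB
# into a reference view using the reference view's intermediate depth estimate.
# This kind of depth-based reprojection is what can supply the multi-view
# conditioning signal described in the abstract.
import torch
import torch.nn.functional as F

def warp_to_view(rgb_other, depth_ref, K, ref_to_other):
    """Resample `rgb_other` (B, 3, H, W) into the reference view.

    depth_ref:    (B, 1, H, W) intermediate (possibly noisy) depth of the reference view
    K:            (B, 3, 3) shared camera intrinsics
    ref_to_other: (B, 4, 4) relative pose taking reference-camera points to the other view
    """
    B, _, H, W = depth_ref.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, device=depth_ref.device, dtype=depth_ref.dtype),
        torch.arange(W, device=depth_ref.device, dtype=depth_ref.dtype),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).reshape(1, -1, 3)

    # Unproject reference pixels using the intermediate depth estimate.
    cam_pts = (pix @ torch.inverse(K).transpose(1, 2)) * depth_ref.reshape(B, -1, 1)
    cam_pts_h = torch.cat([cam_pts, torch.ones_like(cam_pts[..., :1])], dim=-1)

    # Transform into the other view's camera frame and project to pixels.
    pts_other = (cam_pts_h @ ref_to_other.transpose(1, 2))[..., :3]
    uv = pts_other @ K.transpose(1, 2)
    uv = uv[..., :2] / uv[..., 2:].clamp(min=1e-6)

    # Normalize to [-1, 1] and sample the other view's image.
    grid = torch.stack(
        [2 * uv[..., 0] / (W - 1) - 1, 2 * uv[..., 1] / (H - 1) - 1], dim=-1
    ).reshape(B, H, W, 2)
    return F.grid_sample(rgb_other, grid, align_corners=True)

# In a full multi-view denoiser (hypothetical usage), each view's conditioning
# could gather warped content from all other views at every denoising step:
# cond_i = torch.cat([warp_to_view(rgb_j, depth_i, K, T_i_to_j) for j in others], dim=1)
```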

Authors (4)
  1. Hanzhe Hu (7 papers)
  2. Zhizhuo Zhou (3 papers)
  3. Varun Jampani (125 papers)
  4. Shubham Tulsiani (71 papers)
Citations (10)