MVD$^2$: Efficient Multiview 3D Reconstruction for Multiview Diffusion (2402.14253v1)

Published 22 Feb 2024 in cs.CV and cs.GR

Abstract: As a promising 3D generation technique, multiview diffusion (MVD) has received considerable attention due to its advantages in generalizability, quality, and efficiency. By finetuning pretrained large image diffusion models with 3D data, MVD methods first generate multiple views of a 3D object from an image or text prompt and then reconstruct a 3D shape via multiview 3D reconstruction. However, the sparse views and inconsistent details in the generated images make 3D reconstruction challenging. We present MVD$^2$, an efficient 3D reconstruction method for multiview diffusion (MVD) images. MVD$^2$ aggregates image features into a 3D feature volume by projection and convolution and then decodes the volumetric features into a 3D mesh. We train MVD$^2$ with 3D shape collections and MVD images prompted by rendered views of the 3D shapes. To address the discrepancy between the generated multiview images and the ground-truth views of the 3D shapes, we design a simple yet efficient view-dependent training scheme. MVD$^2$ improves the 3D generation quality of MVD and is fast and robust across various MVD methods. After training, it can decode 3D meshes from multiview images within one second. We train MVD$^2$ with Zero123++ and the Objaverse-LVIS 3D dataset and demonstrate its superior performance in generating 3D models from multiview images produced by different MVD methods, using both synthetic and real images as prompts.
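
The pipeline sketched in the abstract — back-projecting per-view 2D features into a shared voxel grid, fusing them with 3D convolutions, and decoding the volume into geometry — can be illustrated with a minimal PyTorch sketch. Everything here is an assumption for exposition (module names, shapes, and a camera convention in which projection matrices map world points to normalized [-1, 1] image coordinates); it is not the authors' architecture.

```python
# Hypothetical sketch of projection-and-convolution feature aggregation.
# All names and shapes are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureVolumeAggregator(nn.Module):
    """Back-projects per-view 2D features into a 3D volume and decodes an SDF."""

    def __init__(self, feat_dim: int = 32, grid_res: int = 64):
        super().__init__()
        self.grid_res = grid_res
        # Fuse the projected features with a small 3D CNN.
        self.fuse = nn.Sequential(
            nn.Conv3d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv3d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
        )
        # Decode each voxel feature to one signed-distance value.
        self.sdf_head = nn.Conv3d(feat_dim, 1, 1)

    def forward(self, feats: torch.Tensor, projections: torch.Tensor):
        # feats: (V, C, H, W) per-view feature maps from a 2D backbone.
        # projections: (V, 3, 4) matrices mapping homogeneous world points
        # to normalized image coordinates in [-1, 1] (an assumed convention).
        V, C, _, _ = feats.shape
        r = self.grid_res

        # Regular grid of query points in [-1, 1]^3.
        lin = torch.linspace(-1.0, 1.0, r, device=feats.device)
        zs, ys, xs = torch.meshgrid(lin, lin, lin, indexing="ij")
        pts = torch.stack([xs, ys, zs], dim=-1).reshape(-1, 3)   # (r^3, 3)
        pts_h = F.pad(pts, (0, 1), value=1.0)                    # homogeneous

        volume = feats.new_zeros(C, r, r, r)
        for v in range(V):
            uvw = pts_h @ projections[v].T                       # (r^3, 3)
            uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)        # perspective divide
            grid = uv.view(1, -1, 1, 2)                          # grid_sample layout
            # Bilinearly sample this view's features at the projected points;
            # points falling outside the image contribute zeros.
            sampled = F.grid_sample(feats[v : v + 1], grid, align_corners=True)
            volume = volume + sampled.view(C, r, r, r)
        volume = volume / V                                      # average over views

        fused = self.fuse(volume.unsqueeze(0))                   # (1, C, r, r, r)
        return self.sdf_head(fused)                              # (1, 1, r, r, r)
```

A mesh could then be extracted from the predicted distance field, e.g. with marching cubes or a differentiable isosurface extractor such as FlexiCubes [45].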

References (70)
  1. 2023. Stable Zero123. https://huggingface.co/stabilityai/stable-zero123.
  2. Fantasia3D: Disentangling geometry and appearance for high-quality text-to-3D content creation. In ICCV.
  3. Control3D: Towards controllable text-to-3D generation. In ACM Multimedia. 1148–1156.
  4. Zhiqin Chen and Hao Zhang. 2019. Learning implicit fields for generative shape modeling. In CVPR. 5939–5948.
  5. SDFusion: Multimodal 3D shape completion, reconstruction, and generation. In CVPR. 4456–4465.
  6. 3D scene geometry estimation from 360° imagery: A survey. ACM Comput. Surv. 55, 4, Article 68 (2022), 39 pages.
  7. Objaverse-XL: A universe of 10M+ 3D objects. In NeurIPS.
  8. Objaverse: A universe of annotated 3D objects. In CVPR. 13142–13153.
  9. NeRDi: Single-view NeRF synthesis with language-guided diffusion as general image priors. In CVPR. 20637–20647.
  10. Google scanned objects: A high-quality dataset of 3D scanned household items. In ICRA. IEEE, 2553–2560.
  11. Get3D: A generative model of high quality 3D textured shapes learned from images. NeurIPS 35 (2022), 31841–31854.
  12. 3DGen: Triplane latent diffusion for textured mesh generation. arXiv:2303.05371.
  13. Zexin He and Tengfei Wang. 2023. OpenLRM: Open-source large reconstruction models. https://github.com/3DTopia/OpenLRM.
  14. LRM: Large reconstruction model for single image to 3D. In ICLR.
  15. ZeroShape: Regression-based zero-shot shape reconstruction. arXiv:2312.14198.
  16. Octree Transformer: Autoregressive 3D shape generation on hierarchically structured sequences. In CVPR. 2697–2706.
  17. Zero-shot text-guided object generation with dream fields. In CVPR. 867–876.
  18. Heewoo Jun and Alex Nichol. 2023. Shap-E: Generating conditional 3D implicit functions. arXiv:2305.02463.
  19. 3D Gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 42, 4, Article 139 (2023), 14 pages.
  20. Modular primitives for high-performance differentiable rendering. ACM Trans. Graph. 39, 6 (2020), 1–14.
  21. Diffusion-SDF: Text-to-shape via voxelized diffusion. In CVPR. 12642–12651.
  22. SweetDreamer: Aligning geometric priors in 2D diffusion for consistent Text-to-3D. In ICLR.
  23. One-2-3-45++: Fast single image to 3D objects with consistent multi-view generation and 3D diffusion. arXiv:2311.07885.
  24. One-2-3-45: Any single image to 3D mesh in 45 seconds without per-shape optimization. In NeurIPS.
  25. Zero-1-to-3: Zero-shot one image to 3D object. In ICCV.
  26. SyncDreamer: Generating multiview-consistent images from a single-view image. In ICLR.
  27. Text-guided texturing by synchronized multi-view diffusion. arXiv:2311.12891.
  28. ISS: Image as stepping stone for text-guided 3D shape generation. In ICLR.
  29. Wonder3D: Single image to 3D using cross-domain diffusion. arXiv:2310.15008.
  30. SparseNeuS: Fast generalizable neural surface reconstruction from sparse views. In ECCV. Springer, 210–227.
  31. Direct2.5: Diverse text-to-3D generation via multi-view 2.5D diffusion. arXiv:2311.15980.
  32. RealFusion: 360° reconstruction of any object from a single image. In CVPR. 8446–8455.
  33. NeRF: Representing scenes as neural radiance fields for view synthesis. In ECCV.
  34. AutoSDF: Shape priors for 3D completion, reconstruction and generation. In CVPR.
  35. PolyGen: An autoregressive generative model of 3D meshes. In ICML. PMLR, 7220–7229.
  36. DINOv2: Learning robust visual features without supervision. arXiv:2304.07193.
  37. Chasing consistency in text-to-3D generation from a single image. arXiv:2309.03599.
  38. A survey of structure from motion. Acta Numerica 26 (2017), 305–364.
  39. Dreamfusion: Text-to-3D using 2D diffusion. In ICLR.
  40. Senthil Purushwalkam and Nikhil Naik. 2023. ConRad: Image constrained radiance fields for 3D generation from a single image. In NeurIPS.
  41. Magic123: One image to high-quality 3D object generation using both 2D and 3D diffusion priors. In ICLR.
  42. High-resolution image synthesis with latent diffusion models. In CVPR.
  43. A comparison and evaluation of multi-view stereo reconstruction algorithms. In CVPR. IEEE, 519–528.
  44. Deep marching tetrahedra: A hybrid representation for high-resolution 3D shape synthesis. NeurIPS 34 (2021), 6087–6101.
  45. Flexible isosurface extraction for gradient-based mesh optimization. ACM Trans. Graph. 42, 4, Article 37 (2023), 16 pages.
  46. Zero123++: A single image to consistent multi-view diffusion base model. arXiv:2310.15110.
  47. MVDream: Multi-view diffusion for 3D generation. In ICLR.
  48. DreamCraft3D: Hierarchical 3D generation with bootstrapped diffusion prior. In ICLR.
  49. Splatter Image: Ultra-fast single-view 3D reconstruction. arXiv:2312.13150.
  50. Make-It-3D: High-fidelity 3D creation from a single image with diffusion prior. In ICCV.
  51. MVDiffusion: Enabling holistic multi-view image generation with correspondence-aware diffusion. In NeurIPS.
  52. Score Jacobian chaining: Lifting pretrained 2D diffusion models for 3D generation. In CVPR.
  53. NeuS: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. In NeurIPS.
  54. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing 13, 4 (2004), 600–612.
  55. ProlificDreamer: High-fidelity and diverse text-to-3D generation with variational score distillation. In NeurIPS.
  56. Consistent123: Improve consistency for one image to 3D object synthesis. arXiv:2310.08092.
  57. HarmonyView: Harmonizing consistency and diversity in one-image-to-3D. arXiv:2312.15980.
  58. Multiview compressive coding for 3D reconstruction. In CVPR. 9065–9075.
  59. Deep multi-view learning methods: A review. Neurocomputing 448 (2021), 106–129.
  60. ConsistNet: Enforcing 3D consistency for multi-view images diffusion. arXiv:2310.10343.
  61. Consistent-1-to-3: Consistent image to 3D view synthesis via geometry-aware diffusion models. In 3DV.
  62. pixelNeRF: Neural radiance fields from one or few images. In CVPR.
  63. HiFi-123: Towards high-fidelity one image to 3D content generation. arXiv:2310.06744.
  64. IPDreamer: Appearance-controllable 3D object generation with image prompts. arXiv:2310.05375.
  65. 3DILG: Irregular latent grids for 3D generative modeling. In NeurIPS.
  66. 3DShape2VecSet: A 3D shape representation for neural fields and generative diffusion models. ACM Trans. Graph. 42, 4, Article 92 (2023), 16 pages.
  67. The unreasonable effectiveness of deep features as a perceptual metric. In CVPR. 586–595.
  68. SDF-StyleGAN: Implicit SDF-based StyleGAN for 3D shape generation. Comput. Graph. Forum 41 (2022), 52–63.
  69. Locally Attentional SDF diffusion for controllable 3D shape generation. ACM Trans. Graph. 42, 4, Article 91 (2023), 13 pages.
  70. Triplane meets Gaussian splatting: Fast and generalizable single-view 3D reconstruction with transformers. arXiv:2312.09147.
Authors (5)
  1. Xin-Yang Zheng (5 papers)
  2. Hao Pan (94 papers)
  3. Yu-Xiao Guo (8 papers)
  4. Xin Tong (193 papers)
  5. Yang Liu (2253 papers)
Citations (12)
