Generic 3D Diffusion Adapter Using Controlled Multi-View Editing (2403.12032v2)
Abstract: Open-domain 3D object synthesis has been lagging behind image synthesis due to limited data and higher computational complexity. To bridge this gap, recent works have investigated multi-view diffusion but often fall short in 3D consistency, visual quality, or efficiency. This paper proposes MVEdit, which functions as a 3D counterpart of SDEdit, employing ancestral sampling to jointly denoise multi-view images and output high-quality textured meshes. Built on off-the-shelf 2D diffusion models, MVEdit achieves 3D consistency through a training-free 3D Adapter, which lifts the 2D views of the last timestep into a coherent 3D representation, then conditions the 2D views of the next timestep using rendered views, without compromising visual quality. With an inference time of only 2-5 minutes, this framework achieves a better trade-off between quality and speed than score distillation. MVEdit is highly versatile and extendable, with a wide range of applications including text/image-to-3D generation, 3D-to-3D editing, and high-quality texture synthesis. In particular, evaluations demonstrate state-of-the-art performance in both image-to-3D and text-guided texture generation tasks. Additionally, we introduce a method for fine-tuning 2D latent diffusion models on small 3D datasets with limited resources, enabling fast low-resolution text-to-3D initialization.
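The lift-render-condition loop described in the abstract can be illustrated with a toy control-flow sketch. This is not the MVEdit implementation: every function name here is hypothetical, the "3D representation" is replaced by a simple mean over views, "rendering" just copies that mean back to each view, and the denoiser is a fixed blend rather than a 2D diffusion model. The sketch only shows how per-view denoising can alternate with a shared consistency step while fresh noise is re-injected at each step, loosely in the spirit of ancestral sampling.

```python
import numpy as np


def lift_to_shared(views):
    """Toy stand-in for the 3D Adapter's lifting step (assumption):
    fuse all views into one shared representation by averaging."""
    return views.mean(axis=0)


def render(shared, num_views):
    """Toy stand-in for rendering: every view sees the same shared content."""
    return np.tile(shared, (num_views, 1))


def denoise_step(noisy, cond, sigma_next, rng):
    """Toy denoiser (assumption): the predicted clean view is an equal blend
    of the current view and its conditioning render. Fresh noise at the next
    noise level is then added back."""
    pred_clean = 0.5 * noisy + 0.5 * cond
    return pred_clean + sigma_next * rng.standard_normal(noisy.shape)


def multi_view_sample(init_views, sigmas, seed=0):
    """Alternate lifting, rendering, and conditioned denoising over a
    decreasing noise schedule `sigmas` (last entry should be 0)."""
    rng = np.random.default_rng(seed)
    views = init_views + sigmas[0] * rng.standard_normal(init_views.shape)
    for sigma_next in sigmas[1:]:
        shared = lift_to_shared(views)          # views -> shared representation
        cond = render(shared, views.shape[0])   # shared representation -> renders
        views = denoise_step(views, cond, sigma_next, rng)
    return views, lift_to_shared(views)


if __name__ == "__main__":
    init = np.array([[0.0, 0.0], [2.0, 2.0]])  # two inconsistent "views"
    final_views, shared = multi_view_sample(init, sigmas=[1.0, 0.5, 0.25, 0.0])
    print("final views:\n", final_views)
    print("shared representation:", shared)
```

In this toy setting each pass pulls the views halfway toward their common mean, so disagreement between views shrinks as the noise level decreases. The real method replaces each stub with a substantially more complex component.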
Authors:
- Hansheng Chen
- Ruoxi Shi
- Yulin Liu
- Bokui Shen
- Jiayuan Gu
- Gordon Wetzstein
- Hao Su
- Leonidas Guibas