3D-Adapter: Geometry-Consistent Multi-View Diffusion for High-Quality 3D Generation (2410.18974v1)
Abstract: Multi-view image diffusion models have significantly advanced open-domain 3D object generation. However, most existing models rely on 2D network architectures that lack inherent 3D biases, resulting in compromised geometric consistency. To address this challenge, we introduce 3D-Adapter, a plug-in module designed to infuse 3D geometry awareness into pretrained image diffusion models. Central to our approach is the idea of 3D feedback augmentation: for each denoising step in the sampling loop, 3D-Adapter decodes intermediate multi-view features into a coherent 3D representation, then re-encodes the rendered RGBD views to augment the pretrained base model through feature addition. We study two variants of 3D-Adapter: a fast feed-forward version based on Gaussian splatting and a versatile training-free version utilizing neural fields and meshes. Our extensive experiments demonstrate that 3D-Adapter not only greatly enhances the geometry quality of text-to-multi-view models such as Instant3D and Zero123++, but also enables high-quality 3D generation using the plain text-to-image Stable Diffusion. Furthermore, we showcase the broad application potential of 3D-Adapter by presenting high-quality results in text-to-3D, image-to-3D, text-to-texture, and text-to-avatar tasks.
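The 3D feedback loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `base_denoise`, `reconstruct_3d`, `render_views`, `feedback_step`, and the `weight` parameter are all hypothetical stand-ins, the "3D representation" is just a mean over views, and the feedback is blended in output space rather than added to network features. It only shows the control flow: denoise, decode to a shared representation, render back, and add the result as a residual.

```python
import numpy as np

def base_denoise(views, t):
    """Hypothetical stand-in for a pretrained multi-view denoiser."""
    return views * (1.0 - 0.1 * t)

def reconstruct_3d(views):
    """Toy stand-in for decoding views into one shared 3D representation.
    Averaging over the view axis is a placeholder for Gaussian splatting
    or neural field fitting, not a real reconstruction."""
    return views.mean(axis=0)

def render_views(shape3d, n_views):
    """Toy stand-in for rendering RGBD views from the shared representation."""
    return np.repeat(shape3d[None], n_views, axis=0)

def feedback_step(views, t, weight=0.5):
    """One denoising step with toy 3D feedback added as a residual."""
    pred = base_denoise(views, t)
    rendered = render_views(reconstruct_3d(pred), views.shape[0])
    # Simplified "feature addition": pull each view toward the rendered one.
    return pred + weight * (rendered - pred)

def sample(n_views=4, size=8, steps=5, seed=0):
    """Run the toy sampling loop from noise; the last axis holds RGBD."""
    rng = np.random.default_rng(seed)
    views = rng.standard_normal((n_views, size, size, 4))
    for t in np.linspace(1.0, 0.0, steps):
        views = feedback_step(views, t)
    return views
```

In this toy setting, a larger `weight` shrinks the disagreement between views at every step, which loosely mirrors the geometric-consistency motivation stated in the abstract.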
Authors:
- Hansheng Chen
- Bokui Shen
- Yulin Liu
- Ruoxi Shi
- Linqi Zhou
- Connor Z. Lin
- Jiayuan Gu
- Hao Su
- Gordon Wetzstein
- Leonidas Guibas