NeRFiller: Completing Scenes via Generative 3D Inpainting (2312.04560v1)
Abstract: We propose NeRFiller, an approach that completes missing portions of a 3D capture via generative 3D inpainting using off-the-shelf 2D visual generative models. Often parts of a captured 3D scene or object are missing due to mesh reconstruction failures or a lack of observations (e.g., contact regions, such as the bottom of objects, or hard-to-reach areas). We approach this challenging 3D inpainting problem by leveraging a 2D inpainting diffusion model. We identify a surprising behavior of these models, where they generate more 3D consistent inpaints when images form a 2$\times$2 grid, and show how to generalize this behavior to more than four images. We then present an iterative framework to distill these inpainted regions into a single consistent 3D scene. In contrast to related works, we focus on completing scenes rather than deleting foreground objects, and our approach does not require tight 2D object masks or text. We compare our approach to relevant baselines adapted to our setting on a variety of scenes, where NeRFiller creates the most 3D consistent and plausible scene completions. Our project page is at https://ethanweber.me/nerfiller.
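The "2×2 grid" behavior described above amounts to tiling four rendered views (and their masks) into one image before passing it to a 2D inpainting diffusion model, then splitting the result back into per-view inpaints. A minimal sketch of that tiling step is below; the function names are illustrative, not from the paper, and the actual diffusion inpainting call is omitted (any off-the-shelf 2D inpainter would be applied to the grid image and grid mask).

```python
import numpy as np

def tile_grid(images):
    """Tile four H x W x C views into a single 2H x 2W x C grid image.

    Order is row-major: [top-left, top-right, bottom-left, bottom-right].
    The same function can tile the four binary inpainting masks.
    """
    assert len(images) == 4, "the grid trick uses exactly four views"
    top = np.concatenate(images[:2], axis=1)      # left | right
    bottom = np.concatenate(images[2:], axis=1)
    return np.concatenate([top, bottom], axis=0)  # top over bottom

def untile_grid(grid):
    """Split a 2H x 2W x C grid image back into its four H x W x C views."""
    H, W = grid.shape[0] // 2, grid.shape[1] // 2
    return [grid[:H, :W], grid[:H, W:], grid[H:, :W], grid[H:, W:]]

# Sketch of one iteration (inpainting model call is a placeholder):
#   grid_img  = tile_grid(rendered_views)
#   grid_mask = tile_grid(view_masks)
#   grid_out  = inpaint_2d(grid_img, grid_mask)   # hypothetical 2D inpainter
#   inpaints  = untile_grid(grid_out)
```

Tiling exploits the observation that a 2D diffusion model, seeing all four views in one canvas, produces inpaints that are more mutually consistent than inpainting each view independently; the paper's iterative framework then distills these joint inpaints into the 3D scene.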