SIGNeRF: Scene Integrated Generation for Neural Radiance Fields (2401.01647v2)
Abstract: Advances in image diffusion models have recently led to notable improvements in the generation of high-quality images. In combination with Neural Radiance Fields (NeRFs), they have enabled new opportunities in 3D generation. However, most generative 3D approaches are object-centric, and applying them to the editing of existing photorealistic scenes is not trivial. We propose SIGNeRF, a novel approach for fast and controllable NeRF scene editing and scene-integrated object generation. A new generative update strategy ensures 3D consistency across the edited images without requiring iterative optimization. We find that depth-conditioned diffusion models inherently possess the capability to generate 3D-consistent views when requested to generate a grid of images instead of single views. Based on these insights, we introduce a multi-view reference sheet of modified images. Our method updates an image collection consistently based on the reference sheet and refines the original NeRF with the newly generated image set in one go. By exploiting the depth-conditioning mechanism of the image diffusion model, we gain fine control over the spatial location of the edit and enforce shape guidance through a selected region or an external mesh.
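To make the reference-sheet idea concrete, the sketch below tiles per-view depth maps into one grid image and runs a single depth-conditioned diffusion pass over it, so all views on the sheet are generated jointly. This is a minimal illustration built on the Hugging Face `diffusers` ControlNet pipeline, not the paper's implementation: the 3x3 layout, the tile resolution, the model IDs, and the `render_depths()` placeholder are assumptions, and the actual method additionally uses inpainting masks and the original NeRF renders.

```python
# Minimal sketch of the grid trick from the abstract: tile depth maps of
# several views into one sheet and let a depth-conditioned diffusion model
# generate all views in a single pass. Assumed layout/resolution/models.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

ROWS = COLS = 3   # assumed reference-sheet layout (not from the paper)
TILE = 256        # assumed per-view resolution, giving a 768x768 sheet

def tile(images, rows=ROWS, cols=COLS):
    """Paste equally sized PIL images into a single grid image."""
    w, h = images[0].size
    sheet = Image.new("RGB", (cols * w, rows * h))
    for i, img in enumerate(images):
        sheet.paste(img.convert("RGB"), ((i % cols) * w, (i // cols) * h))
    return sheet

def untile(sheet, rows=ROWS, cols=COLS):
    """Split a grid image back into its individual view tiles."""
    w, h = sheet.width // cols, sheet.height // rows
    return [sheet.crop(((i % cols) * w, (i // cols) * h,
                        (i % cols + 1) * w, (i // cols + 1) * h))
            for i in range(rows * cols)]

def render_depths(poses):
    """Placeholder: in SIGNeRF the depth maps come from rendering the
    original NeRF at the chosen camera poses. Blank images stand in here."""
    return [Image.new("L", (TILE, TILE)) for _ in poses]

# Depth-conditioned Stable Diffusion via ControlNet.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet, torch_dtype=torch.float16,
).to("cuda")

camera_poses = list(range(ROWS * COLS))          # stand-in pose list
depth_sheet = tile(render_depths(camera_poses))  # one grid of depth maps

# One diffusion call produces all views on the sheet at once; because the
# tiles share a single denoising process, the generated views tend to be
# mutually consistent, which is the property the paper exploits.
edited_sheet = pipe(
    prompt="a bronze statue of a lion",
    image=depth_sheet,            # depth conditioning for the whole grid
    num_inference_steps=30,
).images[0]

edited_views = untile(edited_sheet)  # new images to fine-tune the NeRF with
```

In the actual method, the edited views would then replace the corresponding images in the training collection, and the original NeRF is fine-tuned on the updated image set in one go, as the abstract describes.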
Authors: Jan-Niklas Dihlmann, Andreas Engelhardt, Hendrik Lensch