InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes (2401.05335v1)
Abstract: We introduce InseRF, a novel method for generative object insertion in NeRF reconstructions of 3D scenes. Given a user-provided textual description and a 2D bounding box in a reference viewpoint, InseRF generates new objects in 3D scenes. Recently, 3D scene editing has been profoundly transformed by the use of strong priors from text-to-image diffusion models in 3D generative modeling. Existing methods are mostly effective at editing 3D scenes via style and appearance changes or at removing existing objects; generating new objects, however, remains a challenge for such methods, which we address in this study. Specifically, we propose grounding the 3D object insertion in a 2D object insertion in a reference view of the scene. The 2D edit is then lifted to 3D using a single-view object reconstruction method, and the reconstructed object is inserted into the scene, guided by the priors of monocular depth estimation methods. We evaluate our method on various 3D scenes and provide an in-depth analysis of the proposed components. Our experiments on generative object insertion in several 3D scenes demonstrate the effectiveness of our method compared to existing approaches. InseRF is capable of controllable and 3D-consistent object insertion without requiring explicit 3D information as input. Please visit our project page at https://mohamad-shahbazi.github.io/inserf.
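The pipeline described in the abstract (render a reference view, perform a text-driven 2D insertion in the user's bounding box, lift the edit to 3D via single-view reconstruction, then place the object using monocular depth) can be sketched in terms of its data flow. Note this is a minimal, illustrative sketch only: every helper below (`render_reference_view`, `insert_object_2d`, `reconstruct_object_3d`, `estimate_depth`, `place_object`) is a hypothetical stand-in with placeholder logic, not the paper's actual implementation, which relies on pretrained diffusion, reconstruction, and depth models.

```python
import numpy as np

def render_reference_view(scene_nerf, camera):
    """Stand-in: render an RGB image from the scene NeRF at the reference pose."""
    return np.zeros((64, 64, 3))  # placeholder image

def insert_object_2d(image, prompt, bbox):
    """Stand-in for text-driven 2D object insertion (e.g. diffusion-based
    inpainting) restricted to the user-provided bounding box."""
    x, y, w, h = bbox
    edited = image.copy()
    edited[y:y + h, x:x + w] = 1.0  # placeholder "inserted object" pixels
    return edited

def reconstruct_object_3d(edited_image, bbox):
    """Stand-in for single-view object reconstruction: lift the 2D crop of
    the inserted object to a 3D object representation (here, just the crop)."""
    x, y, w, h = bbox
    return edited_image[y:y + h, x:x + w]

def estimate_depth(image):
    """Stand-in for monocular depth estimation on the reference view."""
    return np.ones(image.shape[:2])  # placeholder depth map

def place_object(scene_nerf, obj_3d, depth, bbox):
    """Stand-in: use the depth inside the bbox to choose the object's 3D
    placement and fuse it into the scene representation."""
    x, y, w, h = bbox
    obj_depth = float(depth[y:y + h, x:x + w].mean())
    return {"scene": scene_nerf, "object": obj_3d, "depth": obj_depth}

def inserf_pipeline(scene_nerf, prompt, bbox, camera=None):
    # 1) Render the reference view from the scene NeRF.
    ref = render_reference_view(scene_nerf, camera)
    # 2) Ground the 3D insertion in a 2D edit of the reference view.
    edited = insert_object_2d(ref, prompt, bbox)
    # 3) Lift the 2D edit to 3D with single-view object reconstruction.
    obj_3d = reconstruct_object_3d(edited, bbox)
    # 4) Place the object using monocular depth priors.
    depth = estimate_depth(ref)
    return place_object(scene_nerf, obj_3d, depth, bbox)

result = inserf_pipeline("scene_nerf", "a red mug on the table", bbox=(8, 8, 16, 16))
print(result["depth"])
```

The key design point the sketch illustrates is that the only 3D-aware steps are the reconstruction and the depth-guided placement; the generative control itself happens entirely in 2D, in a single reference view.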
Authors: Mohamad Shahbazi, Liesbeth Claessens, Michael Niemeyer, Edo Collins, Alessio Tonioni, Luc Van Gool, Federico Tombari