Edit-DiffNeRF: Editing 3D Neural Radiance Fields using 2D Diffusion Model (2306.09551v1)
Abstract: Recent research has demonstrated that combining pretrained diffusion models with neural radiance fields (NeRFs) is a promising approach for text-to-3D generation. However, simply coupling a NeRF with a diffusion model results in cross-view inconsistency and degraded stylized view synthesis. To address this challenge, we propose the Edit-DiffNeRF framework, which is composed of a frozen diffusion model, a proposed delta module that edits the latent semantic space of the diffusion model, and a NeRF. Instead of training the entire diffusion model for each scene, our method edits the latent semantic space of the frozen pretrained diffusion model via the delta module. This fundamental change to the standard diffusion framework enables us to make fine-grained modifications to the rendered views and to effectively consolidate these instructions in a 3D scene via NeRF training. As a result, we are able to produce an edited 3D scene that faithfully aligns with the input text instructions. Furthermore, to ensure semantic consistency across different viewpoints, we propose a novel multi-view semantic consistency loss that extracts a latent semantic embedding from the input view as a prior and aims to reconstruct it in the other views. Our proposed method has been shown to effectively edit real-world 3D scenes, achieving a 25% improvement over prior work in the alignment of the performed 3D edits with text instructions.
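The multi-view semantic consistency loss described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the use of mean squared error as the distance, and the external encoder that produces the embeddings are all assumptions, since the abstract does not specify the exact formulation.

```python
import numpy as np

def multiview_semantic_consistency_loss(ref_embedding, view_embeddings):
    """Hypothetical sketch: the latent semantic embedding extracted from
    the input view acts as a prior, and the embeddings of the other
    rendered views are pulled toward it. MSE is an assumed distance;
    the paper may use a different metric."""
    ref = np.asarray(ref_embedding, dtype=np.float64)
    per_view = [np.mean((np.asarray(e, dtype=np.float64) - ref) ** 2)
                for e in view_embeddings]
    return float(np.mean(per_view))
```

In an actual pipeline, the embeddings would come from the frozen diffusion model's latent space; here they are treated as plain vectors so the loss itself is self-contained.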