OR-NeRF: Object Removing from 3D Scenes Guided by Multiview Segmentation with Neural Radiance Fields (2305.10503v3)
Abstract: The emergence of Neural Radiance Fields (NeRF) for novel view synthesis has increased interest in 3D scene editing. An essential editing task is removing objects from a scene while preserving visual plausibility and multiview consistency. However, current methods face challenges such as time-consuming object labeling, limited ability to remove specific targets, and degraded rendering quality after removal. This paper proposes a novel object-removal pipeline, named OR-NeRF, that removes objects from 3D scenes given user-provided points or text prompts on a single view, achieving better performance in less time than previous works. Our method spreads user annotations to all views through 3D geometry and sparse correspondence, ensuring 3D consistency with a light processing burden. The recent 2D segmentation model Segment Anything (SAM) is then applied to predict masks, and a 2D inpainting model is used to generate color supervision. Finally, our algorithm applies depth supervision and a perceptual loss to maintain geometric and appearance consistency after object removal. Experimental results demonstrate that our method achieves better editing quality in less time than previous works, both quantitatively and qualitatively.
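The core of the annotation-spreading step is projecting a user-clicked pixel from one view into every other view via 3D geometry. Below is a minimal sketch of that idea, not the authors' implementation: it lifts a clicked pixel to a 3D point using per-pixel depth (the paper instead relies on the scene's sparse reconstruction and correspondences; the depth source, camera conventions, and toy poses here are assumptions for illustration), then reprojects the point to obtain a SAM point prompt in another view.

```python
# Hedged sketch of multiview prompt propagation (illustration only).
import numpy as np

def unproject(px, depth, K, c2w):
    """Lift a pixel (u, v) with known depth to a 3D world point."""
    u, v = px
    x_cam = np.linalg.inv(K) @ np.array([u, v, 1.0]) * depth  # camera frame
    return c2w[:3, :3] @ x_cam + c2w[:3, 3]                    # world frame

def project(X_world, K, c2w):
    """Project a 3D world point into another view's pixel coordinates."""
    x_cam = c2w[:3, :3].T @ (X_world - c2w[:3, 3])  # world -> camera
    uvw = K @ x_cam
    return uvw[:2] / uvw[2]

# Toy example: identity pose for view A, view B shifted along x.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
pose_a = np.eye(4)
pose_b = np.eye(4); pose_b[0, 3] = 0.1

X = unproject(px=(330, 250), depth=2.0, K=K, c2w=pose_a)
prompt_b = project(X, K, pose_b)   # point prompt for SAM in view B
print(prompt_b)
```

In the actual pipeline, the propagated points prompt SAM in each view to produce per-view object masks, which in turn define the regions that the 2D inpainter fills to supervise the edited radiance field.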
Authors: Youtan Yin, Zhoujie Fu, Fan Yang, Guosheng Lin