Overview of Neural Radiance Fields and Scene Editing
Neural Radiance Fields (NeRF) have become a key tool for creating realistic 3D scenes that can be rendered from arbitrary viewpoints. A NeRF uses a neural network to model the density and view-dependent color of light at every point in a scene, enabling photorealistic renderings of virtual environments. Advances in NeRF have spurred research into 3D scene editing, where objects within a scene can be re-textured, re-styled, or replaced to suit different needs. However, editing 3D scenes with NeRF-like representations remains difficult: edits must be confined to the intended foreground regions while staying consistent across viewpoints.
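At its core, NeRF renders a pixel by compositing the densities and colors its network predicts at samples along a camera ray. Below is a minimal sketch of this volume-rendering step in PyTorch; the function name and tensor layout are illustrative, not code from the paper:

```python
import torch

def render_ray(sigmas, colors, deltas):
    """Composite samples along one ray into a pixel color (NeRF-style).

    sigmas: (N,)  volume densities predicted by the network
    colors: (N,3) RGB values predicted by the network
    deltas: (N,)  distances between adjacent samples along the ray
    """
    alphas = 1.0 - torch.exp(-sigmas * deltas)  # per-sample opacity
    # Transmittance: probability the ray reaches sample i unoccluded.
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alphas + 1e-10])[:-1], dim=0
    )
    weights = alphas * trans                     # per-sample blending weights
    return (weights[:, None] * colors).sum(dim=0)  # (3,) pixel color

# Toy usage with random network outputs for 64 samples on one ray.
pixel = render_ray(torch.rand(64), torch.rand(64, 3), torch.full((64,), 0.05))
```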
Adaptive Source Driven 3D Scene Editing
The paper introduces a method for customized 3D scene editing that accepts an adaptive source input, either a text description or a reference image. This allows a scene's foreground to be modified while its background is kept intact, addressing a common failure mode of prior work, in which edits inadvertently alter untargeted parts of the scene.
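As a rough illustration of the "adaptive source" idea, both conditioning modes can be treated as one edit-request interface that downstream code consumes uniformly. The EditSource container and the example prompt and path below are hypothetical, not the paper's API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EditSource:
    """Hypothetical container unifying the two conditioning modes.

    Exactly one of `text` or `image_path` is set; either way it acts
    as the conditioning signal for the foreground edit.
    """
    text: Optional[str] = None
    image_path: Optional[str] = None

    def is_image_driven(self) -> bool:
        return self.image_path is not None

# Text-driven edit: describe the desired foreground in words.
src_text = EditSource(text="a corgi sitting on the grass")
# Image-driven edit: match the subject shown in a single reference photo.
src_image = EditSource(image_path="reference/dog.png")
```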
Local-Global Iterative Editing
To keep edits concentrated on the foreground, the authors propose a Local-Global Iterative Editing (LGIE) training scheme that alternates between local stages, which focus on the foreground region alone, and global stages, which take the entire scene into account. The scheme builds on a foreground-aware NeRF that can distinguish the editable foreground from the rest of the scene. By switching the training focus between the foreground and the full scene as needed, the method preserves the original layout and background details.
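A schematic of how such an alternating schedule could look in a training loop; nerf.render, its region argument, and edit_loss are hypothetical stand-ins for the paper's foreground-aware renderer and its diffusion-guided editing objective:

```python
def lgie_step(nerf, rays, fg_mask, edit_loss, step):
    """One LGIE iteration: alternate local (foreground) and global stages."""
    if step % 2 == 0:
        # Local stage: render only the editable foreground so the edit
        # signal cannot leak into the background.
        rgb = nerf.render(rays, region="foreground")
        loss = edit_loss(rgb, mask=fg_mask)
    else:
        # Global stage: render the full scene so the edited foreground
        # stays coherent with the preserved background and layout.
        rgb = nerf.render(rays, region="full")
        loss = edit_loss(rgb)
    loss.backward()  # gradients flow back into the NeRF parameters
    return loss
```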
Class-guided Regularization for Image-driven Editing
A further challenge arises when editing is guided by a single-view reference image: optimizing against one view alone leads to inconsistencies when the scene is rendered from other perspectives. The authors address this with a class-guided regularization technique that uses a Text-to-Image (T2I) model to encode the visual subject of the reference image into a textual prompt. During editing, the general class priors captured by the T2I model then regularize geometry so that the edited subject remains consistent across views.
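One way to realize this, in the spirit of textual inversion, is to pair a subject-specific prompt built around a learned pseudo-token with a class-level prompt that invokes only the T2I model's generic prior. The helper below is an illustrative sketch under that assumption, not the paper's implementation:

```python
def build_prompts(subject_token: str, class_word: str):
    """Build paired prompts for class-guided regularization (sketch).

    subject_token : learned pseudo-word encoding the reference subject,
                    e.g. obtained via textual-inversion-style tuning
    class_word    : general class name of the subject, e.g. "dog"
    """
    # Subject prompt: ties the edit to the specific reference appearance.
    subject_prompt = f"a photo of a {subject_token} {class_word}"
    # Class prompt: taps only the T2I model's generic class prior, which
    # regularizes geometry so novel views of the subject stay consistent.
    class_prompt = f"a photo of a {class_word}"
    return subject_prompt, class_prompt

subj_prompt, cls_prompt = build_prompts("<v*>", "dog")
```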
Results and Conclusions
The resulting model, named CustomNeRF, produces accurate editing results on a variety of real scenes under both text- and image-driven settings. Extensive experiments show that CustomNeRF modifies only the specified regions, and does so photo-realistically, demonstrating the effectiveness of LGIE and class-guided regularization for 3D scene editing. These contributions are a significant step toward letting users customize scenes to their specific needs and preferences, broadening the accessibility and flexibility of NeRF-based editing tools.