DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing (2404.18929v3)

Published 29 Apr 2024 in cs.CV

Abstract: We consider the problem of editing 3D objects and scenes based on open-ended language instructions. A common approach to this problem is to use a 2D image generator or editor to guide the 3D editing process, obviating the need for 3D data. However, this process is often inefficient due to the need for iterative updates of costly 3D representations, such as neural radiance fields, either through individual view edits or score distillation sampling. A major disadvantage of this approach is the slow convergence caused by aggregating inconsistent information across views, as the guidance from 2D models is not multi-view consistent. We thus introduce the Direct Gaussian Editor (DGE), a method that addresses these issues in two stages. First, we modify a given high-quality image editor like InstructPix2Pix to be multi-view consistent. To do so, we propose a training-free approach that integrates cues from the 3D geometry of the underlying scene. Second, given a multi-view consistent edited sequence of images, we directly and efficiently optimize the 3D representation, which is based on 3D Gaussian Splatting. Because it avoids incremental and iterative edits, DGE is significantly more accurate and efficient than existing approaches and offers additional benefits, such as enabling selective editing of parts of the scene.

Citations (10)


Summary

The paper introduces DGE, a novel method that achieves multi-view consistent 3D editing by modifying existing image editors with 3D geometry cues.
The paper improves efficiency by directly optimizing 3D representations using 3D Gaussian Splatting, eliminating costly iterative updates.
The paper enables selective editing of specific scene sections, ensuring precise control and coherent results across multiple views.

The paper "DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing" addresses 3D object and scene editing from open-ended language instructions. Existing approaches typically use a 2D image generator or editor to guide the 3D edit, which removes the need for 3D data. They are inefficient, however, because they iteratively update costly 3D representations such as neural radiance fields, either by editing individual views or through score distillation sampling. Convergence is also slow because the guidance from 2D models is not multi-view consistent, so the 3D representation has to aggregate conflicting information across views.

To overcome these limitations, the authors introduce the Direct Gaussian Editor (DGE), which works in two stages:

Multi-View Consistency in Image Editing: The first stage modifies an existing high-quality image editor, such as InstructPix2Pix, so that its edits are multi-view consistent. The modification is training-free: rather than retraining the editor, it integrates cues from the 3D geometry of the underlying scene, so that the edited images agree with one another across viewpoints.
Efficient 3D Representation Optimization: The second stage takes the multi-view consistent sequence of edited images and directly optimizes the 3D representation, which is based on 3D Gaussian Splatting. Because the edited views already agree, the representation can be fitted to them directly, without the incremental and iterative edits that bottleneck earlier approaches. According to the abstract, this makes DGE significantly more accurate and efficient than existing approaches.
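
To make the role of multi-view consistency concrete, the following is a toy sketch, not the authors' implementation. It assumes a hypothetical "scene" of three points with one scalar color each, and represents each edited view as a dictionary from visible point index to target color; the function `fit_colors` and all data are illustrative assumptions. Plain gradient descent on a squared error stands in for optimizing a 3D Gaussian Splatting representation.

```python
# Toy illustration (assumption: a "scene" is a list of scalar point colors,
# and a "view" maps the indices of visible points to their edited colors).
# This is NOT Gaussian Splatting; it only shows why consistent targets matter.

def fit_colors(views, num_points, steps=200, lr=0.1):
    """Fit one shared color per point to all edited views by gradient descent.

    Returns the fitted colors and the final summed squared error.
    """
    colors = [0.0] * num_points
    for _ in range(steps):
        grads = [0.0] * num_points
        for view in views:
            for i, target in view.items():
                # Gradient of (colors[i] - target)^2 with respect to colors[i].
                grads[i] += 2.0 * (colors[i] - target)
        colors = [c - lr * g for c, g in zip(colors, grads)]
    loss = sum((colors[i] - t) ** 2 for view in views for i, t in view.items())
    return colors, loss


# Every view agrees on the color of each point it sees.
consistent_views = [{0: 1.0, 1: 0.5}, {1: 0.5, 2: 0.2}, {0: 1.0, 2: 0.2}]
# Views disagree on points 0 and 1, as independently edited images might.
inconsistent_views = [{0: 1.0, 1: 0.5}, {1: 0.9, 2: 0.2}, {0: 0.6, 2: 0.2}]

colors_c, loss_c = fit_colors(consistent_views, num_points=3)
colors_i, loss_i = fit_colors(inconsistent_views, num_points=3)

print("consistent:  ", [round(c, 3) for c in colors_c], "loss", round(loss_c, 6))
print("inconsistent:", [round(c, 3) for c in colors_i], "loss", round(loss_i, 6))
```

With consistent targets the fit reproduces every view and the error vanishes. With inconsistent targets each shared color can only settle on the average of the conflicting values, so a residual error remains no matter how long the optimization runs.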

DGE also supports selective editing: users can modify only chosen parts of the 3D object or scene while leaving the rest unchanged.
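
One simple way to picture selective editing is to restrict updates to a chosen subset of the scene's parameters. The sketch below is an illustration under stated assumptions, not the mechanism described in the paper: this summary does not say how DGE selects the part to edit, so the `editable` index set, the function `fit_selected`, and the scalar-color toy scene are all hypothetical.

```python
# Toy illustration (assumption: selective editing is modeled as a mask over
# parameters; only indices in `editable` are allowed to change).

def fit_selected(colors, views, editable, steps=200, lr=0.1):
    """Fit point colors to edited views, updating only the editable points."""
    colors = list(colors)  # Copy so the original scene is left untouched.
    for _ in range(steps):
        grads = [0.0] * len(colors)
        for view in views:
            for i, target in view.items():
                grads[i] += 2.0 * (colors[i] - target)
        for i in editable:
            # Points outside `editable` never receive an update.
            colors[i] -= lr * grads[i]
    return colors


original = [0.3, 0.3, 0.3]
edited_views = [{0: 1.0, 1: 0.5}, {1: 0.5, 2: 0.2}]

# Only point 1 may change, even though the edited views also touch 0 and 2.
result = fit_selected(original, edited_views, editable={1})
print([round(c, 3) for c in result])
```

Points outside the selected set keep their original values exactly, while the selected point moves to the color the edited views request.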

In summary, DGE targets the two main weaknesses of 2D-guided 3D editing: inefficiency and multi-view inconsistency. It does so by making an existing image editor geometry-aware without retraining it and by directly optimizing a 3D Gaussian Splatting representation against the resulting consistent views.
