DreamEditor: Text-Driven 3D Scene Editing with Neural Fields
The paper "DreamEditor: Text-Driven 3D Scene Editing with Neural Fields" presents a novel framework aimed at enhancing 3D scene editing by integrating text-driven modifications within neural fields. Traditional techniques for 3D modeling and editing often stumble when handling neural fields due to the implicit nature of geometry and texture information. DreamEditor offers a solution by enabling intuitive and precise 3D scene editing through text prompts. This is achieved through the use of advanced techniques in neural fields and text-to-image diffusion models.
The DreamEditor framework comprises several key components that together enable the editing of complex scenes. The approach first represents the scene as a mesh-based neural field, which allows localized edits by mapping text prompts to designated regions. A pretrained text-to-image diffusion model interprets the prompt and identifies the region of the neural field to edit, and score distillation sampling then optimizes the geometry and texture within that region to match the prompt.
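Score distillation sampling is the optimization engine here: the diffusion model's denoiser scores noisy versions of rendered views, and its prediction error becomes a gradient on the scene parameters, without backpropagating through the diffusion model itself. The sketch below is a minimal, hedged illustration in PyTorch; the `unet` and `scheduler` callables, the timestep range, and the guidance scale are assumptions in the style of common diffusion toolkits, not the paper's actual code.

```python
import torch

def sds_loss(unet, scheduler, latents, text_emb, guidance_scale=100.0):
    """Surrogate loss whose gradient w.r.t. `latents` matches SDS.

    Assumes `unet(z_t, t, text_emb)` predicts the noise added at timestep t
    and `scheduler.add_noise` implements the forward diffusion q(z_t | z_0).
    """
    # Sample a random diffusion timestep and Gaussian noise.
    t = torch.randint(20, 980, (latents.shape[0],), device=latents.device)
    eps = torch.randn_like(latents)
    z_t = scheduler.add_noise(latents, eps, t)

    with torch.no_grad():
        # Classifier-free guidance: mix conditional and unconditional predictions.
        eps_cond = unet(z_t, t, text_emb)
        eps_uncond = unet(z_t, t, torch.zeros_like(text_emb))
        eps_pred = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

    # d(loss)/d(latents) = eps_pred - eps; the U-Net receives no gradient.
    grad = eps_pred - eps
    return (grad.detach() * latents).sum()
```

In full formulations the gradient is weighted by a timestep-dependent factor w(t), omitted here for brevity, and in DreamEditor's setting the update is confined to the parameters inside the localized editing region.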
One of the most significant aspects of this paper is its use of a mesh-based neural field representation. This approach not only supports more precise editing by defining explicit structures within the implicit neural field but also decouples geometry and texture, thus avoiding unnecessary deformations when only appearance changes are needed. The mesh structure also enables efficient conversion from 2D text-driven edits to 3D scene modifications.
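As a rough illustration of how a mesh-anchored field decouples geometry from texture, consider the following sketch. The class name, the nearest-vertex lookup, and the layer sizes are hypothetical simplifications (the actual representation interpolates per-vertex features across mesh faces); the point being illustrated is the separation into a geometry decoder and a color decoder.

```python
import torch
import torch.nn as nn

class MeshField(nn.Module):
    """Minimal sketch of a mesh-anchored neural field (hypothetical).

    Learnable features live on mesh vertices; a query point inherits the
    feature of its nearest vertex. Geometry and texture use separate
    decoders, so a texture-only edit can leave geometry frozen.
    """
    def __init__(self, vertices, feat_dim=32):
        super().__init__()
        self.register_buffer("vertices", vertices)                    # (V, 3)
        self.features = nn.Parameter(torch.randn(vertices.shape[0], feat_dim))
        self.geo_mlp = nn.Sequential(nn.Linear(feat_dim + 3, 64), nn.ReLU(),
                                     nn.Linear(64, 1))                # density
        self.col_mlp = nn.Sequential(nn.Linear(feat_dim + 3, 64), nn.ReLU(),
                                     nn.Linear(64, 3))                # RGB

    def forward(self, points):                                        # (N, 3)
        # Nearest-vertex lookup; the offset localizes the query on the mesh.
        d = torch.cdist(points, self.vertices)                        # (N, V)
        idx = d.argmin(dim=1)
        offset = points - self.vertices[idx]
        h = torch.cat([self.features[idx], offset], dim=-1)
        density = self.geo_mlp(h)                                     # geometry branch
        rgb = torch.sigmoid(self.col_mlp(h))                          # texture branch
        return density, rgb
```

Because both branches read the same vertex features through separate decoders, an appearance-only edit can optimize the color branch while the geometry branch stays frozen, which is what avoids the unnecessary deformations mentioned above.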
The performance of DreamEditor is demonstrated through extensive qualitative and quantitative experiments. It consistently outperforms existing methods, offering superior detail and accuracy in the edited regions while leaving parts of the scene not referenced by the text prompt unchanged. A broad suite of tests on both synthetic and real-world scenes indicates the robustness and versatility of the approach across varied objects and environments, including animal figures, human faces, and realistic outdoor landscapes.
Critically, DreamEditor's ability to make fine-grained, high-fidelity edits to neural fields from simple text directives has notable implications for both research and practical applications in computer graphics and virtual reality. It opens pathways for non-experts to engage in complex 3D editing tasks, significantly lowering the barrier to high-fidelity 3D modeling, while also pushing forward theoretical exploration of the capabilities and limitations of neural representations in dynamic and interactive contexts.
The trajectory of this research suggests several potential future developments. Improving fidelity in highly occluded and complex scenes is a natural focus for upcoming work. Exploring adaptation techniques that extend DreamEditor to other neural field formulations could also prove beneficial. Furthermore, as text-to-image and text-to-3D models continue to evolve, DreamEditor's underlying algorithms may be adapted to leverage these advancements, further expanding its editing scope and efficiency.
Overall, the introduction of DreamEditor marks a considerable step forward in the domain of 3D scene editing, marrying the intuitive input of textual descriptions with the nuanced technical requirements of neural fields to produce realistic, high-quality 3D edits cleanly and efficiently. The practical and theoretical contributions of this work hold substantial promise for enhancing user interaction with complex digital visual environments.