GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting (2311.14521v4)

Published 24 Nov 2023 in cs.CV

Abstract: 3D editing plays a crucial role in many areas such as gaming and virtual reality. Traditional 3D editing methods, which rely on representations like meshes and point clouds, often fall short in realistically depicting complex scenes. On the other hand, methods based on implicit 3D representations, like Neural Radiance Field (NeRF), render complex scenes effectively but suffer from slow processing speeds and limited control over specific scene areas. In response to these challenges, our paper presents GaussianEditor, an innovative and efficient 3D editing algorithm based on Gaussian Splatting (GS), a novel 3D representation. GaussianEditor enhances precision and control in editing through our proposed Gaussian semantic tracing, which traces the editing target throughout the training process. Additionally, we propose Hierarchical Gaussian splatting (HGS) to achieve stabilized and fine results under stochastic generative guidance from 2D diffusion models. We also develop editing strategies for efficient object removal and integration, a challenging task for existing methods. Our comprehensive experiments demonstrate GaussianEditor's superior control, efficacy, and rapid performance, marking a significant advancement in 3D editing. Project Page: https://buaacyw.github.io/gaussian-editor/


Summary

  • The paper introduces GaussianEditor, a novel method that swiftly edits 3D scenes using text prompts and explicit controls.
  • It utilizes Hierarchical Gaussian Splatting and semantic tracing to achieve precise, targeted modifications in real time.
  • Experimental results show that editing times are reduced to 5–10 minutes, enhancing both efficiency and accuracy compared to traditional methods.

Background on 3D Editing

3D editing is a key component of numerous fields such as gaming and virtual reality. Traditionally, the task has relied on explicit representations such as meshes and point clouds, which offer interactivity and per-object control but often fail to depict complex scenes realistically. Implicit 3D representations like Neural Radiance Fields (NeRF), by contrast, render intricate scenes effectively but are computationally intensive, making both editing and processing slow.

Introducing GaussianEditor

To overcome the limitations of existing 3D editing methods, a new algorithm named GaussianEditor has been developed. This innovation is designed to perform 3D editing tasks swiftly and with a high degree of control. It builds upon a concept known as Gaussian Splatting (GS), an explicit 3D representation allowing for real-time rendering and editable point cloud-like structures. GaussianEditor makes the process of editing 3D scenes more flexible and rapid by integrating text-based editing with explicit control methods such as the use of bounding boxes for specific area modifications.
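The bounding-box control described above can be sketched in a few lines: restrict editing to Gaussians whose centers fall inside a user-specified axis-aligned box. This is a minimal illustrative NumPy sketch, not the paper's implementation; all variable names are assumptions.

```python
import numpy as np

# Illustrative sketch: select Gaussians for editing by testing whether each
# center lies inside a user-specified axis-aligned 3D bounding box.
centers = np.array([[0.2, 0.3, 0.1],
                    [0.9, 0.9, 0.9],
                    [-1.0, 0.0, 0.0]])   # hypothetical Gaussian centers
box_min = np.array([0.0, 0.0, 0.0])
box_max = np.array([1.0, 1.0, 1.0])

# A center is editable only if every coordinate is within the box bounds.
inside = np.all((centers >= box_min) & (centers <= box_max), axis=1)
print(inside)
```

In practice the selection would feed the optimizer, so that gradient updates apply only to the selected subset while the rest of the scene is frozen.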

Precision and Control in Editing

A cornerstone of the GaussianEditor algorithm is the introduction of "Gaussian semantic tracing." This approach improves editing precision by tracing the editing target throughout the training process, in stark contrast to the static 2D masks used by prior methods. Each Gaussian in the 3D model is tagged semantically, allowing the system to modify only the targeted regions with high accuracy.
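One way to picture semantic tracing: each Gaussian accumulates votes from 2D segmentation masks across rendered views, and only Gaussians whose score passes a threshold receive gradient updates. The sketch below is a hedged illustration under that reading; the voting counts, threshold, and variable names are assumptions, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
num_gaussians = 8

# Hypothetical tallies: how often each Gaussian projected inside the 2D
# segmentation mask, out of how many views it was visible in.
mask_hits = np.array([9, 0, 7, 1, 10, 2, 8, 0], dtype=float)
total_hits = np.full(num_gaussians, 10.0)

semantic_score = mask_hits / total_hits        # per-Gaussian label confidence
target = semantic_score > 0.5                  # boolean editing mask

# During optimization, gradients are zeroed for untargeted Gaussians,
# so unedited scene regions stay untouched.
grads = rng.normal(size=(num_gaussians, 3))
masked_grads = grads * target[:, None]
print(target.astype(int))
```

Because the mask is attached to the Gaussians themselves rather than to a fixed 2D region, it follows the target through densification and pruning as training proceeds.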

Further refining the method, "Hierarchical Gaussian Splatting" (HGS) is proposed to ensure that GS adapts to a wider range of editing scenarios. In HGS, Gaussians are organized into generations, with stricter constraints applied to older generations, balancing fidelity to the original scene against adaptability to the edit.
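The generation-based constraint can be sketched as an anchor loss whose weight is highest for the oldest Gaussians and decays for newer ones, so early geometry stays stable while recently densified Gaussians remain free to fit detail. The weighting scheme and names below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Three Gaussians from generations 0 (oldest), 1, and 2 (newest).
positions = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
anchors = positions.copy()          # positions recorded when each was spawned
generation = np.array([0, 1, 2])

# Anchor-loss weight decays for younger generations: older Gaussians are
# penalized more for drifting from their anchors.
base_weight, decay = 1.0, 0.5
weights = base_weight * decay ** generation     # [1.0, 0.5, 0.25]

perturbed = positions + 0.1                     # pretend optimization moved all
anchor_loss = float(np.sum(weights[:, None] * (perturbed - anchors) ** 2))
print(anchor_loss)
```

Under stochastic guidance from a 2D diffusion model, this kind of asymmetric regularization damps oscillation in well-converged regions while still letting new Gaussians respond to the edit signal.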

Streamlining 3D Inpainting

An additional contribution is a specialized 3D inpainting process for GS, covering both object removal and object integration. For removal, a local repair algorithm quickly eliminates artifacts at the interface between the deleted object and the surrounding scene. For addition, the user provides a text prompt and a 2D inpainting mask from a single view; an image of the new object is generated, lifted to 3D Gaussians, and integrated into the original scene.
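The removal step above can be sketched as: delete the traced Gaussians outright, then flag remaining Gaussians near the deleted footprint for local re-optimization against inpainted renders. This is a simplified illustration with assumed names and an assumed repair radius, not the paper's algorithm.

```python
import numpy as np

# Hypothetical scene: five Gaussians, two of which were traced as the object.
positions = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.05, 1.0], [5.0, 5.0]])
is_target = np.array([False, False, True, True, False])

kept = positions[~is_target]       # Gaussians that survive removal
removed = positions[is_target]     # deleted object Gaussians

# Flag kept Gaussians within an assumed repair radius of the deleted set;
# only these would be re-optimized to clean up the interface.
dists = np.linalg.norm(kept[:, None, :] - removed[None, :, :], axis=-1)
needs_repair = dists.min(axis=1) < 2.0
print(needs_repair)
```

Restricting repair to this local neighborhood is what keeps the operation fast: the bulk of the scene is never touched after the deletion.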

Performance and Efficacy

Experiments with GaussianEditor demonstrate its superior control, effectiveness, and speed, marking a significant advance in 3D scene manipulation. The entire editing process typically takes 5 to 10 minutes, a notable improvement over previous methods.

Conclusion

The development of GaussianEditor is a step forward for efficient and controlled 3D scene editing using Gaussian Splatting. By incorporating semantic tracing and HGS, alongside the specialized 3D inpainting process, GaussianEditor fulfills high-quality editing needs within minutes, significantly enhancing the controllability and practicality of 3D editing.
