LatentEditor: Text Driven Local Editing of 3D Scenes (2312.09313v4)

Published 14 Dec 2023 in cs.CV and cs.AI

Abstract: While neural fields have made significant strides in view synthesis and scene reconstruction, editing them poses a formidable challenge due to their implicit encoding of geometry and texture information from multi-view inputs. In this paper, we introduce LatentEditor, an innovative framework designed to empower users with the ability to perform precise and locally controlled editing of neural fields using text prompts. Leveraging denoising diffusion models, we successfully embed real-world scenes into the latent space, resulting in a faster and more adaptable NeRF backbone for editing compared to traditional methods. To enhance editing precision, we introduce a delta score to calculate the 2D mask in the latent space that serves as a guide for local modifications while preserving irrelevant regions. Our novel pixel-level scoring approach harnesses the power of InstructPix2Pix (IP2P) to discern the disparity between IP2P conditional and unconditional noise predictions in the latent space. The edited latents conditioned on the 2D masks are then iteratively updated in the training set to achieve 3D local editing. Our approach achieves faster editing speeds and superior output quality compared to existing 3D editing models, bridging the gap between textual instructions and high-quality 3D scene editing in latent space. We show the superiority of our approach on four benchmark 3D datasets: LLFF, IN2N, NeRFStudio and NeRF-Art. Project Page: https://latenteditor.github.io/
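
The delta-score idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the score is taken as the channel-averaged absolute difference between the conditional and unconditional noise predictions, it is min-max normalized, and a fixed threshold turns it into a binary mask. The function names `delta_mask` and `blend_latents`, the `threshold` parameter, and the synthetic arrays standing in for IP2P outputs are all hypothetical and are not the authors' implementation.

```python
import numpy as np


def delta_mask(eps_cond, eps_uncond, threshold=0.5):
    """Binary 2D mask from two noise predictions of shape (C, H, W).

    Assumed scoring rule (not taken from the paper): mean absolute
    difference over channels, min-max normalized, then thresholded.
    """
    delta = np.abs(eps_cond - eps_uncond).mean(axis=0)
    span = delta.max() - delta.min()
    if span == 0:
        # The two predictions agree everywhere, so nothing is marked for editing.
        return np.zeros(delta.shape, dtype=bool)
    score = (delta - delta.min()) / span
    return score > threshold


def blend_latents(original, edited, mask):
    """Keep the original latent outside the mask, the edited latent inside."""
    return np.where(mask[None, :, :], edited, original)


# Synthetic stand-ins for the two noise predictions (hypothetical data).
rng = np.random.default_rng(0)
eps_uncond = rng.normal(size=(4, 8, 8))
eps_cond = eps_uncond.copy()
eps_cond[:, 2:5, 2:5] += 1.0  # the text prompt only changes a 3x3 region

mask = delta_mask(eps_cond, eps_uncond)

original = np.zeros((4, 8, 8))
edited = np.ones((4, 8, 8))
blended = blend_latents(original, edited, mask)

print(int(mask.sum()))  # number of latent pixels selected for editing
```

In this toy setup only the region where the two predictions disagree is selected, and the blend leaves every other latent pixel untouched, which mirrors the abstract's goal of local modification while preserving irrelevant regions.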

Authors (5)
  1. Umar Khalid (18 papers)
  2. Hasan Iqbal (19 papers)
  3. Nazmul Karim (21 papers)
  4. Jing Hua (15 papers)
  5. Chen Chen (753 papers)