
Localized Gaussian Splatting Editing with Contextual Awareness (2408.00083v1)

Published 31 Jul 2024 in cs.CV

Abstract: Recent text-guided generation of individual 3D objects has achieved great success using diffusion priors. However, these methods are not suitable for object insertion and replacement tasks because they do not consider the background, leading to illumination mismatches within the environment. To bridge this gap, we introduce an illumination-aware 3D scene editing pipeline for the 3D Gaussian Splatting (3DGS) representation. Our key observation is that inpainting by a state-of-the-art conditional 2D diffusion model is consistent with the background in lighting. To leverage the prior knowledge of well-trained diffusion models for 3D object generation, our approach employs a coarse-to-fine object optimization pipeline with inpainted views. In the first, coarse step, we achieve image-to-3D lifting given an ideal inpainted view. The process employs a 3D-aware diffusion prior from a view-conditioned diffusion model, which preserves the illumination present in the conditioning image. To acquire an ideal inpainted image, we introduce an Anchor View Proposal (AVP) algorithm that finds the single view best representing the scene illumination in the target region. In the second, Texture Enhancement step, we introduce a novel Depth-guided Inpainting Score Distillation Sampling (DI-SDS), which enhances geometry and texture details with the inpainting diffusion prior, beyond the scope of the 3D-aware diffusion prior used in the first step. DI-SDS not only provides fine-grained texture enhancement but also encourages the optimization to respect scene lighting. Our approach efficiently achieves local editing with global illumination consistency without explicitly modeling light transport. We demonstrate the robustness of our method by evaluating edits in real scenes containing explicit highlights and shadows, and compare against state-of-the-art text-to-3D editing methods.
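The DI-SDS loss described above builds on Score Distillation Sampling, which noises a rendered view, asks a diffusion prior to predict that noise, and pushes the prediction error back into the 3D parameters. The sketch below is a minimal, generic illustration of this core SDS gradient only, not the paper's depth-guided inpainting variant; the cosine noise schedule, the `denoiser` callable, and the scalar `weight` are simplified stand-ins for the actual diffusion model and weighting w(t).

```python
import numpy as np

def sds_gradient(render, denoiser, t, weight, rng):
    """One-view Score Distillation Sampling gradient (simplified sketch).

    render   : (H, W, C) array, image rendered from the current 3D parameters
    denoiser : callable (noisy_image, t) -> predicted noise (the 2D diffusion prior)
    t        : diffusion timestep in (0, 1)
    weight   : scalar stand-in for the timestep weighting w(t)
    rng      : numpy random Generator
    """
    eps = rng.standard_normal(render.shape)     # sample Gaussian noise
    alpha = np.cos(0.5 * np.pi * t)             # toy cosine noise schedule (assumption)
    sigma = np.sin(0.5 * np.pi * t)
    noisy = alpha * render + sigma * eps        # forward-diffuse the rendered view
    eps_hat = denoiser(noisy, t)                # prior's noise prediction
    # SDS update: weighted prediction error, skipping the denoiser's Jacobian;
    # this gradient is back-propagated through the renderer into the 3D parameters.
    return weight * (eps_hat - eps)
```

In the paper's setting, the prior would be an inpainting diffusion model additionally conditioned on depth and an inpainting mask, so that the distilled gradient stays consistent with the scene's lighting; here the denoiser is left abstract.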

