GO-NeRF: Generating Objects in Neural Radiance Fields for Virtual Reality Content Creation (2401.05750v2)

Published 11 Jan 2024 in cs.CV

Abstract: Virtual environments (VEs) are pivotal for virtual, augmented, and mixed reality systems. Despite advances in 3D generation and reconstruction, the direct creation of 3D objects within an established 3D scene (represented as NeRF) for novel VE creation remains a relatively unexplored domain. This process is complex, requiring not only the generation of high-quality 3D objects but also their seamless integration into the existing scene. To this end, we propose a novel pipeline featuring an intuitive interface, dubbed GO-NeRF. Our approach takes text prompts and user-specified regions as inputs and leverages the scene context to generate 3D objects within the scene. We employ a compositional rendering formulation that effectively integrates the generated 3D objects into the scene, utilizing optimized 3D-aware opacity maps to avoid unintended modifications to the original scene. Furthermore, we develop tailored optimization objectives and training strategies to enhance the model's ability to capture scene context and mitigate artifacts, such as floaters, that may occur while optimizing 3D objects within the scene. Extensive experiments conducted on both forward-facing and 360° scenes demonstrate the superior performance of our proposed method in generating objects that harmonize with surrounding scenes and synthesizing high-quality novel view images. We are committed to making our code publicly available.

Authors (6)
  1. Peng Dai
  2. Feitong Tan
  3. Xin Yu
  4. Yinda Zhang
  5. Xiaojuan Qi
  6. Yifan Peng

Summary

Introduction

GO-NeRF, short for "Generating Objects in Neural Radiance Fields," addresses a relatively unexplored problem: generating 3D objects directly inside an existing 3D scene represented as a NeRF. The goal is to serve scene creation and editing applications, where newly generated objects must blend seamlessly into the existing backdrop.

Methodology

The approach is built on two key components: a compositional rendering formulation and context-aware learning objectives. A simple user interface lets the user place a virtual object by selecting a 3D location in the scene, recovered from the scene's depth information. GO-NeRF then instantiates a new object radiance field at that location, renders it separately from the existing scene, and composites the two renderings using optimized 3D-aware opacity maps, so that the original scene content is left untouched (see the sketch below).
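
The compositing step can be illustrated with a minimal image-space sketch. This is a simplification under assumed tensor shapes: the paper composites with 3D-aware opacity during rendering, and its exact formulation may differ; all function and variable names here are hypothetical.

```python
import torch

def composite(scene_rgb, scene_depth, obj_rgb, obj_depth, obj_opacity):
    """Blend a separately rendered object into a scene rendering.

    scene_rgb, obj_rgb: (H, W, 3); scene_depth, obj_depth, obj_opacity: (H, W).
    Hypothetical layout; a per-ray volumetric composite would be more faithful.
    """
    # The object contributes only where it lies in front of scene geometry.
    in_front = (obj_depth < scene_depth).float().unsqueeze(-1)  # (H, W, 1)
    alpha = obj_opacity.unsqueeze(-1) * in_front                # effective opacity
    # Standard alpha blending: object over scene; the scene field itself
    # is never modified, only its rendering is blended.
    return alpha * obj_rgb + (1.0 - alpha) * scene_rgb
```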

To ensure the generated object fits its scene context, GO-NeRF distills 2D image inpainting priors from diffusion models via score distillation sampling (SDS). In addition, a regularizer harmonizes the saturation of the generated object with the rest of the scene, mitigating the over-saturation that SDS optimization is prone to. A schematic of the SDS gradient follows.
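
The sketch below shows the generic SDS gradient computation that this family of methods builds on. The denoiser interface `diffusion(x_t, t, text_emb)` and the timestep weighting are assumptions; GO-NeRF additionally conditions the diffusion model on the inpainting context, which is omitted here.

```python
import torch

def sds_grad(rendered, diffusion, text_emb, t, alphas_cumprod):
    """Score distillation sampling gradient for one rendered image (schematic).

    `diffusion` is assumed to predict the injected noise from a noised image;
    real diffusion APIs (and the paper's inpainting conditioning) differ.
    """
    noise = torch.randn_like(rendered)
    a_t = alphas_cumprod[t]
    # Forward diffusion: noise the rendering to timestep t.
    x_t = a_t.sqrt() * rendered + (1.0 - a_t).sqrt() * noise
    with torch.no_grad():
        eps_pred = diffusion(x_t, t, text_emb)  # denoiser's noise estimate
    w = 1.0 - a_t                               # a common timestep weighting
    # Applied to the renderer's parameters via rendered.backward(gradient=...).
    return w * (eps_pred - noise)
```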

Experimentation and Results

The efficacy of GO-NeRF was validated through extensive experiments on both forward-facing and 360° scenes, where it compares favorably against alternative methods. The results highlight its ability to produce high-quality, context-compatible 3D objects, including shadows and reflections that help them harmonize with the surrounding scene. A further strength is the interface, which streamlines the generation process and makes it accessible to users without specialized 3D software expertise.

Quantitative evaluations support this: GO-NeRF achieves higher CLIP scores, a reference-free metric that quantifies how well a generated image aligns with its text prompt, indicating that the generated objects are more faithful to the text descriptions they are based on. A sketch of this evaluation is given below.
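
As a concrete illustration, a CLIP score can be computed as the cosine similarity between CLIP image and text embeddings. The sketch uses Hugging Face's CLIP implementation; the specific checkpoint and evaluation protocol are assumptions, not details from the paper.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(image, prompt):
    """Cosine similarity between embeddings of a rendered view and its prompt."""
    inputs = processor(text=[prompt], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return (img * txt).sum(dim=-1).item()
```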

Potential and Future Directions

GO-NeRF has implications for applications such as virtual reality, game design, and film production, where accurate and realistic scene construction is essential. It also opens up possibilities for image inpainting and style adaptation, enabling more nuanced and detailed scene editing.

Limitations remain, such as a potential mismatch between the user-specified 3D box and the regions actually affected by the generated object (e.g., reflections cast outside the box). Future work could adjust the specified box dynamically during optimization and address known limitations of the SDS loss.

In summary, GO-NeRF represents a significant advancement in the field of 3D object generation and scene composition in neural radiance fields, offering exciting opportunities for creating immersive and cohesive 3D environments.
