GenN2N: Generative NeRF2NeRF Translation (2404.02788v1)

Published 3 Apr 2024 in cs.CV

Abstract: We present GenN2N, a unified NeRF-to-NeRF translation framework for various NeRF translation tasks such as text-driven NeRF editing, colorization, super-resolution, inpainting, etc. Unlike previous methods designed for individual translation tasks with task-specific schemes, GenN2N achieves all these NeRF editing tasks by employing a plug-and-play image-to-image translator to perform editing in the 2D domain and lifting 2D edits into the 3D NeRF space. Since the 3D consistency of 2D edits may not be assured, we propose to model the distribution of the underlying 3D edits through a generative model that can cover all possible edited NeRFs. To model the distribution of 3D edited NeRFs from 2D edited images, we carefully design a VAE-GAN that encodes images while decoding NeRFs. The latent space is trained to align with a Gaussian distribution and the NeRFs are supervised through an adversarial loss on its renderings. To ensure the latent code does not depend on 2D viewpoints but truly reflects the 3D edits, we also regularize the latent code through a contrastive learning scheme. Extensive experiments on various editing tasks show GenN2N, as a universal framework, performs as well or better than task-specific specialists while possessing flexible generative power. More results on our project page: https://xiangyueliu.github.io/GenN2N/

Authors (5)
  1. Xiangyue Liu
  2. Han Xue
  3. Kunming Luo
  4. Ping Tan
  5. Li Yi

Summary

Overview of GenN2N: Enhancing NeRFs through Generative Translation

The paper introduces GenN2N, a unified framework for NeRF-to-NeRF translation that handles diverse 3D editing tasks such as text-driven editing, colorization, super-resolution, and inpainting. This unification contrasts with existing NeRF editing schemes, each designed for a single task with task-specific machinery. GenN2N instead leverages off-the-shelf 2D image editing models to perform these edits while maintaining the multi-view consistency essential to coherent 3D scenes.

Key Concepts and Methodology

GenN2N's methodology consists of two stages:

  1. 2D Image-to-Image Editing: Using a plug-and-play image-to-image translator, GenN2N applies the desired edit to images rendered from the source NeRF. These 2D edits cover common tasks such as colorization and super-resolution, leveraging the maturity and adaptability of existing 2D editing tools (see the first sketch after this list).
  2. 3D NeRF Adaptation: After 2D editing, the framework lifts the edits into the 3D NeRF. Because per-view 2D edits are not guaranteed to be 3D-consistent, the framework models the distribution of plausible 3D edits with a variational autoencoder (VAE) augmented by generative adversarial network (GAN) components: the latent space is aligned with a Gaussian distribution, renderings of the edited NeRF are supervised with an adversarial loss, and a contrastive objective disentangles the 3D edit from the 2D viewpoint (see the loss sketch after this list).
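
To make stage 1 concrete, here is a minimal sketch, not the authors' released code. It assumes a trained NeRF exposing a hypothetical `render(pose)` method returning a PIL image, and uses InstructPix2Pix from Hugging Face `diffusers` as one example of a plug-and-play 2D translator; any image-to-image editor could be substituted.

```python
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline

# One example of a plug-and-play 2D editor; the framework is agnostic to the
# specific translator used.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

def edit_training_views(nerf, poses, instruction):
    """Render each training view from the source NeRF and edit it in 2D.

    Because the editor is applied per view, the edited images are not
    guaranteed to be 3D-consistent; stage 2 resolves this by modeling the
    distribution of plausible 3D edits.
    """
    edited = []
    for pose in poses:
        view = nerf.render(pose)  # hypothetical interface, PIL image out
        out = pipe(instruction, image=view,
                   num_inference_steps=20,
                   image_guidance_scale=1.5).images[0]
        edited.append(out)
    return edited
```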
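Stage 2 combines reconstruction, KL, adversarial, and contrastive objectives. The PyTorch sketch below illustrates one plausible composition of these losses; the module interfaces (`encoder`, `nerf`, `disc`), the InfoNCE-style contrastive formulation, and the loss weights `w` are assumptions for illustration and may differ from the paper's exact design.

```python
import torch
import torch.nn.functional as F

def kl_loss(mu, logvar):
    # Align the per-image edit latent with a standard Gaussian, as in a VAE.
    return -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

def contrastive_loss(z, edit_ids, temperature=0.1):
    # InfoNCE-style objective (an assumed formulation): latents from different
    # viewpoints of the SAME 2D edit are pulled together, latents of different
    # edits pushed apart, so z encodes the edit rather than the viewpoint.
    z = F.normalize(z, dim=-1)
    sim = z @ z.t() / temperature
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos = (edit_ids.unsqueeze(0) == edit_ids.unsqueeze(1)) & ~self_mask
    log_prob = F.log_softmax(sim.masked_fill(self_mask, float("-inf")), dim=1)
    return -(log_prob * pos.float()).sum(1).div(pos.sum(1).clamp(min=1)).mean()

def training_step(encoder, nerf, disc, edited_views, poses, edit_ids, w):
    # encoder, nerf (latent-conditioned), and disc are hypothetical modules.
    mu, logvar = encoder(edited_views)          # encode 2D edits to latents
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
    renders = nerf(poses, z)                    # render the edited NeRF at the
                                                # corresponding camera poses
    l_rec = F.l1_loss(renders, edited_views)    # match the 2D edits
    l_adv = -disc(renders).mean()               # generator-side GAN loss
    return (l_rec + w["kl"] * kl_loss(mu, logvar)
            + w["adv"] * l_adv + w["con"] * contrastive_loss(z, edit_ids))
```

At test time, sampling different latent codes from the Gaussian prior yields different plausible edited NeRFs, which is the source of the framework's generative flexibility.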

Experimental Results and Implications

The experiments cover diverse datasets and demonstrate the effectiveness of GenN2N across a range of neural radiance field applications. Notably, the framework matches or surpasses task-specific methods while offering significant flexibility in generating diverse edited NeRFs. These results underscore GenN2N's potential to simplify the creation and customization of 3D models, producing high-quality renderings without task-dependent methodological modifications.

Implications for Future AI Development

The impact of GenN2N extends beyond its immediate applications, pointing toward broader integration of powerful 2D generative models into 3D content pipelines and AI systems. This approach could reshape how high-quality 3D content is processed and manipulated, with potential applications in VR, AR, and other immersive technologies. Its generative capabilities and adaptability position GenN2N as a meaningful contribution toward overcoming the limitations of task-specific 3D editing.

Overall, GenN2N represents a significant step in leveraging generative models for neural rendering, setting the stage for further advances in the seamless editing and translation of neural scene representations.
