NeRF-VPT: Learning Novel View Representations with Neural Radiance Fields via View Prompt Tuning (2403.01325v1)
Abstract: Neural Radiance Fields (NeRF) have achieved remarkable success in novel view synthesis. Nonetheless, generating high-quality images for novel views remains a critical challenge. While existing efforts have made commendable progress, capturing intricate details, enhancing textures, and achieving superior Peak Signal-to-Noise Ratio (PSNR) still warrant further attention. In this work, we propose NeRF-VPT, an innovative method for novel view synthesis that addresses these challenges. NeRF-VPT employs a cascading view prompt tuning paradigm, in which RGB information obtained from preceding rendering outcomes serves as instructive visual prompts for subsequent rendering stages, so that the prior knowledge embedded in the prompts gradually enhances rendered image quality. NeRF-VPT only requires sampling RGB data from previous-stage renderings as priors at each training stage, without relying on extra guidance or complex techniques; it is therefore plug-and-play and can be readily integrated into existing methods. Comparative analyses against several NeRF-based approaches on demanding benchmarks, including Realistic Synthetic 360, Real Forward-Facing, the Replica dataset, and a user-captured dataset, show that NeRF-VPT significantly elevates baseline performance and generates higher-quality novel view images than all compared state-of-the-art methods. Furthermore, the cascading learning of NeRF-VPT adapts naturally to scenarios with sparse inputs, yielding a significant accuracy improvement for sparse-view novel view synthesis. The source code and dataset are available at \url{https://github.com/Freedomcls/NeRF-VPT}.
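The cascading view-prompt idea described in the abstract can be illustrated with a short sketch: each rendering stage consumes the RGB output of the previous stage as an extra conditioning input (the "view prompt") and is supervised against the same ground-truth pixels. The PyTorch sketch below is a minimal, simplified illustration of such a cascade, not the authors' implementation; the module and function names (`StageMLP`, `train_cascade`), the per-ray MLP in place of full volumetric rendering, and the zero-initialized first prompt are assumptions made for illustration only.

```python
# Minimal sketch of a cascading view-prompt training loop (assumed, simplified).
# Each stage refines the previous stage's rendered RGB toward the ground truth.
import torch
import torch.nn as nn


class StageMLP(nn.Module):
    """One rendering stage: maps (ray features, prompt RGB) -> refined per-ray RGB."""

    def __init__(self, feat_dim: int = 63, prompt_dim: int = 3, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + prompt_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),  # RGB in [0, 1]
        )

    def forward(self, ray_feat: torch.Tensor, prompt_rgb: torch.Tensor) -> torch.Tensor:
        # Concatenate the ray features with the visual prompt from the previous stage.
        return self.net(torch.cat([ray_feat, prompt_rgb], dim=-1))


def train_cascade(stages, ray_feat, gt_rgb, lr=5e-4, iters=1000):
    """Train stages sequentially; stage k uses stage k-1's rendering as its prompt."""
    prompt = torch.zeros_like(gt_rgb)  # stage 0 starts from an empty prompt (assumption)
    for stage in stages:
        opt = torch.optim.Adam(stage.parameters(), lr=lr)
        for _ in range(iters):
            pred = stage(ray_feat, prompt)
            loss = ((pred - gt_rgb) ** 2).mean()  # standard photometric MSE
            opt.zero_grad()
            loss.backward()
            opt.step()
        # Freeze this stage's rendering and hand it to the next stage as the prompt.
        with torch.no_grad():
            prompt = stage(ray_feat, prompt)
    return prompt  # final-stage rendering


if __name__ == "__main__":
    # Toy usage: 1024 rays with 63-dim positional-encoded features and GT colors.
    feats = torch.randn(1024, 63)
    colors = torch.rand(1024, 3)
    cascade = [StageMLP() for _ in range(3)]
    final_rgb = train_cascade(cascade, feats, colors, iters=10)
    print(final_rgb.shape)  # torch.Size([1024, 3])
```

Because each stage only reads the previous stage's rendered RGB, the same wrapper can in principle sit on top of any NeRF-style backbone, which is what makes the approach plug-and-play.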
- Flamingo: a visual language model for few-shot learning. Advances in Neural Information Processing Systems, 35: 23716–23736.
- Nerf in detail: Learning to sample for view synthesis. arXiv preprint arXiv:2106.05264.
- Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 5855–5864.
- Gaudi: A neural architect for immersive 3d scene generation. arXiv preprint arXiv:2207.13751.
- Cascade r-cnn: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6154–6162.
- Prompt-RSVQA: Prompting visual context to a language model for remote sensing visual question answering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1372–1381.
- Tensorf: Tensorial radiance fields. In European Conference on Computer Vision, 333–350. Springer.
- StructNeRF: Neural Radiance Fields for Indoor Scenes with Structural Hints. arXiv preprint arXiv:2209.05277.
- Depth-supervised nerf: Fewer views and faster training for free. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 12882–12891.
- Neural 3d scene reconstruction with the manhattan-world assumption. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5511–5520.
- Baking neural radiance fields for real-time view synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 5875–5884.
- Visual prompt tuning. In European Conference on Computer Vision, 709–727. Springer.
- Geonerf: Generalizing nerf with geometry priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 18365–18375.
- Conerf: Controllable neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 18623–18632.
- AdaNeRF: Adaptive Sampling for Real-Time Rendering of Neural Radiance Fields. In Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XVII, 254–270. Springer.
- End-to-End Learning Local Multi-view Descriptors for 3D Point Clouds. arXiv preprint.
- Recurrent 3d pose sequence machines. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 810–819.
- Nerf in the dark: High dynamic range view synthesis from noisy raw images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 16190–16199.
- Local light field fusion: Practical view synthesis with prescriptive sampling guidelines. ACM Transactions on Graphics (TOG), 38(4): 1–14.
- Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1): 99–106.
- Pose machines: Articulated pose estimation via inference machines. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part II 13, 33–47. Springer.
- Dense depth priors for neural radiance fields from sparse input views. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 12892–12901.
- Image super-resolution via iterative refinement. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- The Replica Dataset: A Digital Replica of Indoor Spaces. arXiv preprint.
- Block-nerf: Scalable large scene neural view synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8248–8258.
- LiDAR-NeRF: Novel LiDAR View Synthesis via Neural Radiance Fields. arXiv preprint arXiv:2304.10406.
- Semantic-Aware Generation for Self-Supervised Visual Representation Learning. arXiv preprint arXiv:2111.13163.
- Non-rigid neural radiance fields: Reconstruction and novel view synthesis of a dynamic scene from monocular video. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 12959–12970.
- Neural feature fusion fields: 3d distillation of self-supervised 2d image representations. arXiv preprint arXiv:2209.03494.
- Is Attention All NeRF Needs? arXiv e-prints, arXiv–2207.
- Ref-nerf: Structured view-dependent appearance for neural radiance fields. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 5481–5490. IEEE.
- Nesf: Neural semantic fields for generalizable semantic segmentation of 3d scenes. arXiv preprint arXiv:2111.13260.
- Sparsenerf: Distilling depth ranking for few-shot novel view synthesis. arXiv preprint arXiv:2303.16196.
- PERF: Panoramic Neural Radiance Field from a Single Panorama. arXiv preprint arXiv:2310.16831.
- Ibrnet: Learning multi-view image-based rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4690–4699.
- Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4): 600–612.
- Generative visual prompt: Unifying distributional control of pre-trained generative models. Advances in Neural Information Processing Systems, 35: 22422–22437.
- Bungeenerf: Progressive neural radiance field for extreme multi-scale scene rendering. In Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXXII, 106–122. Springer.
- Recursive-NeRF: An Efficient and Dynamically Growing NeRF. IEEE Transactions on Visualization and Computer Graphics, 1–14.
- Volume Rendering of Neural Implicit Surfaces. In Ranzato, M.; Beygelzimer, A.; Dauphin, Y.; Liang, P.; and Vaughan, J. W., eds., Advances in Neural Information Processing Systems, volume 34, 4805–4815. Curran Associates, Inc.
- The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 586–595.
- In-place scene labelling and understanding with implicit scene representation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 15838–15847.