
SPARF: Neural Radiance Fields from Sparse and Noisy Poses (2211.11738v3)

Published 21 Nov 2022 in cs.CV

Abstract: Neural Radiance Field (NeRF) has recently emerged as a powerful representation to synthesize photorealistic novel views. While showing impressive performance, it relies on the availability of dense input views with highly accurate camera poses, thus limiting its application in real-world scenarios. In this work, we introduce Sparse Pose Adjusting Radiance Field (SPARF), to address the challenge of novel-view synthesis given only few wide-baseline input images (as low as 3) with noisy camera poses. Our approach exploits multi-view geometry constraints in order to jointly learn the NeRF and refine the camera poses. By relying on pixel matches extracted between the input views, our multi-view correspondence objective enforces the optimized scene and camera poses to converge to a global and geometrically accurate solution. Our depth consistency loss further encourages the reconstructed scene to be consistent from any viewpoint. Our approach sets a new state of the art in the sparse-view regime on multiple challenging datasets.

References (65)
  1. Backpropagation-friendly eigendecomposition. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché Buc, Edward A. Fox, and Roman Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pages 3156–3164, 2019.
  2. Neural RGB-D surface reconstruction. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pages 6280–6291. IEEE, 2022.
  3. Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 5835–5844. IEEE, 2021.
  4. Mip-nerf 360: Unbounded anti-aliased neural radiance fields. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pages 5460–5469. IEEE, 2022.
  5. SAMURAI: Shape And Material from Unconstrained Real-world Arbitrary Image collections. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
  6. Mvsnerf: Fast generalizable radiance field reconstruction from multi-view stereo. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 14104–14113. IEEE, 2021.
  7. Wide-baseline relative camera pose estimation with directional learning. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 3257–3267, 2021.
  8. Stereo radiance fields (srf): Learning view synthesis for sparse views of novel scenes. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7907–7916, 2021.
  9. GARF: gaussian activated radiance fields for high fidelity reconstruction and pose estimation. CoRR, abs/2204.05735, 2022.
  10. Improving neural implicit surfaces geometry with patch warping. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pages 6250–6259. IEEE, 2022.
  11. Depth-supervised NeRF: Fewer views and faster training for free. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022.
  12. Superpoint: Self-supervised interest point detection and description. In 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2018, Salt Lake City, UT, USA, June 18-22, 2018, pages 224–236, 2018.
  13. D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.
  14. Rpnet: An end-to-end network for relative camera pose estimation. In Computer Vision - ECCV 2018 Workshops - Munich, Germany, September 8-14, 2018, Proceedings, Part I, pages 738–745, 2018.
  15. End-to-end learning of keypoint detection and matching for relative pose estimation. CoRR, abs/2104.01085, 2021.
  16. Optimal relative pose with unknown correspondences. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pages 1728–1736. IEEE Computer Society, 2016.
  17. Multiple View Geometry in Computer Vision. Cambridge University Press, USA, 2 edition, 2003.
  18. The Elements of Statistical Learning. Springer Series in Statistics. Springer New York Inc., New York, NY, USA, 2001.
  19. Putting nerf on a diet: Semantically consistent few-shot view synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 5885–5894, October 2021.
  20. Large scale multi-view stereopsis evaluation. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, OH, USA, June 23-28, 2014, pages 406–413. IEEE Computer Society, 2014.
  21. Self-calibrating neural radiance fields. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 5826–5834, 2021.
  22. Infonerf: Ray entropy minimization for few-shot neural volume rendering. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pages 12902–12911. IEEE, 2022.
  23. Imagenet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, NIPS’12, page 1097–1105, Red Hook, NY, USA, 2012. Curran Associates Inc.
  24. Neroic: neural rendering of objects from online image collections. ACM Trans. Graph., 41(4):56:1–56:12, 2022.
  25. Barf: Bundle-adjusting neural radiance fields. In IEEE International Conference on Computer Vision (ICCV), 2021.
  26. Neural rays for occlusion-aware image-based rendering. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pages 7814–7823, 2022.
  27. Sparseneus: Fast generalizable neural surface reconstruction from sparse views. ECCV, 2022.
  28. David G. Lowe. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision, 60(2):91–110, Nov. 2004.
  29. Relative camera pose estimation using convolutional neural networks. In Advanced Concepts for Intelligent Vision Systems - 18th International Conference, ACIVS 2017, Antwerp, Belgium, September 18-21, 2017, Proceedings, pages 675–687, 2017.
  30. Gnerf: Gan-based neural radiance field without posed camera. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 6331–6341. IEEE, 2021.
  31. Nerf: representing scenes as neural radiance fields for view synthesis. Commun. ACM, 65(1):99–106, 2022.
  32. Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph., 41(4):102:1–102:15, July 2022.
  33. Regnerf: Regularizing neural radiance fields for view synthesis from sparse inputs. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2022.
  34. Learning transferable visual models from natural language supervision. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, pages 8748–8763. PMLR, 2021.
  35. Dense depth priors for neural radiance fields from sparse input views. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022.
  36. ORB: an efficient alternative to SIFT or SURF. In IEEE International Conference on Computer Vision, ICCV 2011, Barcelona, Spain, November 6-13, 2011, pages 2564–2571, 2011.
  37. Superglue: Learning feature matching with graph neural networks. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020, pages 4937–4946, 2020.
  38. Paul-Edouard Sarlin. HLOC: Github project page. https://github.com/cvg/Hierarchical-Localization, 2021.
  39. Structure-from-motion revisited. In CVPR 2016, Las Vegas, NV, USA, pages 4104–4113, 2016.
  40. Learning neural transmittance for efficient rendering of reflectance fields. In 32nd British Machine Vision Conference 2021, BMVC 2021, Online, November 22-25, 2021, page 45. BMVA Press, 2021.
  41. The replica dataset: A digital replica of indoor spaces. CoRR, abs/1906.05797, 2019.
  42. imap: Implicit mapping and positioning in real-time. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 6209–6218. IEEE, 2021.
  43. GRF: learning a general radiance field for 3d representation and rendering. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 15162–15172, 2021.
  44. Glampoints: Greedily learned accurate match points. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 10731–10740, 2019.
  45. Learning accurate dense correspondences and when to trust them. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021, pages 5714–5724. Computer Vision Foundation / IEEE, 2021.
  46. GLU-Net: Global-local universal network for dense flow and correspondences. In 2020 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2020, 2020.
  47. Pdc-net+: Enhanced probabilistic dense correspondence network. In Preprint, 2021.
  48. Shinji Umeyama. Least-squares estimation of transformation parameters between two point patterns. IEEE Trans. Pattern Anal. Mach. Intell., 13(4):376–380, 1991.
  49. Ibrnet: Learning multi-view image-based rendering. In CVPR, 2021.
  50. Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process., 13(4):600–612, 2004.
  51. NeRF−−: Neural radiance fields without known camera parameters. arXiv preprint arXiv:2102.07064, 2021.
  52. Nerfingmvs: Guided optimization of neural radiance fields for indoor multi-view stereo. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 5590–5599. IEEE, 2021.
  53. Sinerf: Sinusoidal neural radiance fields for joint pose estimation and scene reconstruction. CoRR, abs/2210.04553, 2022.
  54. Ps-nerf: Neural inverse rendering for multi-view photometric stereo. CoRR, abs/2207.11406, 2022.
  55. Multiview neural surface reconstruction by disentangling geometry and appearance. Advances in Neural Information Processing Systems, 33, 2020.
  56. iNeRF: Inverting neural radiance fields for pose estimation. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021.
  57. pixelNeRF: Neural radiance fields from one or few images. In CVPR, 2021.
  58. Monosdf: Exploring monocular geometric cues for neural implicit surface reconstruction. Advances in Neural Information Processing Systems (NeurIPS), 2022.
  59. Relpose: Predicting probabilistic relative rotation for single objects in the wild. In Shai Avidan, Gabriel J. Brostow, Moustapha Cissé, Giovanni Maria Farinella, and Tal Hassner, editors, Computer Vision - ECCV 2022 - 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XXXI, volume 13691 of Lecture Notes in Computer Science, pages 592–611. Springer, 2022.
  60. NeRS: Neural reflectance surfaces for sparse-view 3d reconstruction in the wild. In Conference on Neural Information Processing Systems, 2021.
  61. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  62. A tutorial on quantitative trajectory evaluation for visual(-inertial) odometry. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2018, Madrid, Spain, October 1-5, 2018, pages 7244–7251. IEEE, 2018.
  63. On the continuity of rotation representations in neural networks. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pages 5745–5753. Computer Vision Foundation / IEEE, 2019.
  64. Nice-slam: Neural implicit scalable encoding for slam. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
  65. Fusing the old with the new: Learning relative camera pose with geometry-guided uncertainty. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021, pages 32–42, 2021.

Summary

  • The paper introduces SPARF, a method that jointly optimizes camera poses and neural radiance fields to achieve high-quality view synthesis from sparse, noisy inputs.
  • It employs innovative multi-view correspondence and depth consistency losses to enforce global geometric accuracy and coherent scene reconstructions.
  • Experimental results demonstrate that SPARF outperforms current state-of-the-art methods in pose registration and novel view synthesis, even with minimal input images.

Overview of SPARF: Neural Radiance Fields from Sparse and Noisy Poses

The paper introduces Sparse Pose Adjusting Radiance Field (SPARF), a novel method that extends the application of Neural Radiance Fields (NeRF) to scenarios where only a few input views with noisy pose information are available. The method is designed to address the limitations of NeRF in real-world applications where dense input views and highly accurate camera poses are not feasible, such as in AR/VR and autonomous driving scenarios.

Technical Summary

NeRF has showcased significant potential in synthesizing photorealistic views from dense and accurately posed camera images. However, its dependency on high-quality input poses and dense view coverage restricts its practical usability. This paper proposes SPARF to overcome these constraints by introducing a joint optimization strategy for camera pose refinement and scene representation based on sparse input data. The key components and contributions of this research are:

  • Multi-View Correspondence Loss: Unlike previous NeRF adjustments which rely heavily on individual image alignment and photometric consistency, SPARF introduces a multi-view correspondence objective. This objective leverages pixel matches between input views to ensure a globally consistent geometric solution across all views, guiding both the camera poses and the scene geometry towards accuracy.
  • Depth Consistency Loss: This objective uses rendered depth maps from initial viewpoints to enforce depth consistency in novel viewpoints. This loss encourages the reconstruction to remain coherent when viewed from unseen perspectives, thus improving rendering quality in novel views.
  • Joint Pose-NeRF Training: SPARF trains the NeRF model concurrently with camera pose adjustments. A staged training approach is adopted where pose optimization is performed jointly with coarse network training, followed by a phase where refined poses are used to train both coarse and fine networks for high-fidelity scene representation.
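The multi-view correspondence objective above can be sketched numerically: back-project a matched pixel in one view using the NeRF-rendered depth, transform it into the other view with the current pose estimates, project it, and penalize the distance to its matched pixel with a robust loss. This is a minimal NumPy illustration of that geometric residual, not the paper's implementation; the function names, the Huber penalty threshold, and the per-match confidence weights `w` are assumptions for the sketch.

```python
import numpy as np

def huber(r, delta=1.0):
    # Robust penalty on the scalar reprojection residual (assumed loss shape).
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * a**2, delta * (a - 0.5 * delta))

def reproject(u_i, depth_i, K, T_ji):
    """Back-project pixel u_i (2,) with its rendered depth into 3D in
    camera i, map into camera j with the 4x4 relative pose T_ji, project
    with intrinsics K, and return the resulting 2D pixel location."""
    ray = np.linalg.inv(K) @ np.array([u_i[0], u_i[1], 1.0])
    x_i = ray * depth_i                         # 3D point in camera-i frame
    x_j = T_ji[:3, :3] @ x_i + T_ji[:3, 3]      # same point in camera-j frame
    p = K @ x_j
    return p[:2] / p[2]                          # perspective division

def correspondence_loss(matches, depths_i, K, T_ji, w=None, delta=1.0):
    """matches: list of (u_i, u_j) pixel pairs from a matcher;
    depths_i: NeRF-rendered depth at each u_i; w: optional confidences."""
    if w is None:
        w = np.ones(len(matches))
    total = 0.0
    for (u_i, u_j), d, wk in zip(matches, depths_i, w):
        u_hat = reproject(np.asarray(u_i, float), d, K, T_ji)
        total += wk * huber(np.linalg.norm(u_hat - np.asarray(u_j, float)), delta)
    return total / max(len(matches), 1)
```

Because the residual depends on both the rendered depth (scene geometry) and the relative pose, gradients flow to both when this loss is minimized, which is what drives the joint convergence to a globally consistent solution.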

Experimental Evaluation

SPARF is evaluated on multiple challenging datasets including DTU, LLFF, and Replica, under the scenario of having as few as three input images. The results demonstrate that SPARF significantly outperforms existing state-of-the-art methods in both pose registration and view synthesis. Notably, SPARF exhibits robustness to initial noise in camera pose estimates, showcasing the critical contribution of the multi-view geometric constraints that work even under wide baselines with sparse views.
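Pose registration accuracy of the kind reported here is commonly measured, after aligning the estimated poses to ground truth, as the geodesic angle between rotation matrices. As a small illustration of that standard metric (not a detail taken from the paper's evaluation code), the rotation error in degrees can be computed as:

```python
import numpy as np

def rotation_error_deg(R_est, R_gt):
    """Geodesic angle (degrees) between two 3x3 rotation matrices,
    a standard metric for camera pose registration accuracy."""
    cos_theta = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0
    # Clip to guard against numerical drift outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
```

Translation error is typically reported alongside this, either as a Euclidean distance after a similarity alignment or as an angle between translation directions.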

  1. Performance in Sparse Views: In conditions where models like BARF and SCNeRF underperform due to insufficient pose registration accuracy in sparse-view scenarios, SPARF achieves superior registration and synthesis quality.
  2. Comparison with Dense View Approaches: While dense-view methods rely on more extensive imagery, SPARF sets new benchmarks in the sparse regime. Even conditional models such as PixelNeRF, which generalize from pre-training on large datasets, show limited effectiveness in out-of-distribution scenes compared to SPARF's targeted per-scene optimization.

Implications and Future Directions

The implications of this research are substantial for advancing 3D scene representation methodologies. By relaxing the dense-view and precise-pose prerequisites, SPARF has the potential to democratize high-quality view synthesis in more varied and challenging deployment environments. This can notably enhance applications in robotics and immersive reality technologies.

Future work could involve integrating SPARF into more efficient voxel grid representations to accelerate convergence and experimenting with pose refinement under variable intrinsic camera parameters. Moreover, the development of more sophisticated correspondence techniques or learning-based methods for prioritizing informative matches could further streamline and bolster SPARF’s robustness across various scene complexities.

Ultimately, SPARF’s ability to maintain global geometric consistency and render high-quality novel views with minimal inputs is a critical stride toward practical and scalable 3D scene processing with neural fields.
