The NeRFect Match: Exploring NeRF Features for Visual Localization (2403.09577v2)
Abstract: In this work, we propose the use of Neural Radiance Fields (NeRF) as a scene representation for visual localization. Recently, NeRF has been employed to enhance pose regression and scene coordinate regression models by augmenting the training database, providing auxiliary supervision through rendered images, or serving as an iterative refinement module. We extend its recognized advantages -- its ability to provide a compact scene representation with realistic appearances and accurate geometry -- by exploring the potential of NeRF's internal features in establishing precise 2D-3D matches for localization. To this end, we conduct a comprehensive examination of NeRF's implicit knowledge, acquired through view synthesis, for matching under various conditions. This includes exploring different matching network architectures, extracting encoder features at multiple layers, and varying training configurations. Significantly, we introduce NeRFMatch, an advanced 2D-3D matching function that capitalizes on the internal knowledge of NeRF learned via view synthesis. Our evaluation of NeRFMatch on standard localization benchmarks, within a structure-based pipeline, sets a new state-of-the-art for localization performance on Cambridge Landmarks.
- Netvlad: Cnn architecture for weakly supervised place recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5297–5307, 2016.
- Wide area localization on mobile phones. In 2009 8th ieee international symposium on mixed and augmented reality, pages 73–82. IEEE, 2009.
- Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5855–5864, 2021.
- Rethinking visual geo-localization for large-scale applications. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4878–4888, 2022.
- Extending absolute pose regression to multiple scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 38–39, 2020.
- Learning less is more-6d camera localization via 3d surface regression. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4654–4662, 2018.
- Visual camera re-localization from rgb and rgb-d images using dsac. IEEE transactions on pattern analysis and machine intelligence, 44(9):5847–5865, 2021.
- Dsac-differentiable ransac for camera localization. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 6684–6692, 2017.
- On the limits of pseudo ground truth in visual camera re-localisation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6218–6228, 2021.
- Accelerated coordinate encoding: Learning to relocalize in minutes using rgb and poses. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5044–5053, 2023.
- Geometry-aware learning of maps for camera localization. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2616–2625, 2018.
- Hybrid scene compression for visual localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7653–7662, 2019.
- End-to-end object detection with transformers. In European conference on computer vision, pages 213–229. Springer, 2020.
- Aspanformer: Detector-free image matching with adaptive span transformer. In European Conference on Computer Vision, pages 20–36. Springer, 2022a.
- Leveraging neural radiance fields for uncertainty-aware visual localization. arXiv preprint arXiv:2310.06984, 2023a.
- Direct-posenet: Absolute pose regression with photometric consistency. In 2021 International Conference on 3D Vision (3DV), pages 1175–1185. IEEE, 2021.
- Dfnet: Enhance absolute pose regression with direct feature matching. In European Conference on Computer Vision, pages 1–17. Springer Nature Switzerland Cham, 2022b.
- Refinement for absolute pose regression with neural feature synthesis. arXiv preprint arXiv:2303.10087, 2023b.
- Masked-attention mask transformer for universal image segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1290–1299, 2022.
- Superpoint: Self-supervised interest point detection and description. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 224–236, 2018.
- Keyframe-based real-time camera tracking. In 2009 IEEE 12th international conference on computer vision, pages 1538–1545. IEEE, 2009.
- D2-net: A trainable cnn for joint description and detection of local features. In Proceedings of the ieee/cvf conference on computer vision and pattern recognition, pages 8092–8101, 2019.
- Panoptic nerf: 3d-to-2d label transfer for panoptic urban scene segmentation. In 2022 International Conference on 3D Vision (3DV), pages 1–11. IEEE, 2022.
- Complete solution classification for the perspective-three-point problem. IEEE transactions on pattern analysis and machine intelligence, 25(8):930–943, 2003.
- Patch-netvlad: Multi-scale fusion of locally-global descriptors for place recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14141–14152, 2021.
- Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
- Project autovision: Localization and 3d scene perception for an autonomous vehicle with a multi-camera system. In 2019 International Conference on Robotics and Automation (ICRA), pages 4695–4702. IEEE, 2019.
- Nerf-rpn: A general framework for object detection in nerfs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23528–23538, 2023.
- From structure-from-motion point clouds to fast location recognition. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 2599–2606. IEEE, 2009.
- An efficient algebraic solution to the perspective-three-point problem. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7225–7233, 2017.
- Geometric loss functions for camera pose regression with deep learning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5974–5983, 2017.
- Posenet: A convolutional network for real-time 6-dof camera relocalization. In Proceedings of the IEEE international conference on computer vision, pages 2938–2946, 2015.
- Adam: A method for stochastic optimization. 3rd International Conference on Learning Representations, (ICLR) 2015, 2015.
- A novel parametrization of the perspective-three-point problem for a direct computation of absolute camera position and orientation. In CVPR 2011, pages 2969–2976. IEEE, 2011.
- Panoptic neural fields: A semantic object-aware neural scene representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12871–12881, 2022.
- Hierarchical scene coordinate classification and regression for visual localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11983–11992, 2020.
- Location recognition using prioritized feature matching. In European conference on computer vision, pages 791–804. Springer, 2010.
- Nerf-loc: Visual localization with conditional neural radiance field. 2023 IEEE International Conference on Robotics and Automation (ICRA), 2023.
- Sgdr: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983, 2016.
- Loc-nerf: Monte carlo localization using neural radiance fields. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 4018–4025. IEEE, 2023.
- Nerf in the wild: Neural radiance fields for unconstrained photo collections. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7210–7219, 2021.
- Nerf: Representing scenes as neural radiance fields for view synthesis. In European Conference on Computer Vision, pages 405–421. Springer, 2020.
- Lens: Localization enhanced by nerf synthesis. In Conference on Robot Learning, pages 1347–1356. PMLR, 2022.
- Crossfire: Camera relocalization on self-supervised features from an implicit representation. arXiv preprint arXiv:2303.04869, 2023.
- Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics (ToG), 41(4):1–15, 2022.
- Meshloc: Mesh-based visual localization. In European Conference on Computer Vision, pages 589–609. Springer, 2022.
- Urban radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12932–12942, 2022.
- R2d2: Reliable and repeatable detector and descriptor. Advances in neural information processing systems, 32, 2019.
- Nerf-slam: Real-time dense monocular slam with neural radiance fields. arXiv preprint arXiv:2210.13641, 2022.
- Imagenet large scale visual recognition challenge. International journal of computer vision, 115:211–252, 2015.
- From coarse to fine: Robust hierarchical localization at large scale. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12716–12725, 2019.
- Superglue: Learning feature matching with graph neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4938–4947, 2020.
- Back to the feature: Learning robust camera localization from pixels to pose. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3247–3257, 2021.
- Efficient & effective prioritized matching for large-scale image-based localization. IEEE transactions on pattern analysis and machine intelligence, 39(9):1744–1756, 2016.
- Understanding the limitations of cnn-based absolute camera pose regression. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3302–3312, 2019.
- Learning multi-scene absolute pose regression with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2733–2742, 2021.
- Scene coordinate regression forests for camera relocalization in rgb-d images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2930–2937, 2013.
- imap: Implicit mapping and positioning in real-time. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6229–6238, 2021.
- Loftr: Detector-free local feature matching with transformers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8922–8931, 2021.
- Inloc: Indoor visual localization with dense matching and view synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7199–7209, 2018.
- Block-nerf: Scalable large scene neural view synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8248–8258, 2022.
- Learning camera localization via dense scene matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1831–1841, 2021.
- Neumap: Neural coordinate mapping by auto-transdecoder for camera localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 929–939, 2023.
- 24/7 place recognition by view synthesis. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1808–1817, 2015.
- Global localization from monocular slam on a mobile phone. IEEE transactions on visualization and computer graphics, 20(4):531–539, 2014.
- Image-based localization using lstms for structured feature correlation. In Proceedings of the IEEE International Conference on Computer Vision, pages 627–637, 2017.
- Learning feature descriptors using camera pose supervision. In European Conference on Computer Vision, pages 757–774. Springer, 2020.
- Natural landmark-based monocular localization for mavs. In 2011 IEEE International Conference on Robotics and Automation, pages 5792–5799. IEEE, 2011.
- Neural fields in visual computing and beyond. Computer Graphics Forum, 2022.
- S-nerf: Neural radiance fields for street views. arXiv preprint arXiv:2303.00749, 2023.
- Nerf-det: Learning geometry-aware volumetric representation for multi-view 3d object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 23320–23330, 2023.
- Sanet: Scene agnostic network for camera localization. In Proceedings of the IEEE/CVF international conference on computer vision, pages 42–51, 2019.
- inerf: Inverting neural radiance fields for pose estimation. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1323–1330. IEEE, 2021.
- Metaformer baselines for vision. arXiv preprint arXiv:2210.13452, 2022.
- Go-slam: Global optimization for consistent 3d instant reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3727–3737, 2023.
- In-place scene labelling and understanding with implicit scene representation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15838–15847, 2021.
- Patch2pix: Epipolar-guided pixel-level correspondences. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4669–4678, 2021.
- Is geometry enough for matching in visual localization? In European Conference on Computer Vision, pages 407–425. Springer, 2022.
- Nice-slam: Neural implicit scalable encoding for slam. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12786–12796, 2022.