NeRF-Supervised Feature Point Detection and Description (2403.08156v3)
Abstract: Feature point detection and description is the backbone for various computer vision applications, such as Structure-from-Motion, visual SLAM, and visual place recognition. While learning-based methods have surpassed traditional handcrafted techniques, their training often relies on simplistic homography-based simulations of multi-view perspectives, limiting model generalisability. This paper presents a novel approach leveraging Neural Radiance Fields (NeRFs) to generate a diverse and realistic dataset consisting of indoor and outdoor scenes. Our proposed methodology adapts state-of-the-art feature detectors and descriptors for training on multi-view NeRF-synthesised data, with supervision achieved through perspective projective geometry. Experiments demonstrate that the proposed methodology achieves competitive or superior performance on standard benchmarks for relative pose estimation, point cloud registration, and homography estimation while requiring significantly less training data and time compared to existing approaches.
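To make the phrase "supervision achieved through perspective projective geometry" concrete, the following is a minimal sketch of how a pixel in one rendered view could be mapped to another view using per-pixel depth and known camera poses. It is an illustration, not the paper's implementation. The function name `reproject`, the shared pinhole intrinsics `K`, the camera-to-world pose matrices, the z-forward camera convention, and the use of z-depth (rather than distance along the ray) are all assumptions made for this example.

```python
import numpy as np


def reproject(pix_a, depth_a, K, T_a2w, T_b2w):
    """Map pixels from view A into view B using depth and camera poses.

    Hypothetical helper for illustration only.
    pix_a:   (N, 2) pixel coordinates (u, v) in view A.
    depth_a: (N,) depth along the camera z-axis at those pixels (assumed).
    K:       (3, 3) pinhole intrinsics, assumed shared by both views.
    T_a2w:   (4, 4) camera-to-world pose of view A (z-forward, assumed).
    T_b2w:   (4, 4) camera-to-world pose of view B.
    Returns the (N, 2) pixel coordinates in view B and an (N,) boolean mask
    that is True where the point lies in front of camera B.
    """
    n = pix_a.shape[0]
    # Lift pixels to homogeneous image coordinates (u, v, 1).
    homog = np.hstack([pix_a, np.ones((n, 1))])
    # Back-project: K^-1 gives rays with z = 1; scale by z-depth.
    pts_a = (homog @ np.linalg.inv(K).T) * depth_a[:, None]
    # Rigid transform from camera A to camera B via the world frame.
    T_a2b = np.linalg.inv(T_b2w) @ T_a2w
    pts_a_h = np.hstack([pts_a, np.ones((n, 1))])
    pts_b = (pts_a_h @ T_a2b.T)[:, :3]
    # Perspective projection into view B.
    proj = pts_b @ K.T
    pix_b = proj[:, :2] / proj[:, 2:3]
    in_front = pts_b[:, 2] > 0
    return pix_b, in_front


if __name__ == "__main__":
    K = np.array([[500.0, 0.0, 320.0],
                  [0.0, 500.0, 240.0],
                  [0.0, 0.0, 1.0]])
    T_a = np.eye(4)
    T_b = np.eye(4)
    T_b[0, 3] = 0.1  # camera B sits 0.1 units to the right of camera A
    pix_b, ok = reproject(np.array([[320.0, 240.0]]), np.array([2.0]), K, T_a, T_b)
    print(pix_b, ok)
```

Unlike a single homography, which is exact only for planar scenes or pure camera rotation, this per-pixel mapping depends on scene depth, so correspondences remain valid under parallax. A practical pipeline would also need to discard points that are occluded or fall outside the second image, which this sketch does not do beyond the in-front check.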