On Model-Free Re-ranking for Visual Place Recognition with Deep Learned Local Features (2410.18573v2)
Abstract: Re-ranking is the second stage of a visual place recognition task, in which the system chooses the best-matching images from a pre-selected subset of candidates. Model-free approaches compute the image pair similarity based on a spatial comparison of corresponding local visual features, eliminating the need for computationally expensive estimation of a model describing the transformation between images. The article focuses on model-free re-ranking based on standard local visual features and its applicability in long-term autonomy systems. It introduces three new model-free re-ranking methods designed primarily for deep-learned local visual features. These features exhibit high robustness to various appearance changes, a crucial property for long-term autonomy systems. All the introduced methods were employed in a new visual place recognition system together with the D2-net feature detector (Dusmanu, 2019) and experimentally tested on diverse, challenging public datasets. The obtained results are on par with current state-of-the-art methods, affirming that model-free approaches are a viable and worthwhile path for long-term visual place recognition.
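The model-free idea the abstract describes can be illustrated with a minimal sketch: match local features between the query and each candidate by mutual nearest neighbours, then score each pair by the spatial consistency of the matches, with no homography or fundamental-matrix estimation. The displacement-histogram scoring below is a simplified stand-in for illustration, not one of the three methods the paper introduces; all function and parameter names are hypothetical.

```python
import numpy as np

def mutual_nn_matches(desc_q, desc_c):
    """Mutual nearest-neighbour matches (L2) between two descriptor sets."""
    d = np.linalg.norm(desc_q[:, None, :] - desc_c[None, :, :], axis=2)
    nn_q = d.argmin(axis=1)  # best candidate feature for each query feature
    nn_c = d.argmin(axis=0)  # best query feature for each candidate feature
    q_idx = np.arange(len(desc_q))
    mutual = nn_c[nn_q] == q_idx  # keep only matches agreed in both directions
    return q_idx[mutual], nn_q[mutual]

def spatial_consistency_score(kp_q, kp_c, q_idx, c_idx, bin_size=20.0):
    """Size of the largest cluster of similar match displacement vectors.

    Spatially consistent matches share roughly the same translation between
    the two images; no explicit transformation model is estimated.
    """
    if len(q_idx) == 0:
        return 0
    disp = kp_c[c_idx] - kp_q[q_idx]              # per-match displacement
    bins = np.round(disp / bin_size).astype(int)  # quantise displacements
    _, counts = np.unique(bins, axis=0, return_counts=True)
    return int(counts.max())

def rerank(query, candidates):
    """Re-rank candidates, each a (keypoints, descriptors) pair, by score."""
    kp_q, desc_q = query
    scores = []
    for kp_c, desc_c in candidates:
        qi, ci = mutual_nn_matches(desc_q, desc_c)
        scores.append(spatial_consistency_score(kp_q, kp_c, qi, ci))
    order = np.argsort(scores)[::-1]  # best-scoring candidate first
    return order, scores
```

In a real pipeline the keypoints and descriptors would come from a deep-learned detector such as D2-net, and the brute-force distance matrix would be replaced by an approximate nearest-neighbour search for speed.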
- C. Masone and B. Caputo, “A survey on deep visual place recognition,” IEEE Access, vol. 9, pp. 19516–19547, 2021.
- L. G. Camara, T. Pivoňka, M. Jílek, C. Gäbert, K. Košnar, and L. Přeučil, “Accurate and robust teach and repeat navigation by visual place recognition: A cnn approach,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020, pp. 6018–6024.
- X. Li, M. Larson, and A. Hanjalic, “Pairwise geometric matching for large-scale object retrieval,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 5153–5161.
- S. Hausler, S. Garg, M. Xu, M. Milford, and T. Fischer, “Patch-netvlad: Multi-scale fusion of locally-global descriptors for place recognition,” in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 14136–14147.
- Y. Zhang, Z. Jia, and T. Chen, “Image retrieval with geometry-preserving visual phrases,” in 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 809–816.
- L. G. Camara and L. Přeučil, “Visual place recognition by spatial matching of high-level cnn features,” Robotics and Autonomous Systems, vol. 133, p. 103625, 2020.
- M. Dusmanu, I. Rocco, T. Pajdla, M. Pollefeys, J. Sivic, A. Torii, and T. Sattler, “D2-net: A trainable cnn for joint description and detection of local features,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 8084–8093.
- G. Barbarani, M. Mostafa, H. Bayramov, G. Trivigno, G. Berton, C. Masone, and B. Caputo, “Are local features all you need for cross-domain visual place recognition?” in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2023, pp. 6155–6165.
- A. Ali-Bey, B. Chaib-Draa, and P. Giguère, “Mixvpr: Feature mixing for visual place recognition,” in 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, pp. 2997–3006.
- H. Jégou, M. Douze, C. Schmid, and P. Pérez, “Aggregating local descriptors into a compact image representation,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010, pp. 3304–3311.
- R. Arandjelovic, P. Gronat, A. Torii, T. Pajdla, and J. Sivic, “Netvlad: Cnn architecture for weakly supervised place recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 5297–5307.
- N. V. Keetha, M. Milford, and S. Garg, “A hierarchical dual model of environment- and place-specific utility for visual place recognition,” IEEE Robotics and Automation Letters, vol. 6, no. 4, pp. 6969–6976, 2021.
- D. DeTone, T. Malisiewicz, and A. Rabinovich, “Superpoint: Self-supervised interest point detection and description,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018, pp. 337–33712.
- P.-E. Sarlin, D. DeTone, T. Malisiewicz, and A. Rabinovich, “Superglue: Learning feature matching with graph neural networks,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 4937–4946.
- F. Radenović, G. Tolias, and O. Chum, “Fine-tuning cnn image retrieval with no human annotation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 7, pp. 1655–1668, 2019.
- G. Berton, C. Masone, and B. Caputo, “Rethinking visual geo-localization for large-scale applications,” in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 4868–4878.
- F. Lu, L. Zhang, X. Lan, S. Dong, Y. Wang, and C. Yuan, “Towards seamless adaptation of pre-trained models for visual place recognition,” in The Twelfth International Conference on Learning Representations, 2024.
- D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
- Y. Avrithis and G. Tolias, “Hough pyramid matching,” International Journal of Computer Vision, vol. 107, no. 1, pp. 1–19, 2014. [Online]. Available: http://link.springer.com/10.1007/s11263-013-0659-3
- X. Shen, Z. Lin, J. Brandt, S. Avidan, and Y. Wu, “Object retrieval and localization with spatially-constrained similarity measure and k-nn re-ranking,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012, pp. 3013–3020.
- S. Garg, N. Suenderhauf, and M. Milford, “Lost? appearance-invariant place recognition for opposite viewpoints using visual semantics,” Proceedings of Robotics: Science and Systems XIV, 2018.
- Z. Li and N. Snavely, “Megadepth: Learning single-view depth prediction from internet photos,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 2041–2050.
- T. Sattler, W. Maddern, C. Toft, A. Torii, L. Hammarstrand, E. Stenborg, D. Safari, M. Okutomi, M. Pollefeys, J. Sivic, F. Kahl, and T. Pajdla, “Benchmarking 6dof outdoor visual localization in changing conditions,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 8601–8610.
- W. Maddern, G. Pascoe, C. Linegar, and P. Newman, “1 Year, 1000km: The Oxford RobotCar Dataset,” The International Journal of Robotics Research (IJRR), vol. 36, no. 1, pp. 3–15, 2017.
- (2024) Mapillary. [Online]. Available: https://www.mapillary.com
- Z. Chen, F. Maffra, I. Sa, and M. Chli, “Only look once, mining distinctive landmarks from convnet for visual place recognition,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017, pp. 9–16.
- A. Glover, “Day and night, left and right,” Mar. 2014. [Online]. Available: https://doi.org/10.5281/zenodo.4590133
- S. Skrede. (2013) Nordlandsbanen: minute by minute, season by season. [Online]. Available: https://nrkbeta.no/2013/01/15/nordlandsbanen-minute-by-minute-season-by-season
- P. Neubert, N. Sünderhauf, and P. Protzel, “Superpixel-based appearance change prediction for long-term navigation across seasons,” Robotics and Autonomous Systems, vol. 69, pp. 15–27, 2015, selected papers from 6th European Conference on Mobile Robots.
- X. Zhao, X. Wu, W. Chen, P. C. Y. Chen, Q. Xu, and Z. Li, “Aliked: A lighter keypoint and descriptor extraction network via deformable transformation,” IEEE Transactions on Instrumentation & Measurement, vol. 72, pp. 1–16, 2023. [Online]. Available: https://arxiv.org/pdf/2304.03608.pdf
- A. Torii, J. Sivic, T. Pajdla, and M. Okutomi, “Visual place recognition with repetitive structures,” in 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 883–890.