VDNA-PR: Using General Dataset Representations for Robust Sequential Visual Place Recognition (2403.09025v1)
Abstract: This paper adapts a general dataset representation technique to produce robust Visual Place Recognition (VPR) descriptors, which are crucial for real-world mobile robot localisation. Two parallel lines of work on VPR have shown, on one side, that general-purpose off-the-shelf feature representations can provide robustness to domain shifts and, on the other, that fusing information from sequences of images improves performance. In our recent work on measuring domain gaps between image datasets, we proposed the Visual Distribution of Neuron Activations (VDNA) representation for datasets of images. This representation naturally handles image sequences and provides a general and granular feature representation derived from a general-purpose model. Moreover, it is built by tracking neuron activation values over the list of images being represented, and it is not limited to a particular neural network layer, so it has access to both high- and low-level concepts. This work shows how VDNAs can be used for VPR by learning a very lightweight and simple encoder that generates task-specific descriptors. Our experiments show that our representation is more robust than current solutions to severe domain shifts away from the training data distribution, such as indoor environments and aerial imagery.
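To make the representation concrete, the sketch below illustrates the core idea the abstract describes: accumulating one activation histogram per neuron over a sequence of images and flattening the result into a single sequence-level descriptor. Everything here is a hypothetical stand-in, not the paper's implementation: the `activations_per_image` interface, the bin count, and the clipping range are assumptions, and a real system would obtain activations from a pretrained general-purpose backbone (e.g. DINOv2) and pass the descriptor through the learned lightweight encoder rather than using it raw.

```python
import numpy as np

def vdna_descriptor(activations_per_image, n_bins=32, value_range=(-3.0, 3.0)):
    """Build a VDNA-style descriptor for a sequence of images.

    activations_per_image: list (one entry per image) of dicts mapping a
    layer name to a (n_neurons,) array of activation values. This interface
    is a hypothetical simplification of the paper's multi-layer tracking.
    """
    # Accumulate one histogram per neuron, across every image in the sequence.
    hists = {}
    for acts in activations_per_image:
        for layer, values in acts.items():
            if layer not in hists:
                hists[layer] = np.zeros((len(values), n_bins))
            # Map each neuron's activation value to a histogram bin,
            # clipping values that fall outside the assumed range.
            span = value_range[1] - value_range[0]
            idx = np.clip(
                ((values - value_range[0]) / span * n_bins).astype(int),
                0, n_bins - 1)
            hists[layer][np.arange(len(values)), idx] += 1
    # Normalise each neuron's histogram to a distribution, then flatten and
    # concatenate across layers so both low- and high-level statistics appear.
    parts = []
    for layer in sorted(hists):
        h = hists[layer]
        parts.append((h / h.sum(axis=1, keepdims=True)).ravel())
    return np.concatenate(parts)

# Usage with random activations standing in for a real feature extractor:
rng = np.random.default_rng(0)
sequence = [{"conv3": rng.normal(size=8), "fc": rng.normal(size=4)}
            for _ in range(5)]
descriptor = vdna_descriptor(sequence, n_bins=16)
```

Because each neuron contributes a normalised histogram, the descriptor length is fixed by the network architecture and bin count, independent of sequence length, which is what lets a single lightweight encoder consume sequences of any size.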