AANet: Aggregation and Alignment Network with Semi-hard Positive Sample Mining for Hierarchical Place Recognition (2310.05184v1)

Published 8 Oct 2023 in cs.CV

Abstract: Visual place recognition (VPR), which uses visual information to localize robots, is a research hotspot in robotics. Recently, hierarchical two-stage VPR methods have become popular due to their favorable trade-off between accuracy and efficiency. These methods retrieve the top-k candidate images using global features in the first stage, then re-rank the candidates by matching local features in the second stage. However, they usually require additional algorithms (e.g., RANSAC) for geometric consistency verification during re-ranking, which is time-consuming. Here we propose a Dynamically Aligning Local Features (DALF) algorithm to align local features under spatial constraints; it is significantly more efficient than methods that need geometric consistency verification. We present a unified network, AANet, that extracts global features for retrieving candidates via an aggregation module and aligns local features for re-ranking via the DALF alignment module. Meanwhile, many works use the simplest positive samples in triplets for weakly supervised training, which limits the network's ability to recognize harder positive pairs. To address this issue, we propose a Semi-hard Positive Sample Mining (ShPSM) strategy to select appropriately hard positive images for training more robust VPR networks. Extensive experiments on four benchmark VPR datasets show that the proposed AANet outperforms several state-of-the-art methods with less time consumption. The code is released at https://github.com/Lu-Feng/AANet.
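
To make the two-stage design described in the abstract concrete, below is a minimal, hypothetical NumPy sketch. All function names and scoring heuristics are illustrative assumptions, not the authors' implementation (see the linked repository for that): the naive best-match scoring stands in for the paper's DALF spatial alignment, and the positive-selection rule only gestures at ShPSM by skipping the single easiest positive.

```python
# Hypothetical sketch of a hierarchical two-stage VPR pipeline.
# Descriptors are plain NumPy arrays: global features are (D,) vectors,
# local features are (N, D) matrices of per-region descriptors.
import numpy as np

def retrieve_top_k(query_global, db_globals, k=10):
    """Stage 1: rank database images by global-descriptor cosine similarity."""
    q = query_global / np.linalg.norm(query_global)
    db = db_globals / np.linalg.norm(db_globals, axis=1, keepdims=True)
    return np.argsort(-(db @ q))[:k]          # indices of top-k candidates

def rerank(query_locals, db_locals_list, candidate_ids):
    """Stage 2: re-score candidates by matching local features.

    A naive mean best-match similarity is used here; the paper's DALF
    module instead aligns local features under spatial constraints.
    """
    scores = []
    for idx in candidate_ids:
        sims = query_locals @ db_locals_list[idx].T   # (N, M) pairwise sims
        scores.append(sims.max(axis=1).mean())        # mean best-match score
    order = np.argsort(-np.asarray(scores))
    return [candidate_ids[i] for i in order]

def pick_semi_hard_positive(query_global, positive_globals):
    """Toy stand-in for ShPSM: avoid the single easiest positive.

    The paper selects 'appropriately hard' positives for triplet training;
    this version merely skips the most similar one when several exist.
    """
    q = query_global / np.linalg.norm(query_global)
    p = positive_globals / np.linalg.norm(positive_globals, axis=1, keepdims=True)
    order = np.argsort(-(p @ q))              # easiest (most similar) first
    return order[1] if len(order) > 1 else order[0]
```

The key design point the abstract emphasizes is that stage 2 replaces geometric verification (e.g., RANSAC) with a learned alignment, so re-ranking stays cheap enough to run over all top-k candidates.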
