Regressing Transformers for Data-efficient Visual Place Recognition (2401.16304v1)

Published 29 Jan 2024 in cs.CV and cs.LG

Abstract: Visual place recognition is a critical task in computer vision, especially for localization and navigation systems. Existing methods often rely on contrastive learning: image descriptors are trained so that similar images lie close together and dissimilar ones lie farther apart in a latent space. However, this approach struggles to represent image similarity accurately by distance, particularly when trained with binary pairwise labels, and complex re-ranking strategies are required. This work introduces a fresh perspective by framing place recognition as a regression problem, using camera field-of-view overlap as the similarity ground truth for learning. By optimizing image descriptors to align directly with graded similarity labels, this approach improves ranking without expensive re-ranking, offering data-efficient training and strong generalization across several benchmark datasets.
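
The abstract's core idea, replacing a contrastive loss over binary same-place labels with a regression loss against graded field-of-view overlap, can be illustrated with a short sketch. The snippet below is a minimal PyTorch illustration of that idea, not the paper's implementation: the `DescriptorNet` backbone, its toy encoder, and the `overlap` labels are all assumptions standing in for the transformer descriptor and the FOV-overlap ground truth described in the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DescriptorNet(nn.Module):
    """Hypothetical stand-in for the paper's transformer backbone (e.g. a ViT)."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalize so the dot product below is a cosine similarity.
        return F.normalize(self.encoder(x), dim=-1)

def overlap_regression_loss(desc_a, desc_b, overlap):
    # Cosine similarity of unit-norm descriptors, in [-1, 1].
    sim = (desc_a * desc_b).sum(dim=-1)
    # Regress the similarity onto a graded label in [0, 1]
    # (1 = same view, 0 = no shared field of view), instead of
    # pushing/pulling pairs with a binary contrastive loss.
    return F.mse_loss(sim, overlap)

# Toy training step with random tensors in place of real image pairs.
model = DescriptorNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
img_a = torch.randn(8, 3, 224, 224)
img_b = torch.randn(8, 3, 224, 224)
overlap = torch.rand(8)  # graded FOV-overlap ground truth (assumed given)

loss = overlap_regression_loss(model(img_a), model(img_b), overlap)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because the descriptor's cosine similarity is fit directly to the graded label, nearest-neighbor ranking by descriptor distance reflects view overlap on its own, which is what lets the method skip a separate re-ranking stage.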
