
Self-Supervised Learning for Place Representation Generalization across Appearance Changes (2303.02370v3)

Published 4 Mar 2023 in cs.CV

Abstract: Visual place recognition is a key to unlocking spatial navigation for animals, humans and robots. While state-of-the-art approaches are trained in a supervised manner and therefore hardly capture the information needed for generalizing to unusual conditions, we argue that self-supervised learning may help abstract the place representation so that it can be foreseen, irrespective of the conditions. More precisely, in this paper, we investigate learning features that are robust to appearance modifications while sensitive to geometric transformations in a self-supervised manner. This dual-purpose training is made possible by combining the two main self-supervision paradigms, i.e., contrastive and predictive learning. Our results on standard benchmarks reveal that jointly learning such appearance-robust and geometry-sensitive image descriptors leads to competitive visual place recognition results across adverse seasonal and illumination conditions, without requiring any human-annotated labels.
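
To make the idea of combining contrastive and predictive self-supervision concrete, here is a minimal sketch of a joint objective. It is not the paper's implementation: the abstract does not specify the losses, so the InfoNCE-style contrastive term, the classification of a discrete geometric transformation as the predictive term, the `temperature` and `weight` parameters, and all function names are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def info_nce(anchors, positives, temperature=0.1):
    """Contrastive term (assumed InfoNCE form).

    anchors[i] and positives[i] are embeddings of the same image under two
    different appearance augmentations; every other positive is a negative.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        total += log_sum_exp(logits) - logits[i]
    return total / len(anchors)


def cross_entropy(logits, label):
    """Predictive term: negative log-likelihood of the true transform class."""
    return log_sum_exp(logits) - logits[label]


def joint_loss(anchors, positives, geo_logits, geo_labels, weight=1.0):
    """Hypothetical joint objective: contrastive + weight * predictive."""
    contrastive = info_nce(anchors, positives)
    predictive = sum(
        cross_entropy(logits, label)
        for logits, label in zip(geo_logits, geo_labels)
    ) / len(geo_labels)
    return contrastive + weight * predictive


if __name__ == "__main__":
    # Two images, each embedded under two appearance augmentations.
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    positives = [[0.9, 0.1], [0.1, 0.9]]
    # Logits over four assumed transform classes (e.g. rotation bins).
    geo_logits = [[2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0]]
    geo_labels = [0, 2]
    print(joint_loss(anchors, positives, geo_logits, geo_labels))
```

In this sketch the contrastive term pulls together embeddings of the same image under different appearance changes (robustness), while the predictive term rewards embeddings from which the applied geometric transformation can be recovered (sensitivity). The `weight` parameter is an assumed knob for balancing the two.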
