ConGeo: Robust Cross-view Geo-localization across Ground View Variations (2403.13965v2)
Abstract: Cross-view geo-localization aims to localize a ground-level query image by matching it to its corresponding geo-referenced aerial view. In real-world scenarios, the task must accommodate diverse ground images captured by users with varying orientations and reduced fields of view (FoVs). However, existing learning pipelines are orientation-specific or FoV-specific, requiring a separate model to be trained for each ground view variation. Such models depend heavily on the North-aligned spatial correspondence and the predefined FoVs in the training data, which compromises their robustness across different settings. To tackle this challenge, we propose ConGeo, a single- and cross-view Contrastive method for Geo-localization: by enforcing proximity between ground view variations of the same location, it makes feature representations more robust and consistent, improving a model's invariance to orientation and its resilience to FoV variations. As a generic learning objective for cross-view geo-localization, when integrated into state-of-the-art pipelines, ConGeo significantly boosts the performance of three base models on four geo-localization benchmarks for diverse ground view variations, and it outperforms competing methods that train separate models for each ground view variation.
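The idea of "enforcing proximity between ground view variations of the same location" can be illustrated with a generic InfoNCE-style contrastive loss. The sketch below is not the authors' implementation: the abstract does not give the loss formulation, so the choice of InfoNCE, the unweighted sum of the two terms, the temperature value, and the names `info_nce` and `contrastive_objective` are all assumptions made for illustration. Embeddings are plain Python lists standing in for the outputs of an image encoder.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(queries, keys, temperature=0.1):
    """Mean InfoNCE loss where queries[i] and keys[i] form the positive pair.

    All other keys in the batch act as negatives for queries[i].
    """
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity of query i to every key, scaled by temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(q)


def contrastive_objective(ground, ground_variant, aerial, temperature=0.1):
    """Hypothetical combination of a cross-view and a single-view term.

    ground:         embeddings of the original ground images
    ground_variant: embeddings of the same images after an orientation
                    shift or FoV crop (assumed to be computed elsewhere)
    aerial:         embeddings of the matching aerial images
    """
    # Cross-view term: a ground view variation should match its aerial image.
    cross_view = info_nce(ground_variant, aerial, temperature)
    # Single-view term: a ground image should match its own variation.
    single_view = info_nce(ground, ground_variant, temperature)
    return cross_view + single_view


if __name__ == "__main__":
    aerial = [[1.0, 0.0], [0.0, 1.0]]
    ground = [[0.9, 0.1], [0.1, 0.9]]
    consistent_variant = [[0.8, 0.2], [0.2, 0.8]]
    inconsistent_variant = [[0.2, 0.8], [0.8, 0.2]]
    print(contrastive_objective(ground, consistent_variant, aerial))
    print(contrastive_objective(ground, inconsistent_variant, aerial))
```

In this toy setup, the variation whose embeddings stay close to the original ground view and its aerial match yields the lower loss, which is the behavior a consistency objective of this kind is meant to reward.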
Authors:
- Li Mi
- Chang Xu
- Javiera Castillo-Navarro
- Syrielle Montariol
- Wen Yang
- Antoine Bosselut
- Devis Tuia