BEV-CV: Birds-Eye-View Transform for Cross-View Geo-Localisation (2312.15363v2)
Abstract: Cross-view image matching for geo-localisation is a challenging problem due to the significant visual difference between aerial and ground-level viewpoints. The method provides localisation capabilities from geo-referenced images, eliminating the need for external devices or costly equipment. This enhances the capacity of agents to autonomously determine their position, navigate, and operate effectively in GNSS-denied environments. Current research employs a variety of techniques to reduce the domain gap, such as applying polar transforms to aerial images or synthesising between perspectives. However, these approaches generally rely on having a 360° field of view, limiting real-world feasibility. We propose BEV-CV, an approach introducing two key novelties with a focus on improving the real-world viability of cross-view geo-localisation. Firstly, we bring ground-level images into a semantic Birds-Eye-View before matching embeddings, allowing for direct comparison with aerial image representations. Secondly, we adapt datasets into an application-realistic format: limited Field-of-View images aligned to vehicle direction. BEV-CV achieves state-of-the-art recall accuracies, improving Top-1 rates of 70° crops of CVUSA and CVACT by 23% and 24% respectively. It also decreases computational requirements by reducing floating point operations to below previous works and decreasing embedding dimensionality by 33%, together allowing for faster localisation capabilities.
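The two evaluation ingredients named in the abstract, limited Field-of-View crops aligned to a heading and Top-K recall over matched embeddings, can be illustrated with a minimal NumPy sketch. It is not the BEV-CV implementation. The function names `crop_fov` and `top_k_recall`, the assumption that panorama columns map linearly onto 0 to 360 degrees of yaw, and the use of cosine similarity with the true match sharing the query's row index are all illustrative assumptions.

```python
import numpy as np


def crop_fov(pano: np.ndarray, fov_deg: float, heading_deg: float = 0.0) -> np.ndarray:
    """Cut a horizontal field-of-view window out of a 360-degree panorama.

    Assumption: pano has shape (H, W, C) and its columns span 0..360 degrees
    of yaw linearly; heading_deg is the yaw at the centre of the crop.
    """
    width = pano.shape[1]
    crop_w = int(round(width * fov_deg / 360.0))
    centre = int(round(width * (heading_deg % 360.0) / 360.0))
    # Modulo indexing lets the window wrap around the panorama seam.
    cols = (np.arange(crop_w) + centre - crop_w // 2) % width
    return pano[:, cols]


def top_k_recall(ground_emb: np.ndarray, aerial_emb: np.ndarray, k: int = 1) -> float:
    """Fraction of ground queries whose true aerial match ranks in the top k.

    Assumption: row i of ground_emb corresponds to row i of aerial_emb,
    and similarity is the cosine between embeddings.
    """
    g = ground_emb / np.linalg.norm(ground_emb, axis=1, keepdims=True)
    a = aerial_emb / np.linalg.norm(aerial_emb, axis=1, keepdims=True)
    sim = g @ a.T                                # (queries, references)
    true_sim = np.diag(sim)[:, None]             # similarity to the correct reference
    rank = (sim > true_sim).sum(axis=1)          # references scoring strictly higher
    return float((rank < k).mean())


if __name__ == "__main__":
    pano = np.zeros((128, 720, 3), dtype=np.uint8)
    print(crop_fov(pano, fov_deg=70.0, heading_deg=90.0).shape)

    rng = np.random.default_rng(0)
    ground = rng.normal(size=(100, 64))
    aerial = ground + 0.5 * rng.normal(size=(100, 64))  # noisy stand-in pairs
    print(top_k_recall(ground, aerial, k=1))
```

Here a 70° crop keeps roughly 70/360 of the panorama columns, and Top-1 recall counts a query as correct only when no other reference embedding scores higher than its paired one.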