PIGEON: Predicting Image Geolocations (2307.05845v6)
Abstract: Planet-scale image geolocalization remains a challenging problem due to the diversity of images originating from anywhere in the world. Although approaches based on vision transformers have made significant progress in geolocalization accuracy, success in prior literature is constrained to narrow distributions of images of landmarks, and performance has not generalized to unseen places. We present a new geolocalization system that combines semantic geocell creation, multi-task contrastive pretraining, and a novel loss function. Additionally, our work is the first to perform retrieval over location clusters for guess refinements. We train two models for evaluations on street-level data and general-purpose image geolocalization; the first model, PIGEON, is trained on data from the game of Geoguessr and is capable of placing over 40% of its guesses within 25 kilometers of the target location globally. We also develop a bot and deploy PIGEON in a blind experiment against humans, ranking in the top 0.01% of players. We further challenge one of the world's foremost professional Geoguessr players to a series of six matches with millions of viewers, winning all six games. Our second model, PIGEOTTO, differs in that it is trained on a dataset of images from Flickr and Wikipedia, achieving state-of-the-art results on a wide range of image geolocalization benchmarks, outperforming the previous SOTA by up to 7.7 percentage points on the city accuracy level and up to 38.8 percentage points on the country level. Our findings suggest that PIGEOTTO is the first image geolocalization model that effectively generalizes to unseen places and that our approach can pave the way for highly accurate, planet-scale image geolocalization systems. Our code is available on GitHub.
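The headline numbers in the abstract (for example, the share of guesses within 25 kilometers of the target) are distance-threshold accuracies. A minimal sketch of how such a metric can be computed is given below. It is an illustration, not the authors' evaluation code: the function names `haversine_km` and `accuracy_within`, the use of the haversine great-circle formula, and the mean Earth radius of 6371 km are all assumptions of this sketch.

```python
import math

# Mean Earth radius in kilometers (an assumption of this sketch).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def accuracy_within(guesses, targets, threshold_km):
    """Fraction of guesses lying within threshold_km of their targets.

    guesses and targets are equal-length sequences of (lat, lon) tuples.
    """
    if len(guesses) != len(targets):
        raise ValueError("guesses and targets must have the same length")
    if not guesses:
        raise ValueError("at least one guess is required")
    hits = sum(
        haversine_km(*g, *t) <= threshold_km
        for g, t in zip(guesses, targets)
    )
    return hits / len(guesses)


# Hypothetical example: one near-exact guess and one far-off guess.
guesses = [(48.8566, 2.3522), (0.0, 0.0)]
targets = [(48.8600, 2.3500), (10.0, 10.0)]
print(accuracy_within(guesses, targets, 25.0))
```

Calling `accuracy_within` with a larger threshold gives the coarser accuracy levels. Geolocalization benchmarks commonly use 25 km for the city level and 750 km for the country level, but the exact thresholds behind the figures quoted above are not stated in this abstract.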