
PIGEON: Predicting Image Geolocations (2307.05845v6)

Published 11 Jul 2023 in cs.CV and cs.LG

Abstract: Planet-scale image geolocalization remains a challenging problem due to the diversity of images originating from anywhere in the world. Although approaches based on vision transformers have made significant progress in geolocalization accuracy, success in prior literature is constrained to narrow distributions of images of landmarks, and performance has not generalized to unseen places. We present a new geolocalization system that combines semantic geocell creation, multi-task contrastive pretraining, and a novel loss function. Additionally, our work is the first to perform retrieval over location clusters for guess refinements. We train two models for evaluations on street-level data and general-purpose image geolocalization; the first model, PIGEON, is trained on data from the game of Geoguessr and is capable of placing over 40% of its guesses within 25 kilometers of the target location globally. We also develop a bot and deploy PIGEON in a blind experiment against humans, ranking in the top 0.01% of players. We further challenge one of the world's foremost professional Geoguessr players to a series of six matches with millions of viewers, winning all six games. Our second model, PIGEOTTO, differs in that it is trained on a dataset of images from Flickr and Wikipedia, achieving state-of-the-art results on a wide range of image geolocalization benchmarks, outperforming the previous SOTA by up to 7.7 percentage points on the city accuracy level and up to 38.8 percentage points on the country level. Our findings suggest that PIGEOTTO is the first image geolocalization model that effectively generalizes to unseen places and that our approach can pave the way for highly accurate, planet-scale image geolocalization systems. Our code is available on GitHub.


Summary

  • The paper introduces a novel approach to global image geolocalization that employs semantic geocell formation and multi-task contrastive pretraining to enhance precision.
  • PIGEON places over 40% of its guesses within 25 km of the target and reaches a median error distance of 44.35 km on its holdout dataset, outperforming human players; PIGEOTTO surpasses previous models on general-image benchmarks.
  • The study offers practical methodologies for hierarchical inference and encourages future exploration into real-time adaptation for improved autonomous localization.

An Overview of "PIGEON: Predicting Image Geolocations"

The paper "PIGEON: Predicting Image Geolocations" addresses image geolocalization on a global scale, that is, predicting where on Earth a photo was taken. It introduces two models designed for different scenarios: PIGEON for high-precision street-level geolocalization and PIGEOTTO for general-purpose image geolocalization.

Technical Contributions

The paper's primary contribution is a geolocalization system that combines semantic geocell creation, multi-task contrastive pretraining, and hierarchical location refinement. The models use OpenAI’s CLIP to generate image embeddings and build on it with the following components:

  • Semantic Geocells: Unlike traditional geocells, these follow hierarchical and semantic information from administrative boundaries and are further refined using clustering techniques such as OPTICS, coupled with Voronoi tessellation, to create meaningfully distributed classes.
  • Multi-task Contrastive Pretraining: The authors pretrain the CLIP model on geographically informative synthetic captions. The captions also encode contextual information such as climate, elevation, and population density, so the model learns location-specific cues and these auxiliary attributes at the same time.
  • Hierarchical Inference and Refinement: After predicting a geocell, the system refines its guess by retrieval over location clusters, choosing the cluster whose embedding lies closest to that of the query image. The search can cross geocell boundaries, so a guess is not confined to the single most likely geocell.
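
The refinement step in the last bullet can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the dictionary layout, the `top_k` parameter, and the use of plain cosine distance are all hypothetical choices made for the example, and the paper's exact scoring rule may differ.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length vectors (0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def refine_guess(query_embedding, candidate_cells, top_k=2):
    """Pick the location cluster closest to the query among the top-k geocells.

    candidate_cells is a hypothetical structure: a list of dicts with
      "prob"     - predicted probability of the geocell
      "clusters" - list of dicts with "embedding" and "latlon"
    Returns the (lat, lon) of the best-matching cluster.
    """
    # Keep only the k most probable geocells from the classifier.
    top_cells = sorted(candidate_cells, key=lambda c: c["prob"], reverse=True)[:top_k]

    best_distance, best_latlon = None, None
    for cell in top_cells:
        for cluster in cell["clusters"]:
            d = cosine_distance(query_embedding, cluster["embedding"])
            if best_distance is None or d < best_distance:
                best_distance, best_latlon = d, cluster["latlon"]

    if best_latlon is None:
        raise ValueError("no location clusters among the candidate geocells")
    return best_latlon


# Toy data: the most probable cell is not where the closest cluster lives.
cells = [
    {"prob": 0.6, "clusters": [
        {"embedding": [1.0, 0.0], "latlon": (48.85, 2.35)},
        {"embedding": [0.8, 0.6], "latlon": (45.76, 4.84)},
    ]},
    {"prob": 0.3, "clusters": [
        {"embedding": [0.0, 1.0], "latlon": (50.85, 4.35)},
    ]},
]
query = [0.1, 0.9]
print(refine_guess(query, cells, top_k=1))  # restricted to the top geocell
print(refine_guess(query, cells, top_k=2))  # allowed to cross into the second geocell
```

With `top_k=1` the guess stays inside the most probable geocell; with `top_k=2` the closest cluster sits in the second geocell, which is the kind of cross-boundary correction the bullet describes.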

Evaluation and Results

PIGEON was evaluated on a dataset derived from the Geoguessr game, where it outperformed professional human players: more than 40% of its guesses fell within 25 kilometers of the true location, and its median error distance on the holdout dataset was 44.35 kilometers. PIGEOTTO was validated on several benchmark datasets and surpassed the prior state of the art, reducing the median distance error by a factor of 2 to 5 on tasks involving general images. Notably, PIGEOTTO generalized robustly to previously unseen locations.
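
The two headline metrics in this section, the share of guesses within 25 km and the median error distance, can both be computed from great-circle distances. The sketch below assumes haversine distance on a spherical Earth with a mean radius of 6371 km; the function names are illustrative, and the paper's own evaluation code may differ in its details.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def geolocation_metrics(predictions, targets, radius_km=25.0):
    """Fraction of guesses within radius_km of the target, plus the median error."""
    errors = [haversine_km(*p, *t) for p, t in zip(predictions, targets)]
    within = sum(e <= radius_km for e in errors) / len(errors)
    return {
        "fraction_within_radius": within,
        "median_error_km": statistics.median(errors),
    }


# Toy example with made-up coordinates (not data from the paper).
preds = [(48.86, 2.35), (40.71, -74.00), (35.00, 139.00)]
truth = [(48.85, 2.35), (40.73, -73.99), (35.68, 139.69)]
print(geolocation_metrics(preds, truth))
```

The same function covers the coarser thresholds used by geolocalization benchmarks by changing `radius_km`.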

Implications and Challenges

This research carries significant implications for the field of geolocalization, showing how models can approach planet-scale localization accuracy. The methodologies could influence future work on structuring geocells and on multi-task learning with deep multimodal models. However, practical deployment of such systems requires careful ethical consideration, particularly regarding privacy and dual use.

Future Directions

The paper invites further exploration, particularly in expanding the contextual cues that models could learn about various environments, potentially including real-time adaptation to changing geographic features. Additionally, exploring transfer learning capabilities between related tasks could further enhance model robustness and contribute to more nuanced applications, such as improving autonomous vehicle localization systems.

In summary, the paper offers a substantive advance in image geolocalization, backed by thorough empirical validation and careful consideration of both its technical foundations and its ethical ramifications.
