PIGEON: Predicting Image Geolocations (2307.05845v6)

Published 11 Jul 2023 in cs.CV and cs.LG

Abstract: Planet-scale image geolocalization remains a challenging problem due to the diversity of images originating from anywhere in the world. Although approaches based on vision transformers have made significant progress in geolocalization accuracy, success in prior literature is constrained to narrow distributions of images of landmarks, and performance has not generalized to unseen places. We present a new geolocalization system that combines semantic geocell creation, multi-task contrastive pretraining, and a novel loss function. Additionally, our work is the first to perform retrieval over location clusters for guess refinements. We train two models for evaluations on street-level data and general-purpose image geolocalization; the first model, PIGEON, is trained on data from the game of Geoguessr and is capable of placing over 40% of its guesses within 25 kilometers of the target location globally. We also develop a bot and deploy PIGEON in a blind experiment against humans, ranking in the top 0.01% of players. We further challenge one of the world's foremost professional Geoguessr players to a series of six matches with millions of viewers, winning all six games. Our second model, PIGEOTTO, differs in that it is trained on a dataset of images from Flickr and Wikipedia, achieving state-of-the-art results on a wide range of image geolocalization benchmarks, outperforming the previous SOTA by up to 7.7 percentage points on the city accuracy level and up to 38.8 percentage points on the country level. Our findings suggest that PIGEOTTO is the first image geolocalization model that effectively generalizes to unseen places and that our approach can pave the way for highly accurate, planet-scale image geolocalization systems. Our code is available on GitHub.


Summary

The paper introduces a novel approach to global image geolocalization that employs semantic geocell formation and multi-task contrastive pretraining to enhance precision.
On the authors' street-level holdout data, PIGEON places over 40% of its guesses within 25 km of the target location, with a median error distance of 44.35 km. It ranks in the top 0.01% of human players and outperforms previous models.
The study offers practical methodologies for hierarchical inference and encourages future exploration into real-time adaptation for improved autonomous localization.

An Overview of "PIGEON: Predicting Image Geolocations"

The paper "PIGEON: Predicting Image Geolocations" addresses image geolocalization at planet scale by combining several techniques built on pretrained vision transformers. It introduces two models for different scenarios: PIGEON for high-precision street-level geolocalization and PIGEOTTO for general-purpose image geolocalization.

Technical Contributions

The paper's primary contribution is a geolocalization system that combines semantic geocell creation, multi-task contrastive pretraining, a novel loss function, and hierarchical location refinement. Both models use OpenAI’s CLIP to generate image embeddings, on top of which the authors build the following components:

Semantic Geocells: Unlike traditional geocells, these draw on hierarchical and semantic information from administrative boundaries. They are further refined by clustering with OPTICS followed by Voronoi tessellation, so that the resulting classes are both semantically meaningful and well distributed.
Multi-task Contrastive Pretraining: The authors pretrain CLIP on geographically informative synthetic captions. This strategy is intended to strengthen the model's grasp of location-specific cues by having it simultaneously learn auxiliary contextual information such as climate, elevation, and population density.
Hierarchical Inference and Refinement: For guess refinement, the authors retrieve over location clusters within the top predicted geocells and select the cluster whose embeddings lie closest to the query image. This improves accuracy because the final guess is not confined to the single most probable geocell.
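The Voronoi step of semantic geocell creation can be illustrated with a short Python sketch. It assumes that cluster centroids inside an administrative geocell have already been found (for example by OPTICS, which is not reproduced here) and assigns every training location to its nearest centroid by great-circle distance, which is exactly membership in a spherical Voronoi cell. The names `haversine_km` and `voronoi_assign` and the example coordinates are hypothetical, not taken from the authors' code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def voronoi_assign(points, seeds):
    """Return, for each point, the index of its nearest seed.

    With great-circle distance, nearest-seed assignment is the same as
    membership in the seed's spherical Voronoi cell.
    """
    labels = []
    for lat, lon in points:
        dists = [haversine_km(lat, lon, s_lat, s_lon) for s_lat, s_lon in seeds]
        labels.append(min(range(len(seeds)), key=dists.__getitem__))
    return labels


# Hypothetical cluster centroids (near Paris and Lyon) and training locations.
seeds = [(48.85, 2.35), (45.76, 4.84)]
points = [(48.9, 2.3), (45.7, 4.9), (46.5, 3.0)]
print(voronoi_assign(points, seeds))
```

Each Voronoi cell then becomes one class, so a dense region with several clusters is split into several finer classes.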
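Synthetic captions of the kind described under multi-task contrastive pretraining could be generated from location metadata roughly as follows. The templates, the `LocationMeta` fields, and the population-density thresholds are all hypothetical, since this summary does not give the paper's caption wording. Randomly sampling a subset of auxiliary sentences per image is one plausible way to vary captions across training steps.

```python
import random
from dataclasses import dataclass


@dataclass
class LocationMeta:
    # Hypothetical metadata attached to one training image.
    country: str
    region: str
    climate: str               # free-text climate description
    elevation_m: float         # metres above sea level
    population_density: float  # people per square kilometre


def density_bucket(density: float) -> str:
    """Map a numeric density to a coarse phrase (thresholds are illustrative)."""
    if density < 50:
        return "sparsely populated"
    if density < 1000:
        return "moderately populated"
    return "densely populated"


def synthetic_caption(meta: LocationMeta, rng: random.Random) -> str:
    """Build a caption: one location sentence plus a random subset of auxiliary facts."""
    location = f"A photo taken in {meta.region}, {meta.country}."
    auxiliary = [
        f"The climate here is {meta.climate}.",
        f"The elevation is about {round(meta.elevation_m)} metres.",
        f"The area is {density_bucket(meta.population_density)}.",
    ]
    rng.shuffle(auxiliary)
    k = rng.randint(1, len(auxiliary))  # keep at least one auxiliary fact
    return " ".join([location] + auxiliary[:k])


meta = LocationMeta("Germany", "Bavaria", "temperate", 519.4, 120.0)
print(synthetic_caption(meta, random.Random(0)))
```

Each (image, caption) pair would then serve as a positive pair for CLIP's contrastive objective.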
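The abstract mentions a novel loss function. As we understand the paper, it smooths the one-hot geocell target by haversine distance, so that geocells near the true location also receive partial credit. The sketch below assumes a target weight of exp(-(d_i - d_true) / tau) for each geocell, where d_i is the haversine distance from geocell i's centroid to the true location and d_true is the same distance for the correct geocell. The temperature tau = 75 km and the normalisation of the weights to sum to one are illustrative assumptions, not confirmed details.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def haversine_smoothed_labels(true_lat, true_lon, centroids, true_cell, tau_km=75.0):
    """Soft target over geocells, weighting each cell by exp(-(d_i - d_true) / tau)."""
    d_true = haversine_km(*centroids[true_cell], true_lat, true_lon)
    weights = [
        math.exp(-(haversine_km(c_lat, c_lon, true_lat, true_lon) - d_true) / tau_km)
        for c_lat, c_lon in centroids
    ]
    total = sum(weights)  # assumption: normalise so the target is a distribution
    return [w / total for w in weights]


def cross_entropy(logits, targets):
    """Cross-entropy between soft targets and softmax(logits), computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_z) for t, z in zip(targets, logits))


# Hypothetical geocell centroids; the true image sits close to centroid 0.
centroids = [(0.0, 0.0), (0.0, 1.0), (0.0, 10.0)]
labels = haversine_smoothed_labels(0.0, 0.1, centroids, true_cell=0)
print([round(p, 3) for p in labels])
```

Compared with a one-hot target, a guess in a neighbouring geocell is penalised less than one on another continent.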
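Retrieval over location clusters can be sketched as follows. The scoring rule (log of the geocell probability plus cosine similarity divided by a temperature), the `top_k` default of 5, and the data layout are hypothetical choices for illustration. The key idea is that the final guess is chosen among clusters in several candidate geocells, so a strong embedding match can move the guess out of the classifier's top geocell.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

LatLon = Tuple[float, float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def refine_guess(
    query_emb: Sequence[float],
    geocell_probs: Dict[str, float],
    clusters: Dict[str, List[Tuple[LatLon, Sequence[float]]]],
    top_k: int = 5,
    temperature: float = 0.1,
) -> LatLon:
    """Pick the cluster centroid that best combines geocell probability and embedding similarity.

    `clusters` maps a geocell id to (centroid, mean_embedding) pairs for the
    location clusters inside that geocell (hypothetical layout).
    """
    top_cells = sorted(geocell_probs, key=geocell_probs.get, reverse=True)[:top_k]
    best_latlon: Optional[LatLon] = None
    best_score = -math.inf
    for cell in top_cells:
        log_p = math.log(max(geocell_probs[cell], 1e-12))  # guard against log(0)
        for centroid, mean_emb in clusters.get(cell, []):
            score = log_p + cosine(query_emb, mean_emb) / temperature
            if score > best_score:
                best_latlon, best_score = centroid, score
    if best_latlon is None:
        raise ValueError("no location clusters in the top geocells")
    return best_latlon


# Geocell "A" is more probable, but the query embedding matches the cluster in "B".
probs = {"A": 0.6, "B": 0.4}
clusters = {"A": [((10.0, 10.0), [0.0, 1.0])], "B": [((20.0, 20.0), [1.0, 0.0])]}
print(refine_guess([1.0, 0.0], probs, clusters))
```

A low temperature lets embedding similarity dominate; a high temperature falls back to the classifier's ranking.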

Evaluation and Results

PIGEON was evaluated on a dataset derived from the game Geoguessr. It places more than 40% of its guesses within 25 kilometers of the true location and reaches a median error distance of 44.35 kilometers on the authors' holdout dataset. In a blind experiment against humans it ranked in the top 0.01% of players, and it won all six matches against one of the world's foremost professional Geoguessr players. PIGEOTTO, in contrast, was evaluated on several benchmark datasets and surpassed the prior state of the art, improving accuracy by up to 7.7 percentage points at the city level and up to 38.8 percentage points at the country level, and reducing the median distance error by a factor of 2 to 5 on general-purpose images. Notably, PIGEOTTO generalized robustly to previously unseen locations.
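Metrics such as the share of guesses within 25 km and the median error in kilometres can be computed from predicted and true coordinates in a few lines of Python. This sketch uses the haversine great-circle distance on a spherical Earth of radius 6371 km and reports accuracy at 1, 25, 200, 750, and 2500 km, the street, city, region, country, and continent thresholds commonly used in geolocalization benchmarks. The paper's exact evaluation code may differ.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def geolocation_metrics(preds, targets, thresholds_km=(1, 25, 200, 750, 2500)):
    """Return (median error in km, {threshold_km: fraction of guesses within threshold})."""
    errors = [haversine_km(p[0], p[1], t[0], t[1]) for p, t in zip(preds, targets)]
    accuracy = {r: sum(e <= r for e in errors) / len(errors) for r in thresholds_km}
    return statistics.median(errors), accuracy


# Three hypothetical guesses for targets at (0, 0): exact, about 11 km off, about 334 km off.
preds = [(0.0, 0.0), (0.0, 0.1), (0.0, 3.0)]
targets = [(0.0, 0.0)] * 3
median_km, acc = geolocation_metrics(preds, targets)
print(round(median_km, 2), acc)
```

Reporting the median alongside threshold accuracies keeps a few very distant misses from dominating the summary, as a mean error would.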

Implications and Challenges

This research has significant implications for geolocalization, advancing what models can achieve in planet-scale localization accuracy. Its methods could influence how future work structures geocells and applies multi-task learning with deep multimodal models. However, deploying such systems in practice requires careful ethical consideration, particularly regarding privacy and dual-use risks.

Future Directions

The paper invites further exploration, particularly in expanding the contextual cues that models could learn about various environments, potentially including real-time adaptation to changing geographic features. Additionally, exploring transfer learning capabilities between related tasks could further enhance model robustness and contribute to more nuanced applications, such as improving autonomous vehicle localization systems.

In summary, the paper offers a substantive advance in image geolocalization, backed by thorough empirical validation and careful consideration of both its technical design and its ethical ramifications.
