
LLMGeo: Benchmarking Large Language Models on Image Geolocation In-the-wild (2405.20363v1)

Published 30 May 2024 in cs.CV

Abstract: Image geolocation is a critical task in various image-understanding applications. However, existing methods often fail when analyzing challenging, in-the-wild images. Inspired by the exceptional background knowledge of multimodal LLMs, we systematically evaluate their geolocation capabilities using a novel image dataset and a comprehensive evaluation framework. We first collect images from various countries via Google Street View. Then, we conduct both training-free and training-based evaluations on closed-source and open-source multimodal LLMs. Our findings indicate that closed-source models demonstrate superior geolocation abilities, while open-source models can achieve comparable performance through fine-tuning.

Authors (6)
  1. Zhiqiang Wang (107 papers)
  2. Dejia Xu (37 papers)
  3. Rana Muhammad Shahroz Khan (7 papers)
  4. Yanbin Lin (6 papers)
  5. Zhiwen Fan (52 papers)
  6. Xingquan Zhu (36 papers)
Citations (1)