LLMGeo: Benchmarking Large Language Models on Image Geolocation In-the-wild (2405.20363v1)
Abstract: Image geolocation is a critical task in various image-understanding applications. However, existing methods often fail when analyzing challenging, in-the-wild images. Inspired by the exceptional background knowledge of multimodal LLMs, we systematically evaluate their geolocation capabilities using a novel image dataset and a comprehensive evaluation framework. We first collect images from various countries via Google Street View. Then, we conduct both training-free and training-based evaluations on closed-source and open-source multimodal LLMs. Our findings indicate that closed-source models demonstrate superior geolocation abilities, while open-source models can achieve comparable performance through fine-tuning.
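Since the dataset is organized by country, a natural evaluation signal is whether a model's predicted country matches the ground-truth label. The sketch below shows a minimal country-level accuracy metric; the function name and data layout are illustrative assumptions, not taken from the paper's actual framework.

```python
def country_accuracy(predictions, ground_truth):
    """Fraction of images whose predicted country matches the label.

    `predictions` and `ground_truth` are parallel lists of country
    names (a hypothetical layout; the paper's format may differ).
    Comparison is case- and whitespace-insensitive.
    """
    if not predictions:
        return 0.0
    correct = sum(
        p.strip().lower() == g.strip().lower()
        for p, g in zip(predictions, ground_truth)
    )
    return correct / len(ground_truth)

# Example: two of three predictions match the labels.
preds = ["Japan", "France", "Brazil"]
labels = ["Japan", "Germany", "Brazil"]
print(round(country_accuracy(preds, labels), 2))  # → 0.67
```

A real evaluation harness would also need to normalize free-form LLM answers (e.g. "the USA" vs. "United States") before comparison, which is where most of the practical difficulty lies.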
- Zhiqiang Wang (107 papers)
- Dejia Xu (37 papers)
- Rana Muhammad Shahroz Khan (7 papers)
- Yanbin Lin (6 papers)
- Zhiwen Fan (52 papers)
- Xingquan Zhu (36 papers)