Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
41 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Correctness Comparison of ChatGPT-4, Gemini, Claude-3, and Copilot for Spatial Tasks (2401.02404v4)

Published 4 Jan 2024 in cs.CY

Abstract: Generative AI including LLMs has recently gained significant interest in the geo-science community through its versatile task-solving capabilities including programming, arithmetic reasoning, generation of sample data, time-series forecasting, toponym recognition, or image classification. Most existing performance assessments of LLMs for spatial tasks have primarily focused on ChatGPT, whereas other chatbots received less attention. To narrow this research gap, this study conducts a zero-shot correctness evaluation for a set of 76 spatial tasks across seven task categories assigned to four prominent chatbots, i.e., ChatGPT-4, Gemini, Claude-3, and Copilot. The chatbots generally performed well on tasks related to spatial literacy, GIS theory, and interpretation of programming code and functions, but revealed weaknesses in mapping, code writing, and spatial reasoning. Furthermore, there was a significant difference in correctness of results between the four chatbots. Responses from repeated tasks assigned to each chatbot showed a high level of consistency in responses with matching rates of over 80% for most task categories in the four chatbots.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (3)
  1. Hartwig H. Hochmair (2 papers)
  2. Takoda Kemp (1 paper)
  3. Levente Juhasz (3 papers)
Citations (4)
X Twitter Logo Streamline Icon: https://streamlinehq.com
Reddit Logo Streamline Icon: https://streamlinehq.com