Background and Objectives
LLMs have demonstrated impressive abilities in extrapolating patterns and serving as tools for cross-disciplinary applications. Their proficiency in more abstract areas such as spatial reasoning, however, is less well understood. This paper assesses the performance of three LLMs (ChatGPT versions 3.5 and 4, and Llama 2 7B) on tasks that require spatial understanding: labeling 2D paths, identifying 2D shapes, and labeling 3D robotic trajectories.
Approach and Methodology
To investigate these capabilities, the paper generates datasets for 2D path and shape labeling, built from simple directional instructions and shapes such as circles. For 3D trajectory labeling, it uses the CALVIN dataset, which contains data on robotic movements. The researchers evaluate the models with zero-shot prompting, in-context learning (ICL), and Chain-of-Thought (CoT) prompting, and they propose a new method, Spatial Prefix-Prompting (SPP), which presents a related spatial problem before the primary query. The paper examines both how LLMs perform on simple spatial patterns and whether knowledge transfers from simpler tasks to more complex ones.
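To make the setup concrete, the following is a minimal Python sketch of how a 2D path-labeling example and a Spatial Prefix-Prompting query might be assembled. It is an illustration under stated assumptions, not the paper's implementation: the four direction labels, the unit-step path generator, the prompt wording, and every function name (make_path, zero_shot_prompt, spp_prompt, make_dataset) are hypothetical.

```python
import random

# Hypothetical label set: one unit step per direction (not taken from the paper).
STEPS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def make_path(direction, n_points=5, start=(0, 0)):
    """Return n_points (x, y) coordinates moving one unit per step in `direction`."""
    dx, dy = STEPS[direction]
    x0, y0 = start
    return [(x0 + i * dx, y0 + i * dy) for i in range(n_points)]


def zero_shot_prompt(path):
    """Ask for the direction label directly, with no examples or prefix."""
    options = ", ".join(STEPS)
    return (
        f"Here is a 2D path given as (x, y) points: {path}\n"
        f"Which direction does the path move in? Answer with one of: {options}."
    )


def spp_prompt(path):
    """Place a simpler, related spatial question before the primary query.

    The wording of the prefix question is an assumption made for illustration.
    """
    prefix = (
        "First, a simpler question: on a 2D grid, if x stays constant and y "
        "increases, which direction is a point moving?\n"
        "Now answer the main question.\n"
    )
    return prefix + zero_shot_prompt(path)


def make_dataset(n_examples, seed=0):
    """Build (path, label) pairs with random directions and start points."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_examples):
        label = rng.choice(list(STEPS))
        start = (rng.randint(-5, 5), rng.randint(-5, 5))
        data.append((make_path(label, start=start), label))
    return data


if __name__ == "__main__":
    path, label = make_dataset(1)[0]
    print(spp_prompt(path))
    print("expected label:", label)
```

The only difference between the two prompt builders is the prefix: the primary query is left unchanged, so any difference in model accuracy can be attributed to the preceding related problem rather than to a rewording of the task.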
Results and Findings
The experiments show that LLMs are competent at identifying simple 2D spatial patterns and achieve acceptable few-shot accuracy in identifying directions. ChatGPT-4 performs best, reaching perfect classification rates on short trajectories. Performance drops significantly on the more complex 3D trajectories, however: even the best models reach only 80% accuracy, and only when SPP is applied to the "cleaned" CALVIN dataset, in which noise is reduced. CoT prompting performed inconsistently and did not always yield improvements, suggesting that it may be less effective for spatial tasks than for language or mathematical reasoning.
Implications and Future Directions
The Spatial Prefix-Prompting method showed promise, often outperforming the other techniques, which indicates that prompting models with simpler, related problems can improve performance on complex spatial tasks. This paper lays the groundwork for future research into enhancing the spatial reasoning abilities of LLMs. Potential applications could extend to areas such as trend analysis and time-series data interpretation. Future work could benefit from larger datasets and from exploring additional spatial tasks, including 3D point-cloud analysis and multi-variable trend forecasting.