Generality of reasoning‑trajectory geometry beyond studied benchmarks
Determine whether large language models exhibit similar step-indexed geometric organization of reasoning trajectories (i.e., structured hidden-state regions that reflect reasoning progress) when applied to other tasks, such as open-ended reasoning, multi-hop question answering, and program synthesis, beyond the GSM8K, MATH-500, and MMLU settings in which this structure was observed.
References
First, although we observe clear and consistent trajectory structure in GSM8K, MATH-500, and MMLU, it remains an open question whether similar geometric organization arises in other settings, such as open-ended reasoning, multi-hop QA, or program synthesis.
— LLM Reasoning as Trajectories: Step-Specific Representation Geometry and Correctness Signals
(2604.05655 - Sun et al., 7 Apr 2026) in Limitations, paragraph 1