Clarify Ambiguity in Evaluation Instructions on Move Enumeration
Determine whether Shojaee et al.’s evaluation protocol requires models to enumerate every intermediate state, or only the final move list, when solving large-N Tower of Hanoi instances. The question turns on the intended meaning of Section A.1.1’s instruction to “include the corresponding complete list of moves” during the “thinking process.”
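To make the two readings concrete, the following minimal Python sketch contrasts a bare move list with a trace of every intermediate peg configuration. It is illustrative only: the function names hanoi_moves and hanoi_states, the 0-based peg indices, and the (disk, from_peg, to_peg) tuple layout are assumptions made for this example, not the format specified by Shojaee et al. The one fact it relies on is that the optimal solution for N disks has 2^N - 1 moves, so either reading grows exponentially in N, and the state trace multiplies the output further because each step repeats the whole configuration.

```python
def hanoi_moves(n, src=0, aux=1, dst=2):
    """Yield (disk, from_peg, to_peg) for the optimal n-disk solution.

    Peg indices and tuple layout are hypothetical choices for this sketch.
    """
    if n == 0:
        return
    # Move the n-1 smaller disks out of the way, onto the auxiliary peg.
    yield from hanoi_moves(n - 1, src, dst, aux)
    # Move the largest disk directly to its destination.
    yield (n, src, dst)
    # Move the n-1 smaller disks from the auxiliary peg onto the largest disk.
    yield from hanoi_moves(n - 1, aux, src, dst)


def hanoi_states(n):
    """Yield a snapshot of all three pegs after each move.

    This corresponds to the "enumerate all intermediate states" reading.
    Each peg is a list ordered bottom to top.
    """
    pegs = [list(range(n, 0, -1)), [], []]
    for disk, a, b in hanoi_moves(n):
        moved = pegs[a].pop()
        assert moved == disk  # sanity check: the expected disk is on top
        pegs[b].append(moved)
        yield [list(p) for p in pegs]


if __name__ == "__main__":
    for n in (3, 10, 15):
        move_count = sum(1 for _ in hanoi_moves(n))
        print(f"N={n}: {move_count} moves (2^N - 1 = {2**n - 1})")
```

Under either reading the output length is exponential in N; the sketch only shows that the state-trace reading emits a full configuration per move, whereas the move-list reading emits one short tuple per move.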
References
The intent of the original instructions seems ambiguous, but we have updated this section to more closely track model behaviour in practice.
— Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
(Opus et al., 10 Jun 2025) in Section 4: Models Abbreviate Long Solutions, Causing Apparent Collapse