Diagnosis of Reasoning Rigidity in LLMs
The paper, "Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models," presents a comprehensive paper on a burgeoning concern in the field of AI LLMs known as "reasoning rigidity." This phenomenon is characterized by LLMs' (LLMs) tendencies to prioritize habitual reasoning patterns over specific instructions provided by users, often leading to erroneous conclusions, particularly in domains necessitating precision such as mathematics and logic puzzles.
Core Contributions
The authors introduce a well-structured diagnostic dataset designed to expose reasoning rigidity, enabling a granular investigation of this largely unexplored failure mode. The dataset includes modified mathematical benchmarks, specifically variants of AIME and MATH500, along with classic puzzles redesigned to require deviation from ingrained reasoning strategies. The primary contributions of the paper are:
- Identification of Reasoning Rigidity: The paper coins the term "reasoning rigidity" for a detrimental behavior, distinct from hallucination or prompt brittleness, in which models disregard explicit conditions in favor of their learned patterns.
- Release of a Diagnostic Set: A publicly available dataset aimed at facilitating further research into mitigating reasoning rigidity in LLMs.
- Categorization of Contamination Modes: Identification of three distinct modes through which reasoning rigidity manifests: Interpretation Overload, Input Distrust, and Partial Instruction Attention.
- Quantitative Analysis: Introduction of the Contamination Ratio for quantitatively assessing the degree of reasoning contamination when models are exposed to familiar yet irrelevant reasoning paths; a sketch of how such a metric might be computed follows this list.
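To make the metric concrete, here is a minimal sketch of how a contamination ratio could be computed over a segmented reasoning trace. This is one plausible reading of the paper's metric, not its exact definition: the `ReasoningStep` type and the per-step labels (assumed to come from an external judge, such as a human annotator or an LLM grader) are assumptions introduced here.

```python
from dataclasses import dataclass

@dataclass
class ReasoningStep:
    text: str
    # Assumed label from an external judge: True if this step pursues the
    # familiar solution path instead of the modified instruction.
    follows_default_path: bool

def contamination_ratio(steps: list[ReasoningStep]) -> float:
    """Fraction of reasoning steps that revert to the habitual solution path.

    A plausible reading of the paper's Contamination Ratio; the paper's
    exact formulation may differ.
    """
    if not steps:
        return 0.0
    contaminated = sum(1 for step in steps if step.follows_default_path)
    return contaminated / len(steps)

# Example trace for a perturbed AIME-style problem:
trace = [
    ReasoningStep("Apply the standard identity for the original problem.", True),
    ReasoningStep("Wait, the prompt forbids that identity; switch approach.", False),
    ReasoningStep("Fall back to the memorized solution anyway.", True),
]
print(contamination_ratio(trace))  # 0.666...
```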
Key Findings
The findings from the analysis using this novel dataset are particularly insightful:
- While low contamination ratios do not considerably affect the final output, exceeding a threshold (approximately 40%) sharply reduces model accuracy, indicating a shift onto incorrect reasoning paths (see the bucketed analysis sketched after this list).
- Advanced reasoning models exhibit progressively worsening contamination as the length and complexity of reasoning increase, unlike base models, which maintain more stable reasoning paths.
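To show how such a threshold could be surfaced empirically, the sketch below buckets evaluation records by contamination ratio and reports per-bucket accuracy. The (ratio, correctness) record layout is an assumption made for illustration, not the paper's released format.

```python
from collections import defaultdict

def accuracy_by_contamination(records, n_buckets=10):
    """Per-bucket accuracy over contamination-ratio ranges.

    `records` is an iterable of (contamination_ratio, is_correct) pairs;
    a sharp drop past a bucket boundary (e.g., around 0.4) would reflect
    the threshold effect reported in the paper.
    """
    totals = defaultdict(lambda: [0, 0])  # bucket -> [num_correct, num_total]
    for ratio, is_correct in records:
        # Small epsilon guards against float error at exact bucket boundaries.
        bucket = min(int(ratio * n_buckets + 1e-9), n_buckets - 1)
        totals[bucket][0] += int(is_correct)
        totals[bucket][1] += 1
    return {
        f"{b / n_buckets:.1f}-{(b + 1) / n_buckets:.1f}": correct / total
        for b, (correct, total) in sorted(totals.items())
    }

# Toy records: accuracy collapses once contamination exceeds ~0.4.
records = [(0.05, True), (0.15, True), (0.35, True), (0.45, False), (0.55, False)]
print(accuracy_by_contamination(records))
# {'0.0-0.1': 1.0, '0.1-0.2': 1.0, '0.3-0.4': 1.0, '0.4-0.5': 0.0, '0.5-0.6': 0.0}
```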
Implications
The implications of these findings are both theoretical and practical. Theoretically, the paper challenges the current understanding of failure modes in LLMs and underscores the necessity of developing models that can break free from fixed reasoning patterns. Practically, these results suggest that existing reinforcement learning methodologies, which have prioritized the strengthening of long-chain reasoning capabilities, may inadvertently contribute to reasoning rigidity. This, in turn, has significant ramifications for deploying LLMs in domains where adhering to user-specified constraints is non-negotiable, such as scientific computation and legal reasoning.
Future Directions
The paper opens several avenues for future research. One immediate area of exploration is the refinement of reinforcement learning strategies to minimize the impact of reasoning rigidity. Another potential direction is the adaptation and expansion of the diagnostic datasets to encompass a wider array of problem domains, thus broadening the applicability of the insights garnered from this paper.
Furthermore, detailed investigation into the underlying cognitive biases that contribute to reasoning rigidity, particularly within varied linguistic and cultural contexts, could yield valuable insights for more adaptable model architectures. Lastly, interactive training paradigms that dynamically adjust reasoning paths based on real-time feedback offer a promising route to overcoming the stubbornness of reasoning models.
In conclusion, the paper makes a significant contribution to the field of AI and NLP by shedding light on a critical limitation of current-generation reasoning models, providing methodological tools for further investigation, and setting the stage for more versatile and instruction-compliant AI systems.