- The paper introduces an adaptive CogER framework that stratifies queries into four complexity levels for tailored reasoning.
- The framework models strategy allocation as an MDP and uses Group Relative Policy Optimization to balance accuracy with efficient compute usage.
- The integration of Cognitive Tool-Assisted Reasoning (CoTool) enables selective external tool use, achieving notable EM gains across benchmarks.
Cognitive-Inspired Elastic Reasoning for LLMs
The computational cost of LLM inference grows substantially with model scale and context length, yet end-user queries span a broad complexity distribution. Traditional approaches employ a uniform reasoning pathway for all inputs, ignoring the efficiency-accuracy trade-off between trivial and demanding queries. The predominant paradigm, inspired by dual-process theory, frames LLM reasoning as either a "fast" (direct-answer) or a "slow" (full chain-of-thought) mode, but lacks dynamic adaptation to unobservable query difficulty, resulting in wasted resources or inadequate reasoning depth.
The paper introduces Cognitive-Inspired Elastic Reasoning (CogER), a framework for per-query adaptive routing informed by cognitive science. CogER decomposes queries into four complexity levels (L1–L4), each associated with tailored inference strategies: direct answer, light reasoning, multi-step reasoning, or tool-augmented reasoning. This stratification, reminiscent of Bloom’s Taxonomy, enables cost-optimized allocation of reasoning resources and addresses the challenges of difficulty estimation and dynamic compute scaling.
Figure 1: (a) Query complexity levels. (b) Overview of CogER. Given an input query, the CogER-Agent selects a complexity level (L1–L4) and routes it to the corresponding reasoning strategy: direct answering, light to multi-step reasoning, or Cognitive Tool-Assisted Reasoning. The CogER-Agent is trained via GRPO with a composite reward combining format compliance, answer accuracy, and a hierarchy-aware efficiency term.
CogER: Architecture and Training
Query Complexity Assessment. Incoming queries are classified by a policy network (CogER-Agent) into four complexity levels, each mapped to a reasoning strategy: immediate output (L1, No Think), chain-of-thought with a moderate LLM (L2, Think), extended chain-of-thought with an LRM (L3, Extend), and tool-augmented inference (L4, Delegate with CoTool). The classification draws on heuristics inspired by educational psychology, operationalized through prompts and explicit tags for each query.
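As a concrete illustration, the level-to-strategy dispatch can be sketched as a simple lookup keyed on the agent's predicted level. The class names and the `predict_level` interface below are illustrative placeholders, not the paper's actual API.

```python
from enum import Enum

class Strategy(Enum):
    NO_THINK = "L1: direct answer"
    THINK = "L2: chain-of-thought with a moderate LLM"
    EXTEND = "L3: extended chain-of-thought with an LRM"
    DELEGATE = "L4: tool-augmented reasoning via CoTool"

# One strategy per complexity level, as described above.
LEVEL_TO_STRATEGY = {1: Strategy.NO_THINK, 2: Strategy.THINK,
                     3: Strategy.EXTEND, 4: Strategy.DELEGATE}

def route(query: str, agent) -> Strategy:
    """Ask the CogER-Agent for a complexity level (1-4) and dispatch accordingly."""
    level = agent.predict_level(query)  # hypothetical interface to the policy network
    return LEVEL_TO_STRATEGY[level]
```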
Markov Decision Process Formulation. The dynamic allocation of reasoning strategies is modeled as an episodic MDP: the state encodes the query, reasoning history, and predicted complexity; the action space comprises both token generation and high-level strategy selection; and the reward balances answer accuracy and format correctness while penalizing unnecessary compute via a hierarchy-aware term.
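Read concretely, the episode reward can be pictured as an accuracy term plus a format term minus a level-dependent cost; the functional form and weighting below are an assumed illustration, since the summary does not restate the paper's exact equation.

```latex
R(\tau) \;=\; R_{\text{ans}}(\hat{y}, y^{*}) \;+\; R_{\text{fmt}}(\tau) \;-\; \lambda\, c(\ell),
\qquad c(1) < c(2) < c(3) < c(4)
```

Here the chosen complexity level is denoted by the symbol in the cost term, the cost is monotone in that level, and the coefficient trades accuracy against efficiency; under this reading, escalating to L3 or L4 only pays off when it actually flips an incorrect answer to a correct one.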
Reinforcement Learning and Objective. The CogER-Agent is trained with Group Relative Policy Optimization (GRPO), a PPO-style variant that uses a normalized groupwise advantage for improved credit assignment over sampled outputs. The reward signal combines format compliance and answer accuracy while penalizing over-use of higher cognitive levels, explicitly guiding the policy toward minimal sufficient computation.
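For reference, the groupwise normalization that GRPO substitutes for a learned value baseline is computed over a group of G responses sampled for the same query; this is the standard GRPO form, reproduced here for clarity rather than quoted from the paper.

```latex
\hat{A}_i \;=\; \frac{r_i - \operatorname{mean}\!\left(\{r_1, \dots, r_G\}\right)}{\operatorname{std}\!\left(\{r_1, \dots, r_G\}\right)}
```

Each sampled output is then reinforced in proportion to how much better its reward is than its group's average, which removes the need for a separate critic network.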
Cognitive Tool-Assisted Reasoning (CoTool). For L4 queries that exceed the model's internal knowledge or require verification or calculation, the system integrates an orchestrated tool-calling mechanism called CoTool. The LLM autonomously decides when to issue tool queries (e.g., a search API, calculator, or code execution), leverages parser-generated scaffolding, integrates tool responses, and resumes step-wise generation with the enriched context. Strict constraints on the invocation count and its interplay with reasoning turns prevent tool overuse and keep resource usage within caps.
Practically, CoTool supports arbitrary function calls for knowledge, computation, and code execution by wrapping tool queries and outputs in explicit tokens, bridging the gap between internal knowledge and external APIs.
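A minimal sketch of such an interleaved generate-call-resume loop is shown below; the tag names, the `MAX_TOOL_CALLS` cap, and the `llm`/`tools` interfaces are assumptions for illustration rather than CoTool's actual tokens or API.

```python
import re

TOOL_CALL = re.compile(r"<tool>(.*?)</tool>", re.DOTALL)  # assumed wrapper tokens
MAX_TOOL_CALLS = 4  # cap on invocations per query (illustrative value)

def cotool_generate(llm, tools, query: str) -> str:
    """Alternate between generation and tool execution until no tool is requested."""
    context = query
    for _ in range(MAX_TOOL_CALLS):
        draft = llm.generate(context)               # model may emit a wrapped tool query
        match = TOOL_CALL.search(draft)
        if match is None:
            return draft                            # no tool requested: final answer
        name, _, args = match.group(1).partition(":")
        result = tools[name.strip()](args.strip())  # e.g. search, calculator, code runner
        # Feed the tool output back inside explicit tokens and resume generation.
        context = context + draft[:match.end()] + f"\n<result>{result}</result>\n"
    return llm.generate(context)                    # cap reached: answer with current context
```

In this sketch the invocation cap doubles as the resource constraint mentioned above: once it is exhausted, the model must answer from the context accumulated so far.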
Empirical Evaluation
CogER is benchmarked on a spectrum of in-domain (GSM8K, MATH, CommonsenseQA, MedQA) and out-of-domain (MAWPS, CollegeMath) tasks. Strong numerical outcomes are reported:
- In-domain (ID): CogER achieves a 13% relative improvement in average EM over state-of-the-art test-time scaling (TTS) baselines and nearly 10% over the strongest LLM baseline. Specific gains include 84.52% EM on CommonsenseQA and 81.23% on MedQA.
- Out-of-domain (OOD): CogER yields an 8% relative gain in EM, surpassing both standard LLMs and the best adaptive test-time scaling methods. Notably, on the difficult CollegeMath benchmark, the EM improvement over the hierarchical LLM ReasonFlux-32B is 13.21%.
Crucially, these accuracy gains are obtained with a sharp reduction in the average number of model parameters invoked per query, in end-to-end latency (by up to 76.58% relative to DeepSeek-R1), and in output length. Ablations show that RL-driven routing outperforms both random and supervised classifier-based dispatch, and that both the hierarchical and the format-based reward terms are critical for robust, balanced policy learning. Tool invocation is judicious: CoTool delivers an 11.24% EM improvement on MATH with a tool invocation rate below 6%, evidencing efficient tool integration with minimal overhead.
Theoretical and Practical Implications
The work demonstrates that formulating reasoning resource control as an MDP and adopting complexity-aware dispatch produces dual benefits: simultaneously minimizing unnecessary computation and maximizing answer reliability for complex queries. The explicit policy learning approach sidesteps the need for brittle heuristics, instead assigning computation adaptively based on learned representations of query difficulty.
Introducing explicit cognitive stratification (beyond the traditional “fast/slow” dichotomy) lays a foundation for future AI systems capable of granular, data-driven metareasoning. The robust empirical gains across datasets and substantial compute savings position CogER as a practical solution for real-world deployment, where cost, latency, and reliability are paramount.
The successful integration of external tools points to a future in which LLMs fluidly bridge model-based reasoning with declarative access to symbolic knowledge, computation, and action, further blurring the lines between language modeling and agentic problem solving.
Limitations and Future Directions
While comprehensive in single-turn, text-only settings, CogER is not directly evaluated in interactive or multimodal scenarios such as vision-language or dialogue contexts. The sparsity of reward signals for complex reasoning steps could hinder policy optimization on tasks requiring deep, creative synthesis. Compelling avenues for future research include extending the framework with multi-objective reward learning and richer feedback (including human preferences), and generalizing to continuous, multimodal task settings. Better alignment and the avoidance of "reward hacking" in creative tasks also remain important open challenges.
Conclusion
The CogER framework demonstrates the efficacy of adaptive, cognitive-inspired meta-reasoning for LLMs, delivering strong gains in efficiency and accuracy through explicit complexity-level-aware reasoning orchestration and robust RL-based policy learning. The principled integration of external tools further augments the versatility and reliability of reasoning on complex, knowledge- or computation-intensive queries. This paradigm charts a scalable path for large-scale AI deployment in production settings and motivates further work on combining learned reasoning policies, external knowledge, and agentic action.