
CogER: Cognitive-Inspired Elastic Reasoning

Updated 18 December 2025
  • CogER is a framework that dynamically allocates LLM reasoning effort across hierarchical inference strategies determined by query complexity.
  • It uses reinforcement learning to select resource-efficient strategies, ensuring a balanced trade-off between computational cost and reasoning accuracy.
  • Benchmark tests reveal that CogER outperforms static inference paradigms, improving both exact match accuracy and compute efficiency.

Cognitive-Inspired Elastic Reasoning (CogER) is a framework for LLMs that dynamically allocates reasoning effort by hierarchically selecting from multiple inference strategies of graded complexity. Drawing upon insights from human cognitive psychology, particularly hierarchical models of cognition such as Bloom’s Taxonomy, CogER classifies each user query according to its latent difficulty and, through reinforcement learning, routes inference to the most resource-appropriate processing strategy. This approach explicitly addresses trade-offs between computational resource use and reasoning accuracy, outperforming prior test-time scaling and static inference paradigms (Hu et al., 17 Dec 2025).

1. Theoretical Foundations and Motivation

LLMs typically default to either "fast" shallow heuristics (analogous to System 1 thinking) or "slow" compositional reasoning (System 2), without query-specific adaptation. Prior strategies leave a gap: uniform inference wastes compute on simple queries and underperforms on challenging ones. CogER reframes LLM reasoning as an elastic process, dynamically calibrating effort to the query’s cognitive demand. The design is inspired by human hierarchical reasoning, incorporating discrete tiers (mirroring Bloom’s five-level hierarchy) to operationalize cognitive demands, thereby aligning model compute expenditure with problem complexity (Hu et al., 17 Dec 2025).

2. Query Complexity Assessment and Stratification

CogER operationalizes difficulty assessment via a four-level taxonomy:

Level | Mode | Description | Typical Model/Resource
$L_1$ | Prompt Answering | No explicit reasoning | 7B-Instruct; minimal CoT
$L_2$ | CoT Reasoning | Short chain-of-thought | Qwen2.5-32B; brief CoT
$L_3$ | Deep Reasoning | Extended chain-of-thought | QWQ-32B; extended CoT
$L_4$ | Tool-Enhanced Deep Reasoning | Extended CoT with external tool calls | QWQ-32B + external tool API

A lightweight, 7B-parameter CogER-Agent classifies an input $x$ into one of $L_1$–$L_4$. Lower levels use minimal models and computation, while higher levels access more sophisticated reasoning or invoke external tools via the Cognitive Tool-Assisted Reasoning (CoTool) pipeline (Hu et al., 17 Dec 2025). This hierarchical stratification enables proportional allocation of compute and complexity.
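As a concrete illustration, the mapping from complexity level to processing strategy can be sketched as a small routing table. The level names, modes, and backbone models follow the description above; the data structure, field names, and token budgets are illustrative assumptions rather than the authors' implementation.

from dataclasses import dataclass

@dataclass
class Strategy:
    """One rung of the CogER hierarchy (illustrative fields, not the paper's API)."""
    name: str            # coarse-grained reasoning operation for this level
    backend: str         # model (or model + tool pipeline) serving the level
    max_new_tokens: int  # assumed generation budget; higher levels get more room
    use_tools: bool      # whether the CoTool pipeline may be invoked

# Hypothetical level-to-strategy routing table.
STRATEGIES = {
    "L1": Strategy("No_Think", "7B-Instruct", max_new_tokens=256,  use_tools=False),
    "L2": Strategy("Think",    "Qwen2.5-32B", max_new_tokens=1024, use_tools=False),
    "L3": Strategy("Extend",   "QWQ-32B",     max_new_tokens=4096, use_tools=False),
    "L4": Strategy("Delegate", "QWQ-32B",     max_new_tokens=4096, use_tools=True),
}

def route(level: str) -> Strategy:
    """Dispatch a query to the strategy the CogER-Agent selected."""
    return STRATEGIES[level]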

3. Markov Decision Process Formulation and Strategy Selection

CogER formulates per-query strategy selection as an MDP $\langle\mathcal{S},\mathcal{A},\mathcal{T},\mathcal{R},\pi\rangle$:

  • State: $s_t = [x, y_{1:t-1}, L_i]$, encoding the query, partial output, and current complexity assignment.
  • Action Space: $\mathcal{A} = \{\mathrm{No\_Think}, \mathrm{Think}, \mathrm{Extend}, \mathrm{Delegate}\} \cup \mathcal{V}$, i.e., coarse-grained reasoning operations plus fine-grained token generation from the vocabulary $\mathcal{V}$.
  • Transition: $\mathcal{T}(s_t, a_t)$ augments the reasoning trace with the chosen action.
  • Reward: At episode termination, the reward sums correctness (+1 if answer correct), format adherence, and a hierarchy term penalizing use of complexity above the minimum necessary level.
  • Objective: The agent seeks a policy $\pi$ maximizing the expected discounted reward

$$J(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} \gamma^t\, \mathcal{R}(s_t, a_t)\right], \qquad \gamma \in [0,1].$$

The reward function enforces cost-sensitive optimization, discouraging overuse of complex strategies while preserving high solution quality. Equivalently, the terminal reward can be written as $\mathcal{R}(s_T, a_T) = \mathrm{accuracy}(s_T, a_T) - \lambda\,\mathrm{cost}(a_T)$ with a tunable trade-off parameter $\lambda$ (Hu et al., 17 Dec 2025).
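A minimal sketch of this cost-sensitive terminal reward, assuming one particular decomposition: correctness and format terms as described above plus a hierarchy penalty proportional to how far the chosen level exceeds the minimum sufficient level. The weights, level encoding, and function name are hypothetical, chosen only to make the shape of $\mathcal{R}$ concrete.

def coger_reward(correct: bool,
                 format_ok: bool,
                 chosen_level: int,
                 minimal_level: int,
                 lam: float = 0.1) -> float:
    """Terminal reward = correctness + format adherence - hierarchy penalty.

    Mirrors R = accuracy - lambda * cost: the penalty charges only the
    complexity used *above* the minimum level that would have sufficed.
    The weights (1.0, 0.1, lam) are illustrative, not taken from the paper.
    """
    r_correct = 1.0 if correct else 0.0
    r_format = 0.1 if format_ok else 0.0
    over_provisioning = max(0, chosen_level - minimal_level)  # e.g. L4 used where L2 suffices -> 2
    return r_correct + r_format - lam * over_provisioning

# Example: correct, well-formatted answer that used L4 where L2 was enough.
print(coger_reward(True, True, chosen_level=4, minimal_level=2))  # 1.0 + 0.1 - 0.2 = 0.9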

4. Reinforcement Learning and Policy Optimization

Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO) that uses groupwise advantage normalization, stabilizes policy gradients during agent training. For each query, $G = 12$ candidate rollouts are scored, advantages $\hat{A}_i$ are normalized within each group, and the clipped PPO objective (with a KL penalty) is optimized:

$$\mathcal{J}_{\mathrm{GRPO}}(\theta) = \frac{1}{G}\sum_{i=1}^{G} \min\!\left[\frac{\pi_\theta(o_i \mid x)}{\pi_{\theta_{\mathrm{old}}}(o_i \mid x)}\,\hat{A}_i,\; \mathrm{clip}\!\left(\frac{\pi_\theta(o_i \mid x)}{\pi_{\theta_{\mathrm{old}}}(o_i \mid x)},\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_i\right] - \beta\, D_{\mathrm{KL}}\!\left(\pi_\theta \,\|\, \pi_{\mathrm{ref}}\right)$$

Training employs AdamW with learning rate $5\times10^{-5}$, batch size $24\times3$, LoRA rank 16, KL weight 0.1, clip parameter 0.2, and a single epoch on 8,000 examples (GSM8K, MATH, CommonsenseQA, MedQA), run on 8× NVIDIA A800 GPUs with PyTorch 2.6.0. Empirically, convergence is achieved within this single epoch (Hu et al., 17 Dec 2025).
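The groupwise normalization and clipped update can be sketched as follows. The clip parameter and KL weight match the values reported above; the sequence-level log-probability interface, the epsilon constant, and the function names are simplifying assumptions (the paper's GRPO may apply the ratio per token).

import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Normalize rewards within one group of G rollouts for the same query:
    A_hat_i = (r_i - mean(r)) / (std(r) + eps)."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def grpo_loss(logp_new: torch.Tensor,    # (G,) log pi_theta(o_i | x), sequence-level
              logp_old: torch.Tensor,    # (G,) log pi_theta_old(o_i | x)
              advantages: torch.Tensor,  # (G,) group-normalized advantages
              kl_to_ref: torch.Tensor,   # scalar estimate of KL(pi_theta || pi_ref)
              clip_eps: float = 0.2,     # clip parameter reported in the text
              beta: float = 0.1) -> torch.Tensor:
    """Clipped surrogate objective with KL penalty, negated for minimization."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    objective = torch.minimum(unclipped, clipped).mean() - beta * kl_to_ref
    return -objective  # minimized with AdamW as described above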

5. Cognitive Tool-Assisted Reasoning (CoTool)

For the highest-complexity ($L_4$) queries, the CoTool mechanism interleaves LLM inference with external API or tool calls. During chain-of-thought generation, when a tool query is triggered, the model emits special tokens ("<|begin_tool_query|> ... <|end_tool_query|>"), which are intercepted by the Reasoning Support Toolkit (RSTKit). RSTKit parses these, selects an appropriate external API (search, calculator, code execution) via a JSON selector, and returns tool results embedded as "<|begin_tool_result|> ... <|end_tool_result|>".

This mechanism is resource-conscious, enforcing caps on the number and depth of tool calls per query. The LLM integrates these tool results, enabling compositional, externally grounded reasoning particularly for problems requiring factual retrieval or non-trivial computation (Hu et al., 17 Dec 2025).
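The interception logic can be sketched as below. The special token strings follow the text; the regular expression, the tool registry, the heuristic tool choice, and the cap of three calls are illustrative assumptions standing in for RSTKit's JSON-based selector and its resource limits.

import re

TOOL_QUERY_RE = re.compile(r"<\|begin_tool_query\|>(.*?)<\|end_tool_query\|>", re.DOTALL)
MAX_TOOL_CALLS = 3  # assumed cap; the paper enforces limits on call count and depth

def run_cotool_step(generated_text: str, tools: dict, calls_used: int):
    """If the model has emitted a tool query, run it and return the wrapped result.

    tools: mapping from tool name ("search", "calculator", "code") to a callable.
    Returns None when no tool query is pending or the call budget is exhausted.
    """
    match = TOOL_QUERY_RE.search(generated_text)
    if match is None or calls_used >= MAX_TOOL_CALLS:
        return None
    query = match.group(1).strip()
    # Toy selection heuristic; RSTKit instead chooses the API via a JSON selector.
    tool_name = "calculator" if any(ch.isdigit() for ch in query) else "search"
    result = tools[tool_name](query)
    return f"<|begin_tool_result|>{result}<|end_tool_result|>"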

6. Inference Pipeline and Workflow

Inference proceeds as follows (a schematic end-to-end sketch follows the list):

  1. The CogER-Agent assesses the input query $x$ and assigns a complexity level $L$.
  2. Processing dispatch:
    • $L_1$: Direct output from 7B-Instruct (No_Think).
    • $L_2$: Qwen2.5-32B with short CoT (Think).
    • $L_3$: QWQ-32B with extended CoT (Extend).
    • $L_4$: QWQ-32B with CoTool-enabled reasoning (Delegate).
  3. If CoTool is invoked, generation interleaves tool queries and results according to the pipeline logic described above.
  4. The final output is produced after all reasoning steps (including external tool calls, if any) complete (Hu et al., 17 Dec 2025).
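Putting the steps together, the end-to-end dispatch can be sketched as below. The classify/generate interfaces, the loop structure, and the helper run_cotool_step (from the Section 5 sketch) are hypothetical stand-ins for the CogER-Agent, the backend LLMs, and the CoTool pipeline.

def coger_inference(x: str, agent, backends: dict, tools: dict) -> str:
    """Classify the query, dispatch it, and (for L4) interleave tool calls."""
    level = agent.classify(x)          # step 1: complexity assessment by the CogER-Agent
    generate = backends[level]         # step 2: dispatch to the matching backbone model

    if level != "L4":
        return generate(x)             # L1-L3: single pass, no tool use

    # L4: interleave generation with tool calls until no further query is emitted.
    trace, calls_used = x, 0
    while True:
        chunk = generate(trace)
        trace += chunk
        injected = run_cotool_step(chunk, tools, calls_used)  # defined in the Section 5 sketch
        if injected is None:
            return trace               # step 4: final answer after all tool calls complete
        trace += injected
        calls_used += 1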

7. Experimental Results and Analysis

CogER was benchmarked on in-domain (GSM8K, MATH-500, CommonsenseQA, MedQA) and out-of-domain (MAWPS, CollegeMath) datasets, with primary evaluation on exact match (EM) accuracy, compute latency, generated word counts, and model parameter footprint. Baselines included Qwen2.5-Math-72B, DeepSeek-R1, DS-R1-DQ-7B/14B/32B, and test-time scaling (TTS) methods (L1-MAX, S1-32B, ReasonFlux-32B).

Method | In-Domain EM | Out-of-Domain EM
DeepSeek-R1 | 81.55% | 83.00%
S1-32B (TTS) | 78.80% | 81.32%
ReasonFlux-32B | 68.51% | 86.25%
CogER | 89.28% | 93.56%

CogER achieves +13.3% relative improvement (in-domain) over S1-32B and +8.4% (out-of-domain) over ReasonFlux-32B. Ablation indicates performance gains are due to both multi-tier routing and reward function design. Notably, for high-complexity mathematical queries, enabling CoTool leads to 97.00% EM on MATH-500 with a 3.03% tool invocation rate. Compute efficiency benchmarks show CogER running 4× faster than DeepSeek-R1 and generating ~50% fewer output tokens (Hu et al., 17 Dec 2025).
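For concreteness, the quoted in-domain figure follows directly from the table above as a relative (not absolute) gain:

$$\frac{89.28 - 78.80}{78.80} \approx 0.133,$$

i.e., a +13.3% relative improvement over S1-32B; the out-of-domain figure is computed analogously against ReasonFlux-32B.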

Qualitative case studies demonstrate the model’s ability to interleave tool-augmented reasoning: e.g., delegating polynomial computation to a calculator to prevent arithmetic errors and integrating external knowledge from wiki searches to answer open-domain commonsense queries.

8. Significance, Limitations, and Implications

CogER establishes a cognitive-inspired architecture for flexible, resource-efficient LLM reasoning. By treating strategy selection as an MDP and leveraging fine-grained reward shaping, CogER avoids both compute underutilization and overcommitment. Its modular pipeline, which includes autonomous tool invocation and chain-of-thought reasoning, generalizes beyond prior fixed-schedule test-time scaling. The empirical results demonstrate the benefit of hierarchical reasoning allocation, both for accuracy and computational agility.

A plausible implication is that future LLM deployments may increasingly adopt similar adaptive, multi-tiered inference frameworks to optimize accuracy-cost trade-offs on heterogeneous workloads. The approach’s reliance on explicit training for complexity assessment may, however, introduce domain transfer limitations if new query types deviate from the cognitive taxonomy used for policy training (Hu et al., 17 Dec 2025).

References (1)
