- The paper shows that LRMs plan their reasoning length before generation by encoding a linear direction in the residual stream.
- It employs Lasso regression to predict reasoning token counts with high accuracy (Spearman ρ ≈ 0.8–0.9) across various model sizes.
- Activation edits at inference give fine-grained control over reasoning length: positive steering modestly improves accuracy on hard math benchmarks, while negative steering cuts reasoning tokens and compute on easy questions with little accuracy loss.
Large reasoning models (LRMs) such as DeepSeek-R1 and QwQ dynamically decide how many “reasoning” tokens (the segment between <think> … </think>) they will generate. This work shows that LRMs actually plan that length before producing the first reasoning token, encode the plan in a single linear direction inside the residual stream, and can be steered at inference time by editing that direction.
Key findings
- Reasoning length is predictable from the question embeddings
- Extract the residual-stream activation at the <think> position (no answer tokens yet).
- Train a Lasso linear regressor (scikit-learn) per layer to predict the subsequent number of reasoning tokens.
- Spearman ρ ≈ 0.8–0.9 on MATH across models from 1.5B to 32B parameters; deeper layers give higher accuracy.
- A single “pre-allocation” direction controls length
- Group questions by difficulty. At each layer ℓ, compute difference-in-means vectors 𝒓ᶩ_{d←1} between difficulty bin d and the easiest bin (bin 1).
- Across bins, these vectors are almost collinear (cos ≈ 0.99), so each layer has a single shared direction 𝒓ᶩ.
- The norm ∥𝒓ᶩ_{d←1}∥ grows monotonically with task difficulty and with the empirical reasoning length.
- Causal steering via activation addition
- At inference, modify the residual stream just after the attention and MLP blocks of layer ℓ:

```python
h[l] = h[l] + λ * r[l]   # λ ∈ ℝ, typically −0.2 … 0.2
```
- Negative λ increases the logit of </think>, shortening the chain-of-thought and dropping accuracy; positive λ decreases that logit, lengthens reasoning and often improves accuracy (~+2 pts avg. on MATH-500, AIME-2024, OlympiadBench at λ≈0.1).
- Answer-token length (after </think>) is unaffected ⇒ the vector specifically modulates the reasoning phase.
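One way to check this yourself, assuming the decoded completion contains a single `</think>` marker, is to count tokens on each side of it:

```python
def reasoning_and_answer_lengths(decoded: str, tokenizer):
    """Token counts of the segments before and after </think> in one decoded completion."""
    reasoning, _, answer = decoded.partition("</think>")
    n_reason = len(tokenizer.encode(reasoning, add_special_tokens=False))
    n_answer = len(tokenizer.encode(answer, add_special_tokens=False))
    return n_reason, n_answer
```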
- Mechanism insight
The direction primarily shifts the logit of the </think> token, not the EOS token or randomly chosen control tokens, which explains the selective control over termination.
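A rough sanity check, assuming a loaded Hugging Face model/tokenizer and the unit direction `r` from the recipe below, is to project `r` through the unembedding matrix (ignoring the final RMSNorm, so the numbers are only approximate) and inspect how the `</think>` logit moves:

```python
import torch

r_t = torch.as_tensor(r, dtype=model.dtype, device=model.device)
with torch.no_grad():
    logit_shift = model.lm_head(r_t)                      # approximate per-token logit change from adding r
end_id = tokenizer.convert_tokens_to_ids("</think>")      # assumed to be a single token, as in R1-style tokenizers
rank = int((logit_shift <= logit_shift[end_id]).sum())    # small rank ⇒ </think> is among the most suppressed tokens
print(f"</think> logit shift: {logit_shift[end_id].item():.3f} (rank from bottom: {rank})")
```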
Practical applications
- Overthink detection before generation
Feed the prompt, run the linear probe; unusually high predicted reasoning length flags potential “slow-down” prompts.
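A minimal sketch of this filter, assuming the fitted probe `reg` and training lengths `y_train` from the recipe below, with `h_think` as the prompt's `<think>`-position activation (the threshold choice is illustrative):

```python
import numpy as np

threshold = np.percentile(y_train, 95)              # illustrative: 95th percentile of observed lengths
pred_len = reg.predict(h_think.reshape(1, -1))[0]   # predicted number of reasoning tokens
if pred_len > threshold:
    print(f"possible overthink prompt: ~{pred_len:.0f} reasoning tokens predicted")
```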
- Test-time efficiency for easy questions
For tasks like MMLU or level-1 MATH, injecting a small negative λ cuts reasoning tokens by ~50% with negligible accuracy loss, reducing latency and GPU time.
Implementation recipe
- Data & models
- Use any open LRM with visible activations (DeepSeek-R1-Distill-Qwen, QwQ-32B).
- Collect question–answer pairs containing explicit <think> … </think> delimiters.
- Split 90/10 into train and test sets.
- Linear probe
```python
from scipy.stats import spearmanr
from sklearn.linear_model import Lasso

reg = Lasso(alpha=10)                             # strong L1 penalty to prevent over-fitting
reg.fit(H_train, y_train)                         # H_train: n×d activations, y_train: reasoning-token counts
score = reg.score(H_test, y_test)                 # R² on held-out questions
rho, _ = spearmanr(reg.predict(H_test), y_test)   # Spearman ρ, the metric reported above
```
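The matrices `H_train` / `H_test` above hold the residual-stream activations at the `<think>` position. One way to collect them with Hugging Face `transformers` is sketched below; the checkpoint, probe layer index, and `train_questions` list are illustrative choices, not the paper's exact setup:

```python
import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")

def think_activation(question: str, layer: int = 20) -> np.ndarray:
    """Residual-stream activation of `layer` at the final prompt position (the <think> token)."""
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": question}], tokenize=False, add_generation_prompt=True
    )
    if "<think>" not in prompt:                    # some template versions already append <think>
        prompt += "<think>"
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[layer][0, -1].float().cpu().numpy()

H_train = np.stack([think_activation(q) for q in train_questions])   # train_questions: your 90% split
```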
- Extract direction
```python
import numpy as np

r = H_hard.mean(axis=0) - H_easy.mean(axis=0)   # difference in mean activations (hard − easy)
r /= np.linalg.norm(r)                          # keep a unit direction
```
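To reproduce the collinearity and norm observations from the key findings, compute one difference-in-means vector per difficulty bin and compare them; `H_by_bin` below is an assumed dict mapping difficulty level to its activation matrix:

```python
import numpy as np

bins = sorted(H_by_bin)                                     # e.g. difficulty levels [1, 2, 3, 4, 5]
easiest = H_by_bin[bins[0]].mean(axis=0)
r_per_bin = {d: H_by_bin[d].mean(axis=0) - easiest for d in bins[1:]}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

for d in bins[2:]:
    print(d, "cos vs first bin-difference:", round(cosine(r_per_bin[d], r_per_bin[bins[1]]), 3),
          "| norm:", round(float(np.linalg.norm(r_per_bin[d])), 3))   # expect cos ≈ 0.99, growing norm
```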
- Steering during generation (shown here as a standard PyTorch forward hook on a Hugging Face model)

```python
def hook(module, inputs, output, λ=0.1):
    hidden = output[0] if isinstance(output, tuple) else output   # HF decoder layers return a tuple
    steered = hidden + λ * torch.as_tensor(r, dtype=hidden.dtype, device=hidden.device)
    return (steered, *output[1:]) if isinstance(output, tuple) else steered

handle = model.model.layers[layer_id].register_forward_hook(hook)   # model.model.layers: HF layout
```
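Usage sketch, assuming `inputs` from the tokenizer (as in the activation-collection sketch above) and the `handle` returned by `register_forward_hook`:

```python
# Generate with the steering hook active, then detach it so later requests run unmodified.
try:
    output_ids = model.generate(**inputs, max_new_tokens=4096)
finally:
    handle.remove()
print(tokenizer.decode(output_ids[0], skip_special_tokens=False))
```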
- Hyper-parameter search
Grid-search λ ∈ {−0.2, −0.15, …, 0.2}. The optimal layer usually lies in the final 25% of the stack.
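A minimal sweep might look like the sketch below; `evaluate_with_steering` is a hypothetical helper that registers the hook with the given layer and λ, runs the benchmark, and returns accuracy and average reasoning length:

```python
import numpy as np

n_layers = model.config.num_hidden_layers
late_layers = range(int(0.75 * n_layers), n_layers)        # final 25% of the stack
results = {}
for layer_id in late_layers:
    for lam in np.round(np.arange(-0.2, 0.21, 0.05), 2):
        acc, avg_len = evaluate_with_steering(model, layer_id, lam, r)   # hypothetical helper
        results[(layer_id, float(lam))] = (acc, avg_len)

best = max(results, key=lambda k: results[k][0])            # highest accuracy; also inspect avg_len
print("best (layer, λ):", best, "→ accuracy, avg reasoning length:", results[best])
```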
Resource footprint
80 GPU-days on 8×A100 for all probing, vector extraction, and generation sweeps.
Limitations and future work
- Tested only on Qwen-based LRMs; other backbones (Llama-3, Gemma-2) need verification.
- Linear probes suffice, but non-linear probes could reveal finer-grained planning signals.
- Extreme positive steering saturates performance gains, hinting at an upstream reasoning-ability ceiling.
Take-away for practitioners
The reasoning-length behaviour of today’s LRMs is both forecastable and controllable at the activation level. A lightweight linear probe + one direction vector gives you:
- Early warning of pathological long chains-of-thought;
- A one-line hook to trade computation for accuracy at test time;
- A concrete handle for future fine-tuning or RL objectives that need explicit length targets.