- The paper shows that LRMs plan their reasoning length before generation by encoding a linear direction in the residual stream.
- It employs Lasso regression to predict reasoning token counts with high accuracy (Spearman ρ ≈ 0.8–0.9) across various model sizes.
- Activation edits at inference give fine-grained control over reasoning length: positive steering modestly improves accuracy on hard math benchmarks, while negative steering cuts reasoning tokens and compute on easy questions with little accuracy loss.
Large reasoning models (LRMs) such as DeepSeek-R1 and QwQ dynamically decide how many “reasoning” tokens (the segment between <think> … </think>) they will generate. This work shows that LRMs actually plan that length before producing the first reasoning token, encode the plan in a single linear direction inside the residual stream, and can be steered at inference time by editing that direction.
Key findings
- Reasoning length is predictable from the question embeddings
- Extract the residual-stream activation at the <think> position (no answer tokens yet).
- Train a Lasso linear regressor (scikit-learn) per layer to predict the subsequent number of reasoning tokens.
- Spearman ρ ≈ 0.8–0.9 on MATH across models from 1.5B to 32B parameters; deeper layers give higher accuracy.
- A single “pre-allocation” direction controls length
- Group questions by difficulty. At each layer ℓ, compute difference-in-means vectors 𝒓ᶩ_{d←1} between difficulty bin d and the easiest bin (bin 1).
- Across bins, these vectors are almost collinear (cos ≈ 0.99), so each layer has a single shared direction 𝒓ᶩ.
- The norm ∥𝒓ᶩ_{d←1}∥ grows monotonically with task difficulty and with the empirical reasoning length.
- Causal steering via activation addition
- At inference, modify the residual stream just after the attention and MLP blocks of layer ℓ:

```python
h[l] = h[l] + λ * r[l]   # λ ∈ ℝ, typically −0.2 … 0.2
```
- Negative λ increases the logit of </think>, shortening the chain-of-thought and dropping accuracy; positive λ decreases that logit, lengthens reasoning and often improves accuracy (~+2 pts avg. on MATH-500, AIME-2024, OlympiadBench at λ≈0.1).
- Answer-token length (after </think>) is unaffected ⇒ the vector specifically modulates the reasoning phase.
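One way to check this yourself, assuming the decoded completion contains a single `</think>` marker, is to count tokens on each side of it:

```python
def reasoning_and_answer_lengths(decoded: str, tokenizer):
    """Token counts of the segments before and after </think> in one decoded completion."""
    reasoning, _, answer = decoded.partition("</think>")
    n_reason = len(tokenizer.encode(reasoning, add_special_tokens=False))
    n_answer = len(tokenizer.encode(answer, add_special_tokens=False))
    return n_reason, n_answer
```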
- Mechanism insight
The direction primarily shifts the logit of the </think> token, not the EOS token or randomly chosen control tokens, which explains the selective control over termination.
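A rough sanity check, assuming a loaded Hugging Face model/tokenizer and the unit direction `r` from the recipe below, is to project `r` through the unembedding matrix (ignoring the final RMSNorm, so the numbers are only approximate) and inspect how the `</think>` logit moves:

```python
import torch

r_t = torch.as_tensor(r, dtype=model.dtype, device=model.device)
with torch.no_grad():
    logit_shift = model.lm_head(r_t)                      # approximate per-token logit change from adding r
end_id = tokenizer.convert_tokens_to_ids("</think>")      # assumed to be a single token, as in R1-style tokenizers
rank = int((logit_shift <= logit_shift[end_id]).sum())    # small rank ⇒ </think> is among the most suppressed tokens
print(f"</think> logit shift: {logit_shift[end_id].item():.3f} (rank from bottom: {rank})")
```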
Practical applications
- Overthink detection before generation
Feed the prompt, run the linear probe; unusually high predicted reasoning length flags potential “slow-down” prompts.
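A minimal sketch of this filter, assuming the fitted probe `reg` and training lengths `y_train` from the recipe below, with `h_think` as the prompt's `<think>`-position activation (the threshold choice is illustrative):

```python
import numpy as np

threshold = np.percentile(y_train, 95)              # illustrative: 95th percentile of observed lengths
pred_len = reg.predict(h_think.reshape(1, -1))[0]   # predicted number of reasoning tokens
if pred_len > threshold:
    print(f"possible overthink prompt: ~{pred_len:.0f} reasoning tokens predicted")
```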
- Test-time efficiency for easy questions
For tasks like MMLU or level-1 MATH, injecting a small negative λ cuts reasoning tokens by ~50% with negligible accuracy loss, reducing latency and GPU time.
Implementation recipe
- Data & models
- Use any open LRM with visible activations (DeepSeek-R1-Distill-Qwen, QwQ-32B).
- Collect question–answer pairs containing explicit <think> … </think> delimiters.
- Split 90/10 into train and test sets.
- Linear probe
```python
from scipy.stats import spearmanr
from sklearn.linear_model import Lasso

reg = Lasso(alpha=10)                             # strong L1 penalty to prevent over-fitting
reg.fit(H_train, y_train)                         # H_train: n×d activations, y_train: reasoning-token counts
score = reg.score(H_test, y_test)                 # R² on held-out questions
rho, _ = spearmanr(reg.predict(H_test), y_test)   # Spearman ρ, the metric reported above
```
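The matrices `H_train` / `H_test` above hold the residual-stream activations at the `<think>` position. One way to collect them with Hugging Face `transformers` is sketched below; the checkpoint, probe layer index, and `train_questions` list are illustrative choices, not the paper's exact setup:

```python
import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")

def think_activation(question: str, layer: int = 20) -> np.ndarray:
    """Residual-stream activation of `layer` at the final prompt position (the <think> token)."""
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": question}], tokenize=False, add_generation_prompt=True
    )
    if "<think>" not in prompt:                    # some template versions already append <think>
        prompt += "<think>"
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[layer][0, -1].float().cpu().numpy()

H_train = np.stack([think_activation(q) for q in train_questions])   # train_questions: your 90% split
```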
- Extract direction
```python
import numpy as np

r = H_hard.mean(axis=0) - H_easy.mean(axis=0)   # difference in mean activations (hard − easy)
r /= np.linalg.norm(r)                          # keep a unit direction
```
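To reproduce the collinearity and norm observations from the key findings, compute one difference-in-means vector per difficulty bin and compare them; `H_by_bin` below is an assumed dict mapping difficulty level to its activation matrix:

```python
import numpy as np

bins = sorted(H_by_bin)                                     # e.g. difficulty levels [1, 2, 3, 4, 5]
easiest = H_by_bin[bins[0]].mean(axis=0)
r_per_bin = {d: H_by_bin[d].mean(axis=0) - easiest for d in bins[1:]}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

for d in bins[2:]:
    print(d, "cos vs first bin-difference:", round(cosine(r_per_bin[d], r_per_bin[bins[1]]), 3),
          "| norm:", round(float(np.linalg.norm(r_per_bin[d])), 3))   # expect cos ≈ 0.99, growing norm
```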
- Steering during generation (shown here as a standard PyTorch forward hook on a Hugging Face model)

```python
def hook(module, inputs, output, λ=0.1):
    hidden = output[0] if isinstance(output, tuple) else output   # HF decoder layers return a tuple
    steered = hidden + λ * torch.as_tensor(r, dtype=hidden.dtype, device=hidden.device)
    return (steered, *output[1:]) if isinstance(output, tuple) else steered

handle = model.model.layers[layer_id].register_forward_hook(hook)   # model.model.layers: HF layout
```
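Usage sketch, assuming `inputs` from the tokenizer (as in the activation-collection sketch above) and the `handle` returned by `register_forward_hook`:

```python
# Generate with the steering hook active, then detach it so later requests run unmodified.
try:
    output_ids = model.generate(**inputs, max_new_tokens=4096)
finally:
    handle.remove()
print(tokenizer.decode(output_ids[0], skip_special_tokens=False))
```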
- Hyper-parameter search
Grid-search λ ∈ {−0.2, −0.15, …, 0.2}. The optimal layer usually lies in the final 25% of the stack.
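A minimal sweep might look like the sketch below; `evaluate_with_steering` is a hypothetical helper that registers the hook with the given layer and λ, runs the benchmark, and returns accuracy and average reasoning length:

```python
import numpy as np

n_layers = model.config.num_hidden_layers
late_layers = range(int(0.75 * n_layers), n_layers)        # final 25% of the stack
results = {}
for layer_id in late_layers:
    for lam in np.round(np.arange(-0.2, 0.21, 0.05), 2):
        acc, avg_len = evaluate_with_steering(model, layer_id, lam, r)   # hypothetical helper
        results[(layer_id, float(lam))] = (acc, avg_len)

best = max(results, key=lambda k: results[k][0])            # highest accuracy; also inspect avg_len
print("best (layer, λ):", best, "→ accuracy, avg reasoning length:", results[best])
```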
Resource footprint
80 GPU-days on 8×A100 for all probing, vector extraction, and generation sweeps.
Limitations and future work
- Tested only on Qwen-based LRMs; other backbones (Llama-3, Gemma-2) need verification.
- Linear probes suffice, but non-linear probes could reveal finer-grained planning signals.
- Extreme positive steering saturates performance gains, hinting at an upstream reasoning-ability ceiling.
Take-away for practitioners
The reasoning-length behaviour of today’s LRMs is both forecastable and controllable at the activation level. A lightweight linear probe + one direction vector gives you:
- Early warning of pathological long chains-of-thought;
- A one-line hook to trade computation for accuracy at test time;
- A concrete handle for future fine-tuning or RL objectives that need explicit length targets.