CoT-Valve: Tunable Chain-of-Thought Compression
- The paper introduces CoT-Valve, a method that tunes the chain-of-thought length by manipulating a single parameter direction in model space.
- It employs a lightweight LoRA branch to adjust reasoning granularity continuously through interpolation (shortening) and extrapolation (distillation) without extra classifiers.
- Experimental benchmarks show that CoT-Valve reduces token usage by up to 70% while maintaining high accuracy, offering enhanced computational efficiency.
CoT-Valve is a parameter-space tuning and inference strategy enabling LLMs to generate chain-of-thought (CoT) reasoning paths of controlled length, with fine-grained tradeoff between inference cost and solution accuracy. It introduces a single direction in model parameter space that, when manipulated, elastically compresses or expands the model’s reasoning chain during inference, with applications to arithmetic and mathematical reasoning benchmarks. The mechanism operates via a lightweight LoRA branch, making it possible to modulate reasoning granularity in one fine-tuned model without prompt engineering, auxiliary classifiers, or multi-model ensembles (Ma et al., 13 Feb 2025). The following sections provide a detailed exposition of conceptual foundation, direction identification, dataset construction, quantitative benchmarking, comparative ablation, and identified limitations.
1. Definition and Formal Objectives
CoT-Valve is constructed around a central objective: dynamic control of CoT length via a single model parameter update. Let $\theta$ denote the original model weights trained for CoT reasoning. Given an input $q$, the model generates a reasoning chain of tokens $t_1, \dots, t_m$ and a final answer $a$ governed by the joint probability:

$$p(a, t_{1\ldots m} \mid q; \theta) = p(a \mid t_{1\ldots m}, q; \theta) \prod_{i=1}^{m} p(t_i \mid t_{<i}, q; \theta)$$
CoT-Valve identifies a direction $\Delta\theta$ such that applying $\theta + \alpha \Delta\theta$ systematically yields shorter chains $t'_{1\ldots m'}$ with $m' < m$, aiming to preserve ground-truth accuracy. At inference, model parameters are set to

$$\theta(\alpha) = \theta + \alpha \cdot \Delta\theta,$$

where varying $\alpha$ interpolates between longer and shorter reasoning chains. For $0 < \alpha \le 1$, chains are gradually compressed; for $\alpha > 1$, extrapolation produces ultra-short, distilled chains.
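The update rule above can be sketched on toy weight dictionaries standing in for real model parameters (all names and values here are hypothetical, purely illustrative):

```python
def apply_valve(theta, delta_theta, alpha):
    """Return theta + alpha * delta_theta, the CoT-Valve parameter update.

    alpha = 0  -> original weights (longest chains)
    alpha = 1  -> fully shortened regime
    alpha > 1  -> extrapolation (ultra-short, distilled chains)
    """
    return {name: [w + alpha * d for w, d in zip(ws, delta_theta[name])]
            for name, ws in theta.items()}

# Toy parameters: a single "layer" with three weights.
theta = {"layer.weight": [0.5, -1.0, 2.0]}
delta = {"layer.weight": [0.2, 0.4, -0.6]}

mid = apply_valve(theta, delta, 0.5)   # interpolation: intermediate chains
far = apply_valve(theta, delta, 1.5)   # extrapolation: shorter than trained
```

In a real model, `delta_theta` would be the learned LoRA update materialized per layer; the point is only that a single scalar `alpha` moves all parameters along one fixed direction.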
2. Identification and Manipulation of the Control Direction
The parameter update direction $\Delta\theta$ is obtained by fine-tuning on pairs of reasoning chains of different lengths for the same $(q, a)$ pair. Specifically, the optimization seeks:

$$\Delta\theta = \arg\min_{\Delta\theta} \; -\left[ \log p(a \mid t'_{1\ldots m'}, q; \theta + \Delta\theta) + \sum_{i=1}^{m'} \log p(t'_i \mid t'_{<i}, q; \theta + \Delta\theta) \right]$$

with $m' < m$ for shortened CoTs. Practically, $\Delta\theta$ is implemented via a low-rank adaptation (LoRA) branch inserted into selected linear layers. At inference, scaling $\Delta\theta$ by $\alpha$ is operationally equivalent to scaling the LoRA branch, facilitating real-time control without any change to prompts or tokenization routines.
Interpolation ($0 < \alpha < 1$) yields reasoning chains of intermediate length; extrapolation ($\alpha > 1$) further distills reasoning. This operation is continuous and model-native, requiring no downstream classifiers.
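The equivalence between scaling $\Delta\theta$ and scaling the LoRA branch can be checked numerically. A minimal sketch with hand-rolled matrix arithmetic (the shapes and values are illustrative, not from any real model):

```python
def matvec(M, x):
    """Plain matrix-vector product over nested lists."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha):
    """Base path W @ x plus the LoRA path B @ (A @ x), scaled by alpha."""
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    return [b + alpha * l for b, l in zip(base, low_rank)]

def merged(W, A, B, alpha):
    """Merge the scaled LoRA branch into the base weight: W + alpha * B @ A."""
    BA = [[sum(B[i][k] * A[k][j] for k in range(len(A)))
           for j in range(len(A[0]))] for i in range(len(B))]
    return [[w + alpha * d for w, d in zip(wr, dr)] for wr, dr in zip(W, BA)]

# Toy 2x2 base weight with a rank-1 LoRA pair (hypothetical values).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]        # down-projection to the rank-1 space
B = [[0.5], [-0.5]]     # up-projection back to model dimension
x = [2.0, 3.0]

y_half = lora_forward(W, A, B, x, alpha=0.5)
```

Scaling the branch at forward time and merging $\alpha \cdot BA$ into $W$ give identical outputs, which is why $\alpha$ can be changed per request without retraining or re-merging weights.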
3. Length-Compressible Tuning and Progressive Compression Procedures
There are two principal variants extending CoT-Valve's compression mechanism:
- CoT-Valve++ (Precise Length-Compressible Tuning): Given a MixChain dataset of solutions at multiple compression levels, initialize a LoRA branch $\Delta\theta'$ and iteratively update it via

  $$\hat\theta \leftarrow \theta + \beta \cdot \Delta\theta', \qquad \mathcal{L} = -\left[ \log p(a \mid t_{1\ldots m}, q; \hat\theta) + \sum_{i=1}^{m} \log p(t_i \mid t_{<i}, q; \hat\theta) \right], \qquad \text{backpropagate } \nabla_{\Delta\theta'} \mathcal{L}$$

  with $\beta \in [0, 1]$ encoding normalized chain length (larger $\beta$ for shorter chains).
- CoT-Valve+P (Progressive Chain Length Compression): Sort MixChain levels by descending chain length. Initialize coarsely, then finetune sequentially across the successively shorter levels using standard objectives, yielding gradually more compressed reasoning.
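The +P schedule reduces to ordering the MixChain levels and finetuning through them in sequence. A schematic with a stubbed finetuning stage (level names, token counts, and the `finetune_stage` helper are all illustrative placeholders, not the paper's implementation):

```python
def progressive_schedule(mixchain_levels):
    """Order MixChain levels from longest to shortest average chain, so each
    finetuning stage sees slightly more compressed reasoning than the last."""
    return sorted(mixchain_levels, key=lambda lvl: lvl["avg_tokens"], reverse=True)

def finetune_stage(state, level):
    """Stub for one finetuning pass; a real run would update the LoRA branch
    on this level's solutions instead of just recording the length."""
    state["seen_lengths"].append(level["avg_tokens"])
    return state

# Toy MixChain with three compression levels (token counts are made up).
levels = [{"name": "short", "avg_tokens": 120},
          {"name": "long", "avg_tokens": 700},
          {"name": "medium", "avg_tokens": 340}]

state = {"seen_lengths": []}
for level in progressive_schedule(levels):
    state = finetune_stage(state, level)
```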
MixChain datasets are constructed either via human annotation and interpolation (MixChain-C, “cold start”) or by zero-shot interpolation across model checkpoints (MixChain-Z) using $\theta(\alpha) = \theta + \alpha \cdot \Delta\theta$ and varying $\alpha$ to produce multiple chain-length variants.
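MixChain-Z construction amounts to sweeping $\alpha$ and collecting one solution variant per value. In the sketch below the actual sampling from $\theta + \alpha \cdot \Delta\theta$ is replaced by a toy stand-in whose chain length simply shrinks linearly with $\alpha$; all names and numbers are hypothetical:

```python
def generate_chain(question, alpha, base_len=700, min_len=150):
    """Stand-in for sampling from theta + alpha * delta_theta. Here the chain
    length just interpolates linearly with alpha (a toy model, not an LLM)."""
    length = round(base_len - alpha * (base_len - min_len))
    return {"question": question, "alpha": alpha, "n_tokens": length}

def build_mixchain_z(questions, alphas=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """One solution variant per (question, alpha) pair: the MixChain-Z recipe."""
    return [generate_chain(q, a) for q in questions for a in alphas]

dataset = build_mixchain_z(["What is 12 * 7?"])
```

The resulting dataset pairs each question with a spectrum of chain lengths, which is exactly what the length-compressible tuning objectives above consume.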
4. Experimental Results and Quantitative Benchmarks
CoT-Valve and its improved variants have been empirically validated on leading mathematical reasoning benchmarks. For QwQ-32B on GSM8K:
| Method | Acc (%) | #Tokens | ACU (↑) |
|---|---|---|---|
| Original QwQ-32B | 95.07 | 741.1 | 0.40 |
| Prompt-based control | 93.6 | 355.5 | 0.82 |
| CoT-Valve (Ground-Truth) | 94.0 | 352.8 | 0.83 |
| CoT-Valve++ (MixChain-C) | 94.4 | 276.3 | 1.07 |
| CoT-Valve+P (MixChain-Z) | 94.9 | 225.5 | 1.32 |
For QwQ-32B on AIME24:
| Method | Acc/30 | #Tokens | ACU (↑) |
|---|---|---|---|
| Original QwQ-32B | 14/30 | 6827.3 | 0.021 |
| Prompt-based control | 13/30 | 6102.5 | 0.022 |
| CoT-Valve+P (MixChain-Z) | 13/30 | 4629.6 | 0.029 |
CoT-Valve+P achieves a reduction in chain length by approximately 60–70% with less than 0.2% absolute accuracy drop. The accuracy per computation unit (ACU) metric, defined as

$$\mathrm{ACU} = \frac{\text{Accuracy}}{\#\text{Params} \times \#\text{Tokens}},$$

demonstrates marked computational efficiency improvements over prompt-based approaches.
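The tabulated ACU values can be reproduced from the definition above. The $\times 100$ scaling and the use of parameter count in billions are inferred from the magnitudes in the tables, not stated explicitly:

```python
def acu(accuracy_pct, params_billion, n_tokens, scale=100.0):
    """Accuracy per computation unit: accuracy / (#params x #tokens).
    scale=100 matches the magnitude of the values reported in the tables."""
    return scale * accuracy_pct / (params_billion * n_tokens)

# Rows from the QwQ-32B (32B parameters) GSM8K table above.
rows = {"Original": (95.07, 741.1),
        "CoT-Valve+P": (94.9, 225.5)}
scores = {name: round(acu(acc, 32, tok), 2) for name, (acc, tok) in rows.items()}
```

Running this reproduces the reported 0.40 and 1.32, confirming the roughly 3.3x efficiency gain of CoT-Valve+P over the uncompressed model.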
5. Comparison with Prompt-Based Control and Ablation Findings
Prompt-control approaches (e.g., instructing the model to generate its solution within a fixed token budget) frequently fail to produce the desired shorter CoTs, with models routinely exceeding the requested budget by a large margin. CoT-Valve, by directly scaling $\alpha$, achieves smooth control of CoT lengths and accurate trade-offs: as few as 133 tokens can be generated with 87.5% accuracy on QwQ, versus prompt-based control’s 355 tokens.
Progressive compression schedules in CoT-Valve+P outperform direct supervised fine-tuning on shortest chains, with gradual schedules maintaining higher accuracy for comparable or smaller token counts.
6. Limitations and Prospective Work
CoT-Valve embeds control in a single learned direction in parameter space, which may limit compressibility for diverse tasks where multiple task-specific directions could be optimal. Current mechanisms apply uniform shortening across the chain; segment-wise and context-dependent compression has not been realized. Extreme extrapolation ($\alpha \gg 1$) risks omitting essential reasoning steps, occasionally yielding under-explained answers.
Optimal scheduling of the compression parameter $\alpha$ for each query (potentially using difficulty estimators) remains an open engineering challenge in balancing cost against reliability. Further research may address multi-directional control, finer granularity in chain compression, and adaptive inference pipelines (Ma et al., 13 Feb 2025).
7. Practical Implications and Significance
CoT-Valve establishes a lightweight, model-native framework for elastic reasoning cost management in LLMs. It provides single-model, continuous modulation over the reasoning path’s verbosity and granularity, without reliance on token-level prompt constraints or retraining bespoke models per task length. This property is leveraged to compress reasoning chains in the QwQ-32B-Preview model on GSM8K by over 500 tokens—while maintaining 94.92% accuracy—and on AIME with a single additional error out of 30. These computational gains suggest scalable applicability for resource-constrained or latency-sensitive deployments.
A plausible implication is that parameter-space valves of this form may be generalized for a broader class of generative control problems in neural reasoning systems, enabling post-training adaptation of solution granularity across diverse downstream applications.