Relationship between reasoning trajectory length and increased inference-time compute

Determine the functional relationship between chain-of-thought reasoning trajectory length and the effects of increased inference-time compute in large language models: do longer chains consistently improve performance, or can they instead degrade it, and under what conditions does each outcome occur?

Background

The paper discusses conflicting evidence in the literature regarding how the length of chain-of-thought (CoT) trajectories interacts with test-time compute scaling. Some works report that longer chains and increased inference-time computation yield gains, while others find shorter chains to be more effective or that extending reasoning (e.g., with “wait” tokens) degrades performance.

This uncertainty motivates the need for a precise, quantifiable understanding of the relationship between CoT length and test-time computational capacity. The authors' empirical results show that trajectory length correlates only weakly with generalization performance relative to intrinsic dimensionality, underscoring that length alone is not a reliable predictor and highlighting the unresolved nature of the underlying relationship.
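The weak-correlation claim can be made concrete with a small sketch. The following is illustrative only (not the paper's code): it computes the Pearson correlation between hypothetical CoT token lengths and binary correctness labels, the kind of diagnostic that would reveal whether length alone predicts outcomes. The data values are invented for illustration.

```python
# Illustrative sketch (hypothetical data, not from the paper): measure how
# weakly chain-of-thought length alone predicts correctness.
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Toy data: token lengths of CoT trajectories and whether the final answer
# was correct. Note the non-monotone pattern: the longest chains fail,
# consistent with reports that over-extended reasoning can degrade accuracy.
lengths = [120, 340, 560, 800, 1500, 2100, 2600, 3200]
correct = [0, 1, 1, 1, 1, 0, 1, 0]

r = pearson(lengths, correct)
print(f"Pearson r(length, correctness) = {r:.3f}")
```

On data like this, the coefficient sits near zero, mirroring the paper's observation that length is a poor standalone predictor compared with measures such as intrinsic dimensionality.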

References

For example, the relationship between the length of reasoning trajectories and the subsequent increased inference-time computational capacity remains unclear; while some works find clear gains (Muennighoff et al., 2025; Li et al., 2025), other work reports that shorter chains can be more effective and that continuing to extend reasoning (e.g., via "wait" tokens) can yield degradation in performance (Wu et al., 2025; Marjanović et al., 2025).

Effective Reasoning Chains Reduce Intrinsic Dimensionality  (2602.09276 - Prasad et al., 9 Feb 2026) in Section 1 (Introduction)