Overview of Reasoning Models: Thinking Slow and Fast at Test Time
This paper, titled "Reasoning Models Thinking Slow and Fast at Test Time," introduces α1, a framework for modulating the reasoning process of large reasoning models (LRMs) such as OpenAI o1 and DeepSeek-R1 at test time. The framework aims to enhance LRMs' reasoning capabilities by balancing slow and fast thinking during problem solving.
Key Contributions
The primary contribution is the α1 framework, which modulates reasoning progress while an LRM decodes at test time. The framework defines an α moment: the point at which the thinking phase, whose length is scaled by the parameter α, ends. Before the α moment, slow-thinking transitions are governed by a Bernoulli stochastic process that stochastically inserts transition tokens, yielding a dynamic schedule of reasoning tokens. After the α moment, slow thinking is deterministically terminated, switching the model to rapid reasoning and efficient answer generation.
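A minimal sketch of this two-phase schedule is shown below. It assumes a chunk-based decoding interface (`generate_chunk` as a stand-in for the model), a linear annealing of the transition probability, and the literal token strings "wait" and "</think>"; all names, default values, and the annealing shape are illustrative assumptions, not the paper's exact implementation.

```python
import random

WAIT = "wait"            # slow-thinking transition token
END_THINK = "</think>"   # token that closes the thinking phase

def alpha_one_generate(generate_chunk, alpha=1.4, avg_thinking_tokens=512,
                       p_start=0.4, p_end=0.0):
    """Two-phase test-time modulation around the alpha moment (a sketch).

    generate_chunk: stand-in for an LRM decoder; given the text generated
    so far, it returns the next reasoning chunk as a string.
    """
    alpha_moment = int(alpha * avg_thinking_tokens)  # scaled thinking budget
    text, n_tokens = "", 0
    while True:
        chunk = generate_chunk(text)
        text += chunk
        n_tokens += len(chunk.split())  # crude whitespace token count
        if n_tokens >= alpha_moment:
            # Post-alpha phase: deterministically terminate slow thinking so
            # the model switches to fast reasoning and answers.
            return text + f" {END_THINK}"
        # Pre-alpha phase: draw a slow-thinking transition from a Bernoulli
        # whose probability anneals linearly, making transitions frequent
        # early ("slow first") and rare near the alpha moment ("fast later").
        p_wait = p_start + (p_end - p_start) * (n_tokens / alpha_moment)
        if random.random() < p_wait:
            text += f" {WAIT}"

# Toy decoder stub so the sketch runs end to end.
demo = alpha_one_generate(lambda text: " step" * 32, avg_thinking_tokens=128)
print(demo.count(WAIT), "slow-thinking transitions inserted")
```

In this reading, α acts as a single dial: larger values delay the α moment and budget more slow thinking, while the annealed Bernoulli process decides exactly where the slow-thinking transitions land before that point.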
The framework unifies existing test-time scaling methods by enabling flexible, dense modulation from slow to fast reasoning. Empirical analyses on benchmarks spanning mathematical, coding, and scientific domains show significant gains in both reasoning efficiency and effectiveness.
Numerical Results
Experimental evaluations on three LRMs, ranging from 1.5B to 32B parameters, demonstrate the framework's effectiveness across multiple reasoning benchmarks. Notably, α1 yields substantial Pass@1 improvements, with average accuracy gains exceeding +6% for some models over baseline methods, while also reducing overall token generation length, indicating a more efficient reasoning process.
Analytical Insights
The paper presents several key findings:
- Slow First, Fast Later: Contrary to human reasoning patterns, in which fast thinking precedes slow thinking, LRMs achieve better outcomes by thinking slowly first and reasoning rapidly afterward.
- Efficient Test-Time Scaling: Although slow thinking typically lengthens reasoning, α1 reduces overall token length while producing more informative reasoning chains.
- High Frequency of Slow-Thinking Transitions: Frequent slow-thinking transitions, induced by inserting "wait" tokens, produce superior results, surpassing sparser scaling methods such as s1 (see the schedule comparison sketched after this list).
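To make "slow first, fast later" concrete, the toy calculation below contrasts a linearly annealed Bernoulli schedule with a constant-rate one. The probabilities, counts, and schedule shapes are illustrative assumptions, not values from the paper.

```python
# Where slow-thinking transitions land under each schedule. Probabilities
# and counts are illustrative assumptions, not the paper's configuration.
N = 100  # pre-alpha decision points

def linear_anneal(i, n=N, p0=0.4, p1=0.0):
    # Wait probability decays linearly as the alpha moment approaches.
    return p0 + (p1 - p0) * i / (n - 1)

annealed = [linear_anneal(i) for i in range(N)]
constant = [0.2] * N  # same expected total, spread uniformly

for name, probs in [("annealed", annealed), ("constant", constant)]:
    early, late = sum(probs[: N // 2]), sum(probs[N // 2 :])
    print(f"{name:>8}: ~{early:.0f} transitions early, ~{late:.0f} late")
```

Both schedules insert the same expected number of "wait" tokens overall, but the annealed one front-loads them, matching the slow-first pattern the paper finds beneficial.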
Implications and Future Directions
The findings emphasize the need for tailored test-time scaling strategies for LRMs. By optimizing the transition between slow and fast thinking, LRMs can achieve reasoning closer to human-like System-2 reasoning, with implications for AI applications that require complex cognitive processing.
The paper suggests potential future research directions, including multimodal reasoning with LRMs and transition-token-agnostic modulation, to further generalize the observed benefits. Additionally, exploring dynamic modulation of reasoning progress during training may enhance the models' adaptability and efficiency.
Conclusion
This paper marks a significant advance in understanding and improving the cognitive behavior of LRMs through strategic reasoning modulation. By effectively managing the transition between slow and fast thinking at test time, the α1 framework substantially elevates LRMs' reasoning capabilities, aligning them more closely with human-like cognitive processes.