- The paper introduces Fractional Reasoning, a training-free framework that uses latent steering vectors to dynamically adjust inference-time reasoning intensity, boosting overall performance.
- It demonstrates how varying the scaling factor enables fine-grained control over reasoning, significantly enhancing results in majority voting and Best-of-N strategies.
- Experimental results on benchmarks like GSM8K and MATH500 reveal notable accuracy improvements over standard inference methods with minimal extra computation.
This paper introduces "Fractional Reasoning (FR)," a training-free and model-agnostic framework designed to improve the test-time computational efficiency and performance of LLMs. The core problem FR addresses is that existing test-time compute strategies, like Best-of-N, majority voting, and self-reflection, typically apply a uniform level of reasoning intensity across all inputs, regardless of individual problem complexity. This can lead to under-thinking for complex problems or over-thinking and unnecessary computation for simpler ones.
Fractional Reasoning enables continuous control over reasoning intensity at inference time. It operates by first extracting a "latent steering vector" that represents the directional shift in the model's internal representations induced by reasoning-promoting prompts (e.g., "Think step by step" for Chain-of-Thought, or reflection instructions). This vector is derived by contrasting the latent states produced by positive (reasoning-promoting) and negative (direct-answering) prompts on a set of queries. Specifically, the steering vector $h_{\text{steer}}$ is the first principal direction of the differences between the latent representations of positive and negative examples:

$$h_{\text{steer}} := \arg\max_{h}\; \frac{1}{m}\sum_{i=1}^{m}\left(h^{\top}\big(h(X_i^{\text{pos}}) - h(X_i^{\text{neg}})\big)\right)^{2} \quad \text{s.t. } h^{\top}h = 1$$
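A minimal PyTorch sketch of this extraction step is shown below. It assumes the per-query latent states at a chosen layer have already been collected into two matrices; the function name and setup are illustrative, not the paper's exact implementation.

```python
import torch

def extract_steering_vector(pos_hidden: torch.Tensor, neg_hidden: torch.Tensor) -> torch.Tensor:
    """Return the first principal direction of positive-minus-negative latent differences.

    pos_hidden, neg_hidden: (m, d) matrices whose rows are h(X_i^pos) and h(X_i^neg)
    for the m paired queries at one chosen layer.
    """
    diffs = pos_hidden - neg_hidden  # (m, d) difference vectors
    # The top right singular vector of the difference matrix maximizes
    # (1/m) * sum_i (h^T diff_i)^2 subject to ||h|| = 1.
    _, _, vh = torch.linalg.svd(diffs, full_matrices=False)
    return vh[0]  # (d,) unit-norm steering direction h_steer
```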
Once extracted, this steering vector is reapplied to the latent states $h_t$ of the query tokens (without the explicit instructional prompt) with a tunable scaling factor α:

$$\hat{h}_t := h_t + \alpha \cdot h_{\text{steer}}$$

The resulting steered latent states $\hat{h}_t$ are then rescaled to maintain norm stability across layers using $\tilde{h}_t = \hat{h}_t \cdot \frac{\|h_t\|}{\|\hat{h}_t\|}$. This allows the model to modulate its reasoning depth or reflection strength dynamically without altering the input text or requiring fine-tuning.
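One way this steered update could be applied during generation is via a forward hook on a transformer layer; the sketch below assumes a Hugging Face-style decoder, and the layer index, hook placement, and helper names are assumptions rather than the paper's exact implementation.

```python
import torch

def make_steering_hook(h_steer: torch.Tensor, alpha: float):
    """Create a forward hook that adds alpha * h_steer to a layer's hidden states
    and rescales them back to the original per-token norm."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output   # (batch, seq, d)
        steered = hidden + alpha * h_steer.to(hidden)                 # h_hat_t = h_t + alpha * h_steer
        # h_tilde_t = h_hat_t * ||h_t|| / ||h_hat_t||  (keeps norms stable across layers)
        scale = hidden.norm(dim=-1, keepdim=True) / steered.norm(dim=-1, keepdim=True)
        steered = steered * scale
        return (steered, *output[1:]) if isinstance(output, tuple) else steered
    return hook

# Hypothetical usage with an assumed layer index:
# handle = model.model.layers[15].register_forward_hook(make_steering_hook(h_steer, alpha=4.0))
# outputs = model.generate(**inputs)
# handle.remove()
```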
The framework supports two key modes of test-time scaling:
- Breadth-based scaling (e.g., Best-of-N, Majority vote): By varying α, FR generates a diverse set of outputs with different reasoning intensities. This increases the chance of producing a correct answer, improving success rates with fewer overall samples compared to standard methods.
- Depth-based scaling (e.g., self-reflection): FR allows fine-grained control over the strength of reflection, enabling the model to critique and revise its outputs more appropriately and avoiding under- or over-reflection. For reflection, a slightly modified steering-vector extraction is used: the latent states of the input with the reflection prompt are taken directly as $h_{\text{steer}}$, and a different rescaling is applied, $\tilde{h}_t = \frac{1}{1+\alpha}\,(h_t + \alpha\, h_{\text{steer}})$ (see the sketch after this list).
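The reflection-mode update amounts to an α-weighted average of the original latent state and the reflection direction. A one-line sketch, with illustrative naming:

```python
import torch

def apply_reflection_steering(h_t: torch.Tensor, h_steer: torch.Tensor, alpha: float) -> torch.Tensor:
    # h_tilde_t = (h_t + alpha * h_steer) / (1 + alpha): interpolates between the
    # original latent state (alpha = 0) and the reflection-prompted state (large alpha).
    return (h_t + alpha * h_steer) / (1.0 + alpha)
```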
Experiments were conducted on mathematical reasoning benchmarks (GSM8K, MATH500) and general reasoning (GPQA) using open-source models like Qwen-2.5-7B-Instruct and LLaMA-3.1-8B-Instruct. FR was evaluated against standard test-time compute methods.
For Chain-of-Thought prompting, positive prompts like "Solve the mathematics problem with step-by-step detailed reasoning" and negative prompts like "Solve the mathematics problem with direct answering" were used to derive the steering vector. For evaluation, multiple responses were generated using different α values, and the final answer was selected via majority vote or a Best-of-N approach using an external reward model (RLHFlow/Llama3.1-8B-PRM-Deepseek-Data).
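A hypothetical sketch of the breadth-based evaluation loop follows: generate one response per α value, extract each final answer, and aggregate by majority vote. The helper functions and the α grid are assumptions for illustration, not values reported by the paper.

```python
from collections import Counter

def fractional_reasoning_majority_vote(question, h_steer, alphas=(0.0, 2.0, 4.0, 6.0)):
    """Generate one response per reasoning intensity and return the majority answer."""
    answers = []
    for alpha in alphas:
        # generate_with_steering is assumed to install the steering hook with this
        # alpha and decode the bare question (no instructional prompt in the text).
        response = generate_with_steering(question, h_steer, alpha)
        answers.append(extract_answer(response))
    return Counter(answers).most_common(1)[0][0]
```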
Results consistently showed that FR enhances the performance of both majority voting and Best-of-N strategies across all benchmarks and models (Table 1). For example, with Llama-3.1-8B-Instruct on GSM8K, Majority vote + FR achieved 89.5% accuracy compared to 86.9% for standard Majority vote. Similarly, Best-of-N + FR achieved 90.3% compared to 79.1% for standard Best-of-N.
In reflection tasks (Table 2), FR again outperformed standard reflection prompting. For instance, Qwen-2.5-7B-Instruct with FR achieved 61.4% on MATH500, up from 59.2% with standard reflection.
The paper also demonstrated that:
- Increasing α leads to more verbose, detailed reasoning (Figure 3), confirming controllable behavior.
- FR is effective even on models already specialized for reasoning, like DeepSeek-R1-Distill-Qwen-7B (Table 3).
- FR scales robustly as the number of generations increases, outperforming the baselines more consistently (Figure 5).
- The framework has potential for finer-grained, sentence-level control of reasoning strength, adapting α dynamically based on feedback signals like internal consistency (Figure 4).
The main contributions are:
- A general, training-free framework for adaptive reasoning control (Fractional Reasoning).
- Practical methods for extracting and applying latent steering vectors with tunable strength.
- Demonstrated effectiveness across multiple models, benchmarks, and test-time scaling strategies.
One noted limitation is that the current approach relies on predefined reasoning directions and does not yet automatically select the optimal scaling factor α per instance or step, which the authors highlight as an area for future work on adaptive policies.