Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning (2505.17813v1)

Published 23 May 2025 in cs.CL and cs.AI

Abstract: Reasoning LLMs heavily rely on scaling test-time compute to perform complex reasoning tasks by generating extensive "thinking" chains. While demonstrating impressive results, this approach incurs significant computational costs and inference time. In this work, we challenge the assumption that long thinking chains result in better reasoning capabilities. We first demonstrate that shorter reasoning chains within individual questions are significantly more likely to yield correct answers - up to 34.5% more accurate than the longest chain sampled for the same question. Based on these results, we suggest short-m@k, a novel reasoning LLM inference method. Our method executes k independent generations in parallel and halts computation once the first m thinking processes are done. The final answer is chosen using majority voting among these m chains. Basic short-1@k demonstrates similar or even superior performance over standard majority voting in low-compute settings - using up to 40% fewer thinking tokens. short-3@k, while slightly less efficient than short-1@k, consistently surpasses majority voting across all compute budgets, while still being substantially faster (up to 33% wall time reduction). Inspired by our results, we finetune an LLM using short, long, and randomly selected reasoning chains. We then observe that training on the shorter ones leads to better performance. Our findings suggest rethinking current methods of test-time compute in reasoning LLMs, emphasizing that longer "thinking" does not necessarily translate to improved performance and can, counter-intuitively, lead to degraded results.

Summary

Preferring Shorter Thinking Chains for Improved LLM Reasoning

The paper "Don't Overthink It: Preferring Shorter Thinking Chains for Improved LLM Reasoning" explores the computational efficacy and reasoning capabilities of LLMs in performing complex reasoning tasks. This research questions the prevailing assumption that longer thinking chains lead to better reasoning outcomes in LLMs. Traditionally, reasoning within LLMs has relied on extensive computations and long sequences of tokens during inference to enhance performance. However, this approach can be computationally expensive and time-consuming.

Key Findings

  1. Shorter Chains for Better Accuracy: The research demonstrates that shorter thinking chains are significantly more likely to yield correct answers. Empirical evaluations showed that within individual questions, shorter reasoning trajectories were up to 34.5% more accurate than the longest reasoning chains generated for the same questions. This contradicts the belief that longer responses necessarily enhance reasoning capabilities.
  2. Novel Inference Method: The authors propose a method termed short-m@k, which prioritizes shorter thinking chains during inference. It executes k independent generations in parallel, halts computation as soon as the first m thinking processes are complete, and selects the final answer by majority voting among those m chains (a minimal sketch of this procedure follows the list). This approach matched or exceeded standard majority voting while using up to 40% fewer thinking tokens.
  3. Performance and Computational Efficiency: short-1@k performed similarly to or better than standard majority voting in low-compute settings, whereas short-3@k, though slightly less token-efficient than short-1@k, consistently surpassed majority voting across all compute budgets while reducing wall-clock time by up to 33%.
  4. Impact of Finetuning: Motivated by these results, the authors finetuned an LLM on short, long, and randomly selected reasoning chains. Training on the shortest chains led to the best performance (a data-selection sketch follows the inference sketch below), reinforcing that longer reasoning does not correlate with better outcomes.
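
The sketch below illustrates the short-m@k procedure in Python under stated assumptions: generate_chain is a hypothetical callable standing in for one sampled reasoning-LLM generation and is assumed to return the chain's final answer. This is an illustration of the idea, not the authors' implementation.

```python
# Illustrative sketch of short-m@k: launch k generations in parallel, keep the
# first m that finish thinking, and majority-vote their final answers.
# `generate_chain(prompt)` is a hypothetical stand-in for one sampled
# reasoning-LLM generation; it is assumed to return the chain's final answer.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, as_completed


def short_m_at_k(generate_chain, prompt, k=8, m=3):
    answers = []
    with ThreadPoolExecutor(max_workers=k) as pool:
        futures = [pool.submit(generate_chain, prompt) for _ in range(k)]
        for future in as_completed(futures):
            answers.append(future.result())
            if len(answers) == m:
                # The first m chains are done; cancel any generations that have
                # not started yet (already-running ones finish on their own here).
                for f in futures:
                    f.cancel()
                break
    # Majority vote over the m earliest-finishing (i.e. shortest) chains.
    return Counter(answers).most_common(1)[0][0]
```

In a real serving stack the early-stopping step would abort the still-decoding chains, which is where the reported savings in thinking tokens and wall time come from.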

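For the finetuning comparison in item 4, a similarly hedged sketch of the data selection is shown below: given several sampled chains per question, the training set is built from the shortest, longest, or a randomly chosen chain. The record fields and the character-count length proxy are illustrative assumptions, not details from the paper.

```python
import random

# Illustrative only: select one reasoning chain per question to build a
# finetuning set. The paper compares training on the shortest, longest, and
# randomly chosen chains; field names and the character-count length proxy
# here are assumptions, not the authors' setup.
def build_sft_records(samples, strategy="short"):
    """samples maps each question to a list of (chain_text, final_answer) pairs."""
    records = []
    for question, chains in samples.items():
        if strategy == "short":
            chain, answer = min(chains, key=lambda pair: len(pair[0]))
        elif strategy == "long":
            chain, answer = max(chains, key=lambda pair: len(pair[0]))
        else:  # "random"
            chain, answer = random.choice(chains)
        records.append({"prompt": question, "completion": f"{chain}\n{answer}"})
    return records
```
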
Implications

The practical implications of this research are substantial. By reducing the need for long computation sequences, it enhances efficiency and reduces computational expenses associated with LLM reasoning tasks. Theoretically, this challenges existing paradigms dependent on extensive computational efforts, advocating that less can be more in achieving robust reasoning capabilities.

Future Directions

As LLMs continue to evolve, models may increasingly benefit from reasoning approaches that balance brevity with depth. Further investigation into how these findings transfer to applications such as real-time decision-making and conversational agents would be valuable. Moreover, finetuning models on datasets that favor shorter thinking chains could open new avenues for reducing computational cost while preserving, or even enhancing, performance.

In conclusion, the paper motivates a shift in how test-time compute is spent on reasoning tasks: longer thinking does not equate to better reasoning, and more efficient systems can reduce computational burden without compromising accuracy or reliability.
