Early Stopping Chain-of-thoughts in Large Language Models (2509.14004v1)

Published 17 Sep 2025 in cs.CL

Abstract: Reasoning LLMs have demonstrated superior capacities in solving complicated problems by generating long chain-of-thoughts (CoT), but such a lengthy CoT incurs high inference costs. In this study, we introduce ES-CoT, an inference-time method that shortens CoT generation by detecting answer convergence and stopping early with minimal performance loss. At the end of each reasoning step, we prompt the LLM to output its current final answer, denoted as a step answer. We then track the run length of consecutive identical step answers as a measure of answer convergence. Once the run length exhibits a sharp increase and exceeds a minimum threshold, the generation is terminated. We provide both empirical and theoretical support for this heuristic: step answers steadily converge to the final answer, and large run-length jumps reliably mark this convergence. Experiments on five reasoning datasets across three LLMs show that ES-CoT reduces the number of inference tokens by about 41% on average while maintaining accuracy comparable to standard CoT. Further, ES-CoT integrates seamlessly with self-consistency prompting and remains robust across hyperparameter choices, highlighting it as a practical and effective approach for efficient reasoning.

Summary

  • The paper presents ES-CoT, a method that terminates LLM reasoning early by detecting convergence in step answers to reduce computation costs.
  • The framework leverages statistical run-jump tests to identify when answers stabilize, balancing computational efficiency with minimal performance loss.
  • Empirical results on five reasoning tasks demonstrate an average token reduction of 41% while preserving accuracy close to full chain-of-thought outputs.

Early Stopping Chain-of-thoughts in LLMs

Introduction

The paper introduces a novel inference-time method called Early Stopping Chain-of-Thought (ES-CoT) for optimizing the inference process in reasoning LLMs. These models excel at complex problem-solving by generating long chain-of-thoughts (CoT), yet this prolonged reasoning incurs substantial inference costs. The paper formulates a mechanism to detect the convergence of answers during reasoning and capitalizes on it by halting generation early, reducing computational overhead while maintaining accuracy.

Framework and Algorithm

ES-CoT operates by observing the convergence behavior of step answers. At each step of the reasoning process, the LLM is prompted to output its current best guess of the final answer. The method involves identifying patterns in these step answers, specifically monitoring repetition and detecting significant spikes in the run lengths of identical consecutive answers (Figure 1).

Figure 1: Framework of ES-CoT and the run-jump test.

The algorithm tracks these "runs" and computes "run jumps", statistically significant increases in run length that indicate answer stabilization. Once a sufficiently large, statistically significant jump is observed, inference terminates and the current step answer is returned as the final output. This heuristic relies on the empirical observation that step answers gradually converge toward the final answer as reasoning unfolds (Figure 2).

Figure 2: Probability that step answers match the final answer, P(X_t = X_T), over reasoning progress t/T. The last bar averages over the final 10% of steps, so its value is less than one.
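To make the procedure concrete, here is a minimal Python sketch of such a loop, assuming hypothetical `generate_step` and `probe_step_answer` helpers in place of actual model calls; the run-jump check is simplified to a fixed jump threshold rather than the paper's statistical test.

```python
def es_cot(prompt, generate_step, probe_step_answer, max_steps=64, d_min=3):
    """Sketch of ES-CoT-style early stopping via run-length jumps.

    generate_step(history)     -> next reasoning step (hypothetical model call)
    probe_step_answer(history) -> current best final answer (hypothetical model call)
    """
    history = [prompt]
    runs = []                      # lengths of completed runs of identical answers
    current_answer, run_len = None, 0

    for _ in range(max_steps):
        history.append(generate_step(history))
        answer = probe_step_answer(history)

        if answer == current_answer:
            run_len += 1           # the run of identical step answers grows
        else:
            if current_answer is not None:
                runs.append(run_len)
            current_answer, run_len = answer, 1

        # Simplified run-jump check: stop once the current run exceeds the
        # previous completed run by at least d_min (the paper applies a
        # statistical test rather than this fixed-difference rule).
        if runs and run_len - runs[-1] >= d_min:
            break

    return current_answer, history
```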

Theoretical Justification

The paper provides both empirical and theoretical validation for the ES-CoT approach. Step answers are shown to converge progressively toward the final answer with increasing reasoning progress. The probability that the step answer matches the final answer, P(X_t = X_T), grows, indicating an inherent tendency toward convergence in reasoning LLMs.

ES-CoT hinges on two core assumptions: the final step answer distribution is heavily concentrated and has high fidelity, and step-answer convergence is monotonic. The paper offers theoretical bounds on the probability of misalignment between early-stopped and fully-reasoned answers, thereby formalizing the conditions under which early termination is viable without significant accuracy loss.
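As a rough, illustrative formalization of these assumptions (a sketch in the notation of Figure 2, not the paper's exact statements or bound):

```latex
% Illustrative sketch; X_t is the step answer at step t, X_T the final answer.
\begin{align}
  \max_a \Pr(X_T = a) &\ge 1 - \epsilon
    && \text{(final answer distribution is concentrated)} \\
  \Pr(X_{t+1} = X_T) &\ge \Pr(X_t = X_T), \quad t < T
    && \text{(step answers converge monotonically)}
\end{align}
```

Under conditions of this form, the probability that an early-stopped answer differs from the fully reasoned one, P(X_τ ≠ X_T) for a stopping time τ, can be bounded, which is the shape of guarantee the paper formalizes.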

Empirical Evaluation

The paper evaluates ES-CoT on five reasoning datasets across three models. Results demonstrate an average reduction in token usage of about 41% while maintaining accuracy close to full CoT outputs. The framework also integrates seamlessly with self-consistency prompting, improving performance without additional resource cost.
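The combination with self-consistency can be pictured as sampling several early-stopped chains and majority-voting their answers; the sketch below assumes a hypothetical `run_es_cot` function that returns the early-stopped answer for one sampled chain, and is an illustration rather than the paper's exact integration.

```python
from collections import Counter

def es_cot_self_consistency(prompt, run_es_cot, n_samples=8):
    """Illustrative combination of ES-CoT with self-consistency prompting.

    run_es_cot(prompt) -> early-stopped final answer for one sampled chain
                          (hypothetical helper wrapping ES-CoT generation).
    """
    answers = [run_es_cot(prompt) for _ in range(n_samples)]
    # Majority vote over the early-stopped answers.
    return Counter(answers).most_common(1)[0][0]
```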

A robustness analysis over ES-CoT's hyperparameters corroborates its resilience, demonstrating consistent behavior across diverse model configurations and reasoning challenges. The empirical outcomes suggest that ES-CoT not only reduces inference cost but also occasionally mitigates overthinking, thereby improving answer quality.

Robustness and Sensitivity Analysis

The implementation of ES-CoT is sensitive to the hyperparameter d_min, which sets the minimum increase between consecutive run lengths required for a jump to be considered significant. The paper shows that as d_min increases, both accuracy and token usage rise, illustrating a trade-off between efficiency and performance. The run-jump criterion, combined with a statistical significance test, offers a scalable and adaptable way to manage reasoning depth and inference cost (Figure 3).

Figure 3: Robustness analysis of ES-CoT regarding its hyperparameters, including the minimum difference d_min.
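For illustration, a run-jump check parameterized by d_min might look like the sketch below; the empirical-quantile condition standing in for the significance test is an assumption, not the paper's exact criterion.

```python
def is_run_jump(prev_runs, current_run, d_min=3, quantile=0.95):
    """Decide whether the current run length constitutes a significant jump.

    prev_runs:   lengths of completed runs of identical step answers so far.
    current_run: length of the ongoing run.
    d_min:       minimum increase over the last completed run.
    The quantile check is an assumed stand-in for the paper's statistical test.
    """
    if not prev_runs:
        return False
    sorted_runs = sorted(prev_runs)
    idx = min(int(quantile * len(sorted_runs)), len(sorted_runs) - 1)
    threshold = sorted_runs[idx]
    return current_run - prev_runs[-1] >= d_min and current_run > threshold
```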

Conclusion

In conclusion, ES-CoT presents a robust solution for efficient reasoning in LLMs by using a principled early-termination criterion based on observable convergence in step answers. The approach balances computational efficiency and accuracy, making reasoning LLMs more practical for real-world applications where resource economy is critical. Future work could explore adaptive parameter tuning and extend the method to non-deterministic reasoning scenarios.
