Overview of "Self-Training Elicits Concise Reasoning in LLMs"
The paper "Self-Training Elicits Concise Reasoning in LLMs" by Tergel Munkhbat et al. addresses a prominent inefficiency in the reasoning capabilities of LLMs that employ chain-of-thought (CoT) reasoning methodologies. The authors posit that the CoT approach, while enhancing the problem-solving potential of LLMs, often leads to verbose outputs that inflate computational costs without proportionate gains in accuracy. This observation forms the bedrock of their exploration into methods to stimulate more concise reasoning within LLMs through self-training.
Key Contributions and Methodologies
The paper proposes a simple self-training recipe: fine-tune an LLM on concise reasoning paths that the model itself generates, obtained via best-of-N (BoN) sampling and few-shot conditioning. The approach is evaluated across several datasets and model families.
- Conciseness Hypothesis: The authors begin from the hypothesis that LLMs already possess the latent capacity for more concise reasoning but do not exhibit it under their default inference behavior. Through self-training, they aim to align the model's inference behavior with this latent capacity.
- BoN Sampling and Few-Shot Conditioning: To extract succinct reasoning from the models themselves, the method samples multiple reasoning paths per problem and keeps the shortest correct one (BoN sampling), and uses few-shot prompts with concise exemplars to bias generation toward brevity. Fine-tuning on this self-generated data improves the conciseness of model outputs without sacrificing accuracy (a minimal sketch of the selection step follows this list).
- Empirical Results: The authors report an average 30% reduction in output tokens across diverse model families on the GSM8K and MATH datasets, with accuracy maintained. These reductions translate directly into lower inference cost, which matters most for resource-constrained deployments.
- Adaptivity to Complexity: The analysis further shows that fine-tuned models adjust rationale length to problem complexity, retaining detailed reasoning for harder problems while streamlining simpler ones. This flexibility suggests LLMs can be tailored to manage their inference budget dynamically in real-time applications.
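To make the data-construction step concrete, the sketch below illustrates the "shortest correct sample wins" idea behind BoN selection: sample several reasoning paths per question, discard the incorrect ones, and keep the shortest correct path as the fine-tuning target. This is a minimal illustration under stated assumptions, not the authors' code; the `Candidate` type, function names, and toy data are hypothetical, and the sampling step itself (e.g., via few-shot conditioning) is assumed to have already produced the candidates.

```python
# Minimal sketch of best-of-N "shortest correct" selection for building a
# concise fine-tuning set. Candidates are assumed to be already sampled from
# the model; the types and toy data below are illustrative, not from the paper.

from dataclasses import dataclass


@dataclass
class Candidate:
    reasoning: str  # full chain-of-thought text sampled from the model
    answer: str     # final answer parsed from that reasoning


def shortest_correct(candidates: list[Candidate], gold_answer: str) -> Candidate | None:
    """Keep only correct samples and return the shortest one (by character count)."""
    correct = [c for c in candidates if c.answer == gold_answer]
    return min(correct, key=lambda c: len(c.reasoning)) if correct else None


def build_concise_sft_set(sampled: dict[str, list[Candidate]], gold: dict[str, str]):
    """Construct (question, concise reasoning) pairs for supervised fine-tuning.

    Questions the model never answers correctly are skipped, since there is
    no self-generated target to train on.
    """
    pairs = []
    for question, candidates in sampled.items():
        best = shortest_correct(candidates, gold[question])
        if best is not None:
            pairs.append((question, best.reasoning))
    return pairs


# Toy usage: two correct candidates for one GSM8K-style question.
candidates = [
    Candidate("3 + 4 = 7, then 7 * 2 = 14. The answer is 14.", "14"),
    Candidate("First add 3 and 4 to get 7. Doubling 7 gives 14. Answer: 14.", "14"),
]
print(shortest_correct(candidates, "14").reasoning)  # prints the shorter chain
```

The selected pairs would then be used for standard supervised fine-tuning, so the model learns to produce the shorter rationales directly at inference time.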
Implications and Future Directions
From a practical standpoint, the findings point to meaningful efficiency gains in LLM inference without compromising the quality of reasoning outputs. The approach responds to the growing need for resource-efficient LLM operation, an important consideration for deployments where computational overhead directly drives operational cost.
Theoretically, the paper adds a nuanced understanding of LLM capabilities and shows how self-training can be used to refine emergent properties of language models post hoc. It opens pathways for subsequent research into other latent abilities of LLMs that might be elicited at similarly modest retraining cost.
Future research could extend these methods to more complex reasoning scenarios or explore intermediate forms of reasoning beyond task-specific tuning, aiming at length reductions that generalize across diverse task types.
In conclusion, Munkhbat et al. make a compelling case for revisiting the default verbosity of CoT-enhanced LLMs and demonstrate the practical value of concise reasoning, pointing to strategic self-training as a route to more efficient LLM inference.