O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning (2501.12570v2)

Published 22 Jan 2025 in cs.CL

Abstract: Recently, long-thought reasoning LLMs, such as OpenAI's O1, adopt extended reasoning processes similar to how humans ponder over complex problems. This reasoning paradigm significantly enhances the model's problem-solving abilities and has achieved promising results. However, the long-thought reasoning process leads to a substantial increase in inference time. A pressing challenge is reducing the inference overhead of long-thought LLMs while ensuring accuracy. In this paper, we experimentally demonstrate that long-thought reasoning models struggle to effectively allocate token budgets based on problem difficulty and reasoning redundancies. To address this, we propose Length-Harmonizing Fine-Tuning (O1-Pruner), aiming at minimizing reasoning overhead while maintaining accuracy. This effective fine-tuning method first estimates the LLM's baseline performance through pre-sampling and then uses RL-style fine-tuning to encourage the model to generate shorter reasoning processes under accuracy constraints. This allows the model to achieve efficient reasoning with lower redundancy while maintaining accuracy. Experiments on various mathematical reasoning benchmarks show that O1-Pruner not only significantly reduces inference overhead but also achieves higher accuracy, providing a novel and promising solution to this challenge. Our code is coming soon at https://github.com/StarDewXXX/O1-Pruner

Summary

  • The paper introduces a method that leverages pre-sampling evaluation and RL-based fine-tuning to cut redundant reasoning paths in LLMs.
  • It achieves up to 40% reduction in sequence length while improving accuracy on mathematical reasoning benchmarks.
  • The study balances inference speed with task precision, extending LLM optimization paradigms and informing future adaptive fine-tuning research.

O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning

The paper introduces Length-Harmonizing Fine-Tuning (O1-Pruner), a fine-tuning methodology for optimizing the inference of long-thought reasoning LLMs such as OpenAI's O1, with experiments centered on mathematical reasoning. It targets a core inefficiency of these models: they often generate excessively long reasoning sequences, incurring higher computational cost without a commensurate gain in accuracy.

Key Insights and Contributions

  1. Inference Redundancy in LLMs: The paper identifies a critical inefficiency in long-thought reasoning LLMs, termed length disharmony, in which the length of generated sequences does not align with the difficulty of the task. The result is redundant computation and longer inference time without proportional gains in reasoning accuracy.
  2. Length-Harmonizing Fine-Tuning (O1-Pruner): The authors propose O1-Pruner to minimize inference overhead while maintaining or improving task accuracy. This is achieved through a two-phase approach (a sketch of the resulting reward appears in code after this list):
    • Pre-Sampling Evaluation: Initially, the LLM's performance is evaluated with pre-sampled instances to establish a baseline, allowing for identification of redundancy in reasoning paths.
    • Reinforcement Learning (RL)-Based Fine-Tuning: Leveraging these insights, the model undergoes a fine-tuning process driven by a specially designed reward function. This function encourages shorter, less redundant reasoning paths without compromising correctness.
  3. Empirical Validation: Extensive experiments on various mathematical reasoning benchmarks show that O1-Pruner not only shortens outputs but also improves accuracy relative to competing methods. Notably, the Marco-o1-7B and QwQ-32B models reduce output length by approximately 40% and 35%, respectively, while achieving higher accuracy scores.
  4. Theoretical Implications: The framework offers a nuanced perspective on LLM alignment, extending the paradigm beyond traditional SFT and RLHF by treating inference efficiency as an optimization target alongside accuracy.
  5. Comprehensive Evaluation: Further analyses examine the influence of hyperparameters and dataset difficulty. The findings indicate that training on harder samples encourages models to learn correct reasoning paths, improving accuracy even under length constraints.
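
To make the two-phase training signal concrete, the sketch below shows in Python how a length-harmonizing reward could be computed from pre-sampled baselines. It assumes a reward of the form R = (L_ref / L - 1) + λ·(acc - acc_ref), where L_ref and acc_ref are the mean length and accuracy of solutions pre-sampled from the reference model; the function names and the λ value here are illustrative assumptions, not the authors' reference implementation.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Baseline:
    mean_length: float    # average token length of pre-sampled solutions
    mean_accuracy: float  # average accuracy of pre-sampled solutions

def presample_baseline(lengths, correctness):
    """Estimate the reference model's baseline for one problem.

    `lengths` are token counts and `correctness` are 0/1 scores for
    several solutions pre-sampled from the frozen reference model.
    """
    return Baseline(mean(lengths), mean(correctness))

def length_harmonizing_reward(length, correct, baseline, lam=2.0):
    """Score one sampled solution (illustrative reward shape):

        R = (L_ref / L - 1) + lam * (acc - acc_ref)

    A solution shorter than the baseline average earns a positive
    length term; `lam` trades length savings against accuracy drops.
    """
    length_term = baseline.mean_length / max(length, 1) - 1.0
    accuracy_term = lam * (float(correct) - baseline.mean_accuracy)
    return length_term + accuracy_term

# Example: baseline from 4 pre-sampled solutions, then score a new one.
base = presample_baseline(lengths=[820, 940, 760, 880], correctness=[1, 1, 0, 1])
print(length_harmonizing_reward(length=500, correct=1, baseline=base))  # 1.2
```

In the full method, per the abstract, rewards of this kind drive an RL-style fine-tuning objective over sampled solutions; the snippet only illustrates how the reward can trade off length against accuracy.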

Practical and Theoretical Implications

The O1-Pruner paradigm offers a practical pathway for optimizing LLMs on tasks that demand extensive reasoning. The approach shows clear potential in applications where inference speed is critical, such as real-time decision-making systems and automated mathematical tutoring platforms.

Theoretically, the paper posits an expanded view of LLM optimization, focusing on balancing the trade-offs between efficiency and accuracy. This framework can potentially inform future studies on adaptive mechanisms within LLM architectures that dynamically adjust computational resources based on task complexity.

Future Directions

Future research could extend O1-Pruner to multimodal reasoning tasks, given its current focus on text-based mathematical reasoning. Combining O1-Pruner with other efficiency-oriented methods could yield further gains, offering a more holistic approach to the computational challenges LLMs face across domains.

In sum, the paper provides a valuable contribution to the ongoing discourse on maximizing the efficacy of LLMs for reasoning-intensive applications, ensuring that their expansive capabilities are matched by practical and computational efficiencies.