CoT-Valve: Length-Compressible Chain-of-Thought Tuning (2502.09601v1)

Published 13 Feb 2025 in cs.AI and cs.CL

Abstract: Chain-of-Thought significantly enhances a model's reasoning capability, but it also comes with a considerable increase in inference costs due to long chains. With the observation that the reasoning path can be easily compressed under easy tasks but struggle on hard tasks, we explore the feasibility of elastically controlling the length of reasoning paths with only one model, thereby reducing the inference overhead of reasoning models dynamically based on task difficulty. We introduce a new tuning and inference strategy named CoT-Valve, designed to allow models to generate reasoning chains of varying lengths. To achieve this, we propose to identify a direction in the parameter space that, when manipulated, can effectively control the length of generated CoT. Moreover, we show that this property is valuable for compressing the reasoning chain. We construct datasets with chains from long to short for the same questions and explore two enhanced strategies for CoT-Valve: (1) a precise length-compressible CoT tuning method, and (2) a progressive chain length compression approach. Our experiments show that CoT-Valve successfully enables controllability and compressibility of the chain and shows better performance than the prompt-based control. We applied this method to QwQ-32B-Preview, reducing reasoning chains on GSM8K from 741 to 225 tokens with a minor performance drop (95.07% to 94.92%) and on AIME from 6827 to 4629 tokens, with only one additional incorrect answer.

Summary

  • The paper introduces CoT-Valve, a method that dynamically controls reasoning path lengths to improve inference efficiency.
  • It significantly reduces token counts (e.g., from 741 to 225 on GSM8K) with minimal impact on accuracy.
  • The approach leverages parameter space manipulation and a novel MixChain dataset for precise, task-adaptive chain compression.

CoT-Valve: Length-Compressible Chain-of-Thought Tuning

The paper "CoT-Valve: Length-Compressible Chain-of-Thought Tuning" introduces a novel approach to dynamically control the length of the reasoning paths generated by LLMs to optimize inference efficiency. The proposed method, termed CoT-Valve, aims to address the increased inference costs associated with Chain-of-Thought (CoT) reasoning, a prominent technique for enhancing the reasoning capabilities of LLMs, particularly in complex cognitive tasks like mathematical problem-solving and coding. This method leverages a new model tuning strategy that allows for elastic control over the length of reasoning chains based on task difficulty, thereby reducing unnecessary computational overhead without significantly compromising model performance.

The principal contribution of the work is a tuning and inference strategy that identifies a specific direction in the parameter space. By manipulating this direction, the method effectively manages the length of generated CoTs. The paper rigorously evaluates this approach in both pre-trained and post-trained reasoning models, demonstrating that CoT-Valve enables models to generate compact reasoning paths while maintaining nearly unchanged accuracy levels.
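Concretely, the idea can be pictured as moving along a single direction Δθ in weight space, θ(α) = θ_base + α·Δθ, where the scale α sets how aggressively the chain is shortened. The sketch below illustrates this with a LoRA-style low-rank update; the ValveLinear class, the set_valve helper, and the convention that larger α means shorter chains are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class ValveLinear(nn.Module):
    """Wraps a frozen linear layer with a low-rank update whose scale
    controls reasoning-chain length (illustrative sketch, not the
    paper's actual code)."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False
        # Low-rank direction Delta_theta = B @ A, learned so that scaling it
        # shortens (or lengthens) the generated chain-of-thought.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.alpha = 1.0  # 0.0 -> base behaviour; larger -> shorter chains (assumed convention)

    def forward(self, x):
        return self.base(x) + self.alpha * (x @ self.A.t() @ self.B.t())

def set_valve(model: nn.Module, alpha: float) -> None:
    """Set the interpolation factor on every ValveLinear in the model."""
    for m in model.modules():
        if isinstance(m, ValveLinear):
            m.alpha = alpha
```

At inference time, set_valve(model, 0.0) would recover the base model's long chains, while intermediate values would interpolate toward shorter ones; training the low-rank factors on paired long and short solutions is what gives α this meaning.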

The empirical evaluation is noteworthy, showing the method's efficiency across several reasoning models, such as QwQ-32B-Preview and DeepSeek-R1-Distill-Llama-8B. For instance, when applied to QwQ-32B-Preview on the GSM8K dataset, CoT-Valve reduced the average reasoning-chain length from 741 to 225 tokens, with accuracy decreasing only slightly, from 95.07% to 94.92%. A comparable reduction was achieved on the AIME dataset, where the reasoning length was cut from 6827 to 4629 tokens at the cost of only one additional incorrect answer.
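Numbers like these come from running the model over a benchmark at a fixed valve setting and recording both answer correctness and the token count of each generated chain. A rough sketch of such an evaluation loop follows; the generate_cot and extract_answer callables are hypothetical placeholders, since decoding and answer parsing depend on the model and benchmark format.

```python
from statistics import mean

def evaluate(model, tokenizer, dataset, alpha, generate_cot, extract_answer):
    """Measure accuracy and average chain length at one valve setting.

    generate_cot(model, tokenizer, question) and extract_answer(text) are
    caller-supplied helpers (hypothetical here), as decoding and answer
    parsing depend on the model and benchmark.
    """
    set_valve(model, alpha)           # helper from the earlier sketch
    lengths, num_correct = [], 0
    for example in dataset:           # e.g. the GSM8K test split
        cot = generate_cot(model, tokenizer, example["question"])
        lengths.append(len(tokenizer.encode(cot)))
        if extract_answer(cot) == example["answer"]:
            num_correct += 1
    return num_correct / len(dataset), mean(lengths)
```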

Moreover, the paper introduces the MixChain dataset, containing reasoning paths of varied lengths for individual questions. This dataset is utilized in conjunction with improved training strategies to refine the tuning process, thus optimizing for precise compressibility and control over the reasoning path length. Methods like progressive chain length compression and precise CoT tuning are explored, highlighting the adaptability and robustness of the CoT-Valve framework.
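One way to picture the two strategies: each question in a MixChain-style record carries several solutions of decreasing length, and training either tags each solution with a matching valve value (precise length-compressible tuning) or walks through the solutions from longest to shortest over successive rounds (progressive compression). The record layout and schedule below are an illustrative guess, not the paper's exact data format or training recipe.

```python
# Illustrative MixChain-style record: the same question paired with
# reasoning chains of decreasing length, each tagged with a valve value.
mixchain_example = {
    "question": "Natalia sold clips to 48 of her friends ...",
    "chains": [
        {"alpha": 0.0, "solution": "<longest chain-of-thought>"},
        {"alpha": 0.5, "solution": "<medium-length chain>"},
        {"alpha": 1.0, "solution": "<shortest chain>"},
    ],
}

def progressive_compression_schedule(dataset, num_rounds=3):
    """Yield (round, training pairs), moving from longer to shorter target
    chains so the model is compressed gradually rather than in one step."""
    for r in range(num_rounds):
        pairs = [
            (ex["question"], ex["chains"][min(r, len(ex["chains"]) - 1)])
            for ex in dataset
        ]
        yield r, pairs
```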

The theoretical implications of this work are notable: it suggests that the length of reasoning chains can be governed not only by altering task representations such as prompts, but also through direct manipulation of the model's parameter space. Practically, this could substantially reduce computational expense, making reasoning models more efficient in settings where compute is limited or costly. The observation that shorter reasoning paths can sometimes outperform longer ones underscores the value of intelligent compression and resource allocation that does not compromise capability.

Speculatively, as AI models continue to scale up, methods like CoT-Valve could become essential tools in ensuring that the computational demands of these models do not outpace available technological resources. Given the paper's focus on mathematical reasoning datasets, future studies could explore the application of CoT-Valve in other domains requiring high computational reasoning loads, such as complex decision-making and real-time analytics. This exploration could further expand the practical utility and generalizability of the CoT-Valve approach in diverse AI applications.

In summary, CoT-Valve represents a significant advance in the field of LLMs by offering a practical and efficient way to manage reasoning-path lengths dynamically, with broad implications for improving both the operational efficiency and the applicability of AI-driven reasoning models.
