Divide-or-Conquer? Which Part Should You Distill Your LLM? (2402.15000v3)

Published 22 Feb 2024 in cs.CL and cs.LG

Abstract: Recent methods have demonstrated that LLMs can solve reasoning tasks better when they are encouraged to solve subtasks of the main task first. In this paper we devise a similar strategy that breaks down reasoning tasks into a problem decomposition phase and a problem solving phase and show that the strategy is able to outperform a single stage solution. Further, we hypothesize that the decomposition should be easier to distill into a smaller model compared to the problem solving because the latter requires large amounts of domain knowledge while the former only requires learning general problem solving strategies. We propose methods to distill these two capabilities and evaluate their impact on reasoning outcomes and inference cost. We find that we can distill the problem decomposition phase and at the same time achieve good generalization across tasks, datasets, and models. However, it is harder to distill the problem solving capability without losing performance and the resulting distilled model struggles with generalization. These results indicate that by using smaller, distilled problem decomposition models in combination with problem solving LLMs we can achieve reasoning with cost-efficient inference and local adaptation.

Distilling the Essence: A Comparative Study on Decomposition and Solving in LLMs

Overview

Recent advances in LLMs have underscored the importance of decomposition and solving capabilities for reasoning tasks. This paper presents a comprehensive study of distilling these two core abilities, revealing that they differ markedly in how easily they can be distilled and in the impact distillation has. The findings suggest that distilling the decomposition phase of reasoning retains performance more effectively than distilling the problem-solving phase, indicating a promising direction for reducing inference costs without compromising the generality or efficacy of LLMs.

Decoupling Decomposition and Solving

Reasoning with LLMs has traditionally been treated as a single, indivisible process in which the model generates a reasoning chain for a given problem in one pass. This approach, while efficient for simpler tasks, falls short on complex reasoning tasks. This paper breaks the reasoning process into two distinct stages: decomposition and solving. In the decomposition stage, a complex problem is dissected into manageable subproblems; in the solving stage, these subproblems are addressed individually to construct a final solution, as in the sketch below. This two-stage setup outperforms the conventional single-stage approach, underscoring the value of targeting each capability separately.
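To make the two-stage setup concrete, here is a minimal Python sketch of a decompose-then-solve pipeline. The `call_llm` helper, the prompts, and the model names ("decomposer-model", "solver-model") are illustrative placeholders standing in for any chat-completion backend, not the paper's actual implementation.

```python
from typing import Callable, List


def decompose(question: str, call_llm: Callable[[str, str], str]) -> List[str]:
    """Ask a (possibly small, distilled) decomposer model for subquestions."""
    prompt = (
        "Break the following problem into a numbered list of simpler subquestions.\n\n"
        f"Problem: {question}\nSubquestions:"
    )
    reply = call_llm("decomposer-model", prompt)
    # Keep non-empty lines, stripping any leading numbering.
    return [line.lstrip("0123456789. ").strip()
            for line in reply.splitlines() if line.strip()]


def solve(question: str, subquestions: List[str],
          call_llm: Callable[[str, str], str]) -> str:
    """Have a larger solver model answer each subquestion, then compose a final answer."""
    context = ""
    for sub in subquestions:
        answer = call_llm("solver-model", f"{context}\nQ: {sub}\nA:")
        context += f"\nQ: {sub}\nA: {answer}"
    final_prompt = (
        f"{context}\n\nUsing the answers above, give the final answer to: {question}"
    )
    return call_llm("solver-model", final_prompt)


# Example wiring, given some backend `call_llm(model_name, prompt) -> str`:
#     answer = solve(q, decompose(q, call_llm), call_llm)
```

The key design point is that the two calls can be served by different models, which is exactly what makes it possible to swap in a small distilled decomposer while keeping a large solver.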

Distilling the Decomposition Capability

The paper's experiments reveal that distilling the decomposition phase is more feasible and preserves the model’s performance significantly better than distilling the solving phase. This is attributed to the nature of decomposition, which relies more on abstract understanding and less on domain-specific knowledge. The findings also indicate that the distilled decomposition models exhibit robust generalization across various tasks and data sets, highlighting their versatility.
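As a rough illustration of how the decomposition capability might be distilled, the sketch below fine-tunes a small student causal LM on decompositions generated by a teacher LLM (sequence-level distillation). The `teacher_traces` example, the prompt format, and the `gpt2` student are assumptions for illustration only; the paper's actual data, prompts, and models differ.

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import Dataset

# Toy stand-in for decompositions collected by prompting a large teacher LLM.
teacher_traces = [
    {"question": "A train travels 60 km in 1.5 hours. What is its average speed?",
     "subquestions": "1. What distance is covered? 2. How much time elapsed? "
                     "3. What is the distance divided by the time?"},
]


def build_dataset(traces, tokenizer, max_len=512):
    """Serialize (question, teacher decomposition) pairs into causal-LM training text."""
    def encode(example):
        text = (f"Problem: {example['question']}\n"
                f"Subquestions: {example['subquestions']}{tokenizer.eos_token}")
        return tokenizer(text, truncation=True, max_length=max_len)
    return Dataset.from_list(traces).map(encode)


student_name = "gpt2"  # placeholder small student model
tokenizer = AutoTokenizer.from_pretrained(student_name)
tokenizer.pad_token = tokenizer.eos_token
student = AutoModelForCausalLM.from_pretrained(student_name)

# mlm=False makes the collator build standard next-token labels for the student.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=student,
    args=TrainingArguments(output_dir="decomposer-student",
                           per_device_train_batch_size=2,
                           num_train_epochs=3),
    train_dataset=build_dataset(teacher_traces, tokenizer),
    data_collator=collator,
)
trainer.train()
```

At inference time, a distilled decomposer trained this way would take the place of the `decompose` call in the earlier pipeline, while a large solver LLM continues to answer the subquestions.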

Implications and Future Directions

The implications of these findings are multifaceted. Practically, the ability to distill decomposition capabilities efficiently means that LLMs can be made more cost-effective and adaptable without a significant loss in performance. Theoretically, it challenges the prevailing notion that problem-solving capabilities are central to an LLM's utility, suggesting instead that a model’s ability to effectively decompose complex problems plays a crucial role.

The results encourage further exploration into distillation techniques, specifically targeting decomposition skills. Future research might investigate the optimal conditions under which the decomposition can be distilled with minimal loss. Additionally, understanding the underlying reasons why problem-solving capabilities are harder to distill could lead to new methodologies to overcome these challenges.

Conclusion

This paper confirms the hypothesis that the decomposition phase of reasoning tasks is easier to distill and more generalizable than the problem-solving phase. By effectively distilling the decomposition capability of LLMs, it is possible to achieve efficient inference and robust performance across a variety of tasks and domains. This direction not only paves the way for more cost-effective implementations of LLMs but also offers insights into the fundamental attributes that contribute to a model’s reasoning abilities.

Authors (7)
  1. Zhuofeng Wu (10 papers)
  2. He Bai (50 papers)
  3. Aonan Zhang (32 papers)
  4. Jiatao Gu (83 papers)
  5. VG Vinod Vydiswaran (2 papers)
  6. Navdeep Jaitly (67 papers)
  7. Yizhe Zhang (127 papers)
Citations (3)