Divide-or-Conquer? Which Part Should You Distill Your LLM? (2402.15000v3)

Published 22 Feb 2024 in cs.CL and cs.LG

Abstract: Recent methods have demonstrated that LLMs can solve reasoning tasks better when they are encouraged to solve subtasks of the main task first. In this paper we devise a similar strategy that breaks down reasoning tasks into a problem decomposition phase and a problem solving phase and show that the strategy is able to outperform a single stage solution. Further, we hypothesize that the decomposition should be easier to distill into a smaller model compared to the problem solving because the latter requires large amounts of domain knowledge while the former only requires learning general problem solving strategies. We propose methods to distill these two capabilities and evaluate their impact on reasoning outcomes and inference cost. We find that we can distill the problem decomposition phase and at the same time achieve good generalization across tasks, datasets, and models. However, it is harder to distill the problem solving capability without losing performance and the resulting distilled model struggles with generalization. These results indicate that by using smaller, distilled problem decomposition models in combination with problem solving LLMs we can achieve reasoning with cost-efficient inference and local adaptation.

Distilling the Essence: A Comparative Study on Decomposition and Solving in LLMs

Overview

Recent advances in LLMs have underscored the importance of decomposition and solving capabilities for reasoning tasks. This paper presents a comprehensive study of distilling these two core abilities, revealing that they differ markedly in how easily they can be distilled and in the impact distillation has. The findings suggest that distilling the decomposition phase of reasoning retains performance more effectively than distilling the problem-solving phase, indicating a promising direction for reducing inference costs without compromising the generality or efficacy of LLMs.

Decoupling Decomposition and Solving

Reasoning with LLMs has traditionally been treated as a single, indivisible process in which the model generates a reasoning chain for a given problem in one pass. This approach, while efficient for simpler tasks, falls short on complex reasoning tasks. This paper breaks the reasoning process into two distinct stages: decomposition and solving. In the decomposition stage, a complex problem is dissected into manageable subproblems; in the solving stage, these subproblems are addressed individually to construct a final solution, as in the sketch below. This two-stage setup outperforms the conventional single-stage approach, underscoring the value of targeting each capability separately.
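To make the two-stage setup concrete, here is a minimal Python sketch of a decompose-then-solve pipeline. The `call_llm` helper, the prompts, and the model names ("decomposer-model", "solver-model") are illustrative placeholders standing in for any chat-completion backend, not the paper's actual implementation.

```python
from typing import Callable, List


def decompose(question: str, call_llm: Callable[[str, str], str]) -> List[str]:
    """Ask a (possibly small, distilled) decomposer model for subquestions."""
    prompt = (
        "Break the following problem into a numbered list of simpler subquestions.\n\n"
        f"Problem: {question}\nSubquestions:"
    )
    reply = call_llm("decomposer-model", prompt)
    # Keep non-empty lines, stripping any leading numbering.
    return [line.lstrip("0123456789. ").strip()
            for line in reply.splitlines() if line.strip()]


def solve(question: str, subquestions: List[str],
          call_llm: Callable[[str, str], str]) -> str:
    """Have a larger solver model answer each subquestion, then compose a final answer."""
    context = ""
    for sub in subquestions:
        answer = call_llm("solver-model", f"{context}\nQ: {sub}\nA:")
        context += f"\nQ: {sub}\nA: {answer}"
    final_prompt = (
        f"{context}\n\nUsing the answers above, give the final answer to: {question}"
    )
    return call_llm("solver-model", final_prompt)


# Example wiring, given some backend `call_llm(model_name, prompt) -> str`:
#     answer = solve(q, decompose(q, call_llm), call_llm)
```

The key design point is that the two calls can be served by different models, which is exactly what makes it possible to swap in a small distilled decomposer while keeping a large solver.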

Distilling the Decomposition Capability

The paper's experiments reveal that distilling the decomposition phase is more feasible and preserves the model’s performance significantly better than distilling the solving phase. This is attributed to the nature of decomposition, which relies more on abstract understanding and less on domain-specific knowledge. The findings also indicate that the distilled decomposition models exhibit robust generalization across various tasks and data sets, highlighting their versatility.
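As a rough illustration of how the decomposition capability might be distilled, the sketch below fine-tunes a small student causal LM on decompositions generated by a teacher LLM (sequence-level distillation). The `teacher_traces` example, the prompt format, and the `gpt2` student are assumptions for illustration only; the paper's actual data, prompts, and models differ.

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import Dataset

# Toy stand-in for decompositions collected by prompting a large teacher LLM.
teacher_traces = [
    {"question": "A train travels 60 km in 1.5 hours. What is its average speed?",
     "subquestions": "1. What distance is covered? 2. How much time elapsed? "
                     "3. What is the distance divided by the time?"},
]


def build_dataset(traces, tokenizer, max_len=512):
    """Serialize (question, teacher decomposition) pairs into causal-LM training text."""
    def encode(example):
        text = (f"Problem: {example['question']}\n"
                f"Subquestions: {example['subquestions']}{tokenizer.eos_token}")
        return tokenizer(text, truncation=True, max_length=max_len)
    return Dataset.from_list(traces).map(encode)


student_name = "gpt2"  # placeholder small student model
tokenizer = AutoTokenizer.from_pretrained(student_name)
tokenizer.pad_token = tokenizer.eos_token
student = AutoModelForCausalLM.from_pretrained(student_name)

# mlm=False makes the collator build standard next-token labels for the student.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=student,
    args=TrainingArguments(output_dir="decomposer-student",
                           per_device_train_batch_size=2,
                           num_train_epochs=3),
    train_dataset=build_dataset(teacher_traces, tokenizer),
    data_collator=collator,
)
trainer.train()
```

At inference time, a distilled decomposer trained this way would take the place of the `decompose` call in the earlier pipeline, while a large solver LLM continues to answer the subquestions.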

Implications and Future Directions

The implications of these findings are multifaceted. Practically, the ability to distill decomposition capabilities efficiently means that LLMs can be made more cost-effective and adaptable without a significant loss in performance. Theoretically, it challenges the prevailing notion that problem-solving capabilities are central to an LLM's utility, suggesting instead that a model’s ability to effectively decompose complex problems plays a crucial role.

The results encourage further exploration into distillation techniques, specifically targeting decomposition skills. Future research might investigate the optimal conditions under which the decomposition can be distilled with minimal loss. Additionally, understanding the underlying reasons why problem-solving capabilities are harder to distill could lead to new methodologies to overcome these challenges.

Conclusion

This paper confirms the hypothesis that the decomposition phase of reasoning tasks is easier to distill and more generalizable than the problem-solving phase. By effectively distilling the decomposition capability of LLMs, it is possible to achieve efficient inference and robust performance across a variety of tasks and domains. This direction not only paves the way for more cost-effective implementations of LLMs but also offers insights into the fundamental attributes that contribute to a model’s reasoning abilities.

Authors (7)
  1. Zhuofeng Wu (10 papers)
  2. He Bai (50 papers)
  3. Aonan Zhang (32 papers)
  4. Jiatao Gu (83 papers)
  5. VG Vinod Vydiswaran (2 papers)
  6. Navdeep Jaitly (67 papers)
  7. Yizhe Zhang (127 papers)
Citations (3)