An Overview of "Small LLMs Are Weak Tool Learners: A Multi-LLM Agent"
The paper "Small LLMs Are Weak Tool Learners: A Multi-LLM Agent" by Weizhou Shen et al. addresses a significant challenge in the domain of LLMs—their ability to effectively integrate and use external tools. The research highlights the limitations faced by smaller LLMs in performing task planning, tool invocation, and result summarization concurrently. As a novel solution, the authors propose decomposing these capabilities into three distinct roles: planner, caller, and summarizer, each implemented using individual LLMs.
Problem Statement
Traditional approaches often rely on training a single LLM to handle all aspects of task execution, including understanding user queries, deciding on external tool usage, and generating appropriate responses. However, smaller LLMs show clear performance limitations when tasked with such a comprehensive role. Notably, they often fail to maintain robust, reliable interactions with external tools, which reduces their utility in real-world applications where correct tool usage is critical.
Proposed Framework
In response to these challenges, the paper introduces a modular multi-LLM framework, termed α-UMi, which decomposes the tool-learning process into specialized components:
- Planner: Responsible for task planning and decision-making, deciding the sequence of actions to take for task completion.
- Caller: Engages with external tools by crafting accurate and efficient API requests based on the planner's decisions.
- Summarizer: Generates the final response for user queries by synthesizing results from the previous steps.
This decomposition lets each LLM focus on a single sub-task, allowing smaller models to be used effectively within the framework. A minimal sketch of how the three roles might be orchestrated at inference time follows below.
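To make the division of labor concrete, here is a small Python sketch of one plan-call-summarize loop. Everything in it is illustrative: `call_model`, `execute_tool`, the prompt layout, and the planner's decision string are hypothetical stand-ins, not the paper's actual interfaces.

```python
from dataclasses import dataclass, field


def call_model(role: str, context: str) -> str:
    """Hypothetical stand-in for querying one of the three fine-tuned LLMs.

    A real system would route `context` to the planner, caller, or
    summarizer checkpoint and return its generation.
    """
    return f"<{role} output>"


def execute_tool(action: str) -> str:
    """Hypothetical tool executor: parse the caller's API request,
    dispatch it, and return the raw observation."""
    return "<tool output>"


@dataclass
class AgentState:
    query: str
    history: list[str] = field(default_factory=list)

    def context(self) -> str:
        return self.query + "\n" + "\n".join(self.history)


def run_agent(query: str, max_steps: int = 8) -> str:
    """One interaction loop: the planner decides, the caller acts, and
    the summarizer produces the final answer."""
    state = AgentState(query=query)
    for _ in range(max_steps):
        # The planner reads the trajectory so far and emits a rationale
        # plus a routing decision. (The real decision format is
        # model-specific; this string check is a placeholder.)
        plan = call_model("planner", state.context())
        state.history.append(f"plan: {plan}")

        if "summarize" in plan.lower():
            break

        # The caller turns the plan into a concrete API request; the
        # environment returns an observation for the next planning step.
        action = call_model("caller", state.context())
        observation = execute_tool(action)
        state.history.append(f"action: {action}\nobservation: {observation}")

    # The summarizer composes the user-facing response from the
    # accumulated tool results.
    return call_model("summarizer", state.context())


print(run_agent("What is the weather in Paris?"))
```

The key design point is that each `call_model` invocation hits a different specialized checkpoint, so each model only ever has to learn one output format rather than all three.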
Training Methodology
To train the proposed multi-LLM system, the authors introduce a two-stage training paradigm named Global-to-Local Progressive Fine-Tuning (GLPFT). First, a backbone LLM is trained on the entire task without distinguishing among sub-tasks, fostering a broad understanding of the whole process. Then, three copies of this backbone are separately fine-tuned for their designated roles using sub-task-specific datasets.
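The two stages can be pictured as follows. This is a conceptual sketch, not the paper's training code: `fine_tune`, `load_full_trajectories`, and `split_by_role` are hypothetical helpers standing in for a real supervised fine-tuning pipeline.

```python
def fine_tune(model: str, dataset: list[dict]) -> str:
    """Hypothetical: run supervised fine-tuning, return a checkpoint id."""
    return f"{model}->sft[{len(dataset)} examples]"


def load_full_trajectories() -> list[dict]:
    """Hypothetical: complete tool-use trajectories covering planning,
    tool calls, and final summaries."""
    return [{"step": "plan", "text": "..."},
            {"step": "call", "text": "..."},
            {"step": "summary", "text": "..."}]


def split_by_role(data: list[dict], role: str) -> list[dict]:
    """Hypothetical: keep only the segments a role must learn to generate."""
    return [ex for ex in data if ex["step"] == role]


# Stage 1 (global): one backbone learns the whole trajectory, gaining a
# broad view of planning, calling, and summarizing together.
backbone = fine_tune("base-llm", load_full_trajectories())

# Stage 2 (local): three copies of the backbone specialize, each trained
# only on the sub-task it will own at inference time.
trajectories = load_full_trajectories()
planner = fine_tune(backbone, split_by_role(trajectories, "plan"))
caller = fine_tune(backbone, split_by_role(trajectories, "call"))
summarizer = fine_tune(backbone, split_by_role(trajectories, "summary"))
```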
Empirical Evaluation
The framework is evaluated on prominent tool-learning benchmarks such as ToolBench and ToolAlpaca. Results reveal that the proposed multi-LLM agent consistently surpasses the performance of single-LLM configurations, with marked improvements across several metrics including Action Exact Match, Argument F1, and planning accuracy. Notably, the modular structure demonstrates significant advantages in reducing hallucinations and improving both in-domain and out-of-domain task performance.
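For intuition, the two step-level metrics can be sketched as below. The official benchmark scorers may normalize and tokenize differently; treat this as a simplified illustration rather than the evaluation code used in the paper.

```python
def action_exact_match(pred_actions: list[str], gold_actions: list[str]) -> float:
    """Fraction of steps whose predicted tool name matches the gold one."""
    pairs = list(zip(pred_actions, gold_actions))
    if not pairs:
        return 0.0
    return sum(p == g for p, g in pairs) / len(pairs)


def argument_f1(pred_args: dict[str, str], gold_args: dict[str, str]) -> float:
    """F1 over predicted vs. gold argument key-value pairs for one call."""
    pred, gold = set(pred_args.items()), set(gold_args.items())
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# One matching action out of two -> 0.5 exact match; one of two argument
# pairs correct -> 0.5 F1.
print(action_exact_match(["get_weather", "search"], ["get_weather", "lookup"]))
print(argument_f1({"city": "Paris", "units": "C"}, {"city": "Paris", "units": "F"}))
```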
Implications and Future Directions
The modular approach described in the paper shows that smaller LLMs can be leveraged effectively when complex tool-use tasks are broken down into manageable components. The findings could inform the design of AI systems that must integrate real-time, evolving tool ecosystems.
Future work could explore optimizing the interplay between the planner, caller, and summarizer, possibly adding dynamic adaptability so the agent can adjust its strategy in changing environments. Further research might also integrate this framework with other neural architectures or with LLMs of varying sizes to scale performance while minimizing computational overhead.
In conclusion, the paper makes significant strides in addressing the identified deficits of small LLMs in tool-learning tasks through an innovative, decomposed framework, paving the way for future explorations and applications in AI-driven task automation and human-computer interaction.