Small LLMs Are Weak Tool Learners: A Multi-LLM Agent (2401.07324v3)
Abstract: LLM agents significantly extend the capabilities of standalone LLMs, empowering them to interact with external tools (e.g., APIs, functions) and complete various tasks in a self-directed fashion. The challenge of tool use demands that LLMs not only understand user queries and generate answers accurately but also excel in task planning, tool invocation, and result summarization. While prior work typically trains a single LLM to handle all of these capabilities, performance limitations become apparent, particularly with smaller models. To overcome these challenges, we propose a novel approach that decomposes the aforementioned capabilities into a planner, a caller, and a summarizer. Each component is implemented by a separate LLM that focuses on a specific capability and collaborates with the others to accomplish the task. This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability. To effectively train this framework, we introduce a two-stage training paradigm. First, we fine-tune a backbone LLM on the entire dataset without discriminating between sub-tasks, providing the model with a comprehensive understanding of the task. Second, the fine-tuned LLM is used to instantiate the planner, caller, and summarizer, which are then continually fine-tuned on their respective sub-tasks. Evaluation across various tool-use benchmarks shows that our proposed multi-LLM framework surpasses the traditional single-LLM approach, highlighting its efficacy and advantages in tool learning.
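The following is a minimal, runnable sketch of how the planner–caller–summarizer decomposition described above could be orchestrated. The function and class names (`planner_llm`, `caller_llm`, `summarizer_llm`, `run_agent`, `Step`) and the stubbed behavior are illustrative assumptions, not the paper's actual implementation; in the framework each role would be a separate fine-tuned LLM rather than a hard-coded function.

```python
# Hypothetical sketch of the planner / caller / summarizer decomposition.
# All names and stub behaviors are illustrative assumptions, not the paper's code.
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One round of interaction: planner rationale, chosen tool,
    concrete invocation produced by the caller, and tool output."""
    rationale: str
    tool: Optional[str] = None
    call: Optional[str] = None
    observation: Optional[str] = None


def planner_llm(query: str, history: List[Step]) -> Step:
    # Stand-in for the fine-tuned planner: decides whether to invoke
    # another tool or hand off to the summarizer. Stubbed here.
    if not history:
        return Step(rationale="Need the weather first.", tool="weather_api")
    return Step(rationale="Enough information gathered; summarize.")


def caller_llm(query: str, step: Step, tool_docs: Dict[str, str]) -> str:
    # Stand-in for the fine-tuned caller: turns the planner's decision
    # into a concrete tool invocation with arguments. Stubbed here.
    return f'{step.tool}(location="Paris")'


def summarizer_llm(query: str, history: List[Step]) -> str:
    # Stand-in for the fine-tuned summarizer: composes the final answer
    # from the accumulated trajectory.
    observations = "; ".join(s.observation or "" for s in history)
    return f"Answer based on tool results: {observations}"


def run_agent(query: str,
              tools: Dict[str, Callable[[str], str]],
              tool_docs: Dict[str, str],
              max_steps: int = 5) -> str:
    """Orchestrate the three specialized models until the planner stops."""
    history: List[Step] = []
    for _ in range(max_steps):
        step = planner_llm(query, history)
        if step.tool is None:                           # planner chose to finish
            break
        step.call = caller_llm(query, step, tool_docs)
        step.observation = tools[step.tool](step.call)  # execute the tool
        history.append(step)
    return summarizer_llm(query, history)


if __name__ == "__main__":
    tools = {"weather_api": lambda call: "18°C, light rain"}
    docs = {"weather_api": "weather_api(location: str) -> str"}
    print(run_agent("What's the weather in Paris?", tools, docs))
```

Consistent with the abstract's two-stage paradigm, all three roles would start from one backbone fine-tuned on full trajectories and then be specialized on their sub-tasks, which is what allows each role above to be served by a smaller model focused on a single capability.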
Authors: Weizhou Shen, Chenliang Li, Hongzhan Chen, Ming Yan, Xiaojun Quan, Hehong Chen, Ji Zhang, Fei Huang