LLMs as Tool Makers: An Analytical Overview
The paper "LLMs as Tool Makers" presents a novel closed-loop framework, termed LATM, which empowers LLMs to autonomously create their own reusable tools, thus enhancing their problem-solving capabilities. It introduces a two-phase design comprising the tool-making phase and the tool-using phase. The overarching goal is to improve the efficiency and cost-effectiveness of LLMs when addressing tasks. This framework shows significant potential in optimizing the computational expenses associated with serving LLM requests, by leveraging powerful, resource-intensive models only during the tool-making process and relying on more lightweight models in the tool-using phase. The strategic allocation of labor between these two types of models allows for a redistribution of resource utilization, ultimately aiming to achieve performance levels on par with more robust models but at a reduced computational cost.
Framework Composition and Methodology
At the core of the approach, a capable LLM such as GPT-4 acts as a "tool maker," crafting new tools, implemented as Python utility functions, from a small set of example instances of a task. The resulting reusable solutions can then be employed by another LLM, or even the same one, acting as a "tool user." The process rests on a crucial asymmetry: tool making is computationally demanding but happens only once per task type, whereas tool use can be offloaded to less powerful models that perform repeated inferences cheaply.
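To make the tool representation concrete, below is a minimal sketch of the kind of self-contained Python utility the tool maker might emit for a scheduling-style task. The function name, signature, and minutes-since-midnight encoding are illustrative assumptions, not taken from the paper.

```python
# Illustrative example of a generated tool: a reusable utility for a
# meeting-scheduling task. Name, signature, and time encoding are
# hypothetical, not drawn from the paper.
def find_common_slot(
    busy: list[list[tuple[int, int]]],
    duration: int,
    day_start: int = 0,
    day_end: int = 24 * 60,
) -> tuple[int, int] | None:
    """Return the earliest (start, end) window of `duration` minutes free
    for every participant, given each participant's busy intervals in
    minutes since midnight, or None if no such window exists."""
    # Merge all participants' busy intervals into one sorted, disjoint list.
    merged: list[tuple[int, int]] = []
    for start, end in sorted(iv for person in busy for iv in person):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    # Scan the gaps between busy blocks for the first window that fits.
    cursor = day_start
    for start, end in merged:
        if start - cursor >= duration:
            return (cursor, cursor + duration)
        cursor = max(cursor, end)
    return (cursor, cursor + duration) if day_end - cursor >= duration else None

# Workday 9:00-17:00; participants busy 9:00-10:00 and 9:30-12:00.
assert find_common_slot([[(540, 600)], [(570, 720)]], 60,
                        day_start=540, day_end=1020) == (720, 780)
```

Because the tool is plain Python, the tool user's job reduces to translating a natural-language request into a call like the one in the final line, which is far easier than solving the task from scratch.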
The tool-making phase proceeds in stages: proposing a tool, verifying its correctness through unit tests, and finally wrapping the verified code for future reuse. Once complete, the tool can be stored and reapplied to similar tasks, amortizing its initial computational cost over many uses. This strategy also yields a functional cache: the system caches not just textual responses but entire functional capabilities, further improving cost efficiency.
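A rough sketch of that propose/verify/wrap loop is given below. It assumes a hypothetical `llm_complete(prompt) -> str` helper wrapping the tool-maker model's API; the prompts, retry budget, and wrapping step are illustrative rather than the paper's exact design.

```python
# A sketch of the propose/verify/wrap loop. `llm_complete(prompt) -> str`
# is a hypothetical helper around the tool-maker model's API; the prompts
# and retry budget are illustrative.
def make_tool(task_examples: list[str], llm_complete,
              max_attempts: int = 3) -> str | None:
    """Ask the tool maker for a Python utility, keep it only if the
    generated unit tests pass, and return the verified source for reuse."""
    for _ in range(max_attempts):
        # 1. Tool proposing: generate a candidate utility function.
        tool_src = llm_complete(
            "Write a generic Python function that solves tasks like:\n"
            + "\n".join(task_examples)
        )
        # 2. Tool verification: generate assert-based tests, then execute
        #    function and tests together; any failed assert raises.
        test_src = llm_complete(
            "Write assert-based unit tests for this function:\n" + tool_src
        )
        namespace: dict = {}
        try:
            exec(tool_src + "\n" + test_src, namespace)
        except Exception:
            continue  # verification failed; request a fresh proposal
        # 3. Tool wrapping: return the verified code (the paper also
        #    attaches usage demonstrations for the tool user's prompt).
        return tool_src
    return None
```

The key design point is that any proposal failing its own tests is discarded before it can be cached, so the lightweight tool user only ever sees code that has passed verification.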
Experimental Results and Discussion
LATM is assessed on a range of complex reasoning tasks, including tasks drawn from the BIG-Bench benchmark, with GPT-4 as the tool maker and GPT-3.5 Turbo as the tool user. The results are compelling: under LATM, the lightweight GPT-3.5 Turbo matches the performance of GPT-4 when the latter is used for both roles. Especially notable is the functional cache, which extends the traditional cache mechanism by storing and reusing whole task-solving functions rather than merely caching linguistic outputs.
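As a sketch of how such a functional cache could work, the snippet below keys verified tool source by task type, so the expensive tool maker runs only on a cache miss. The keying scheme is an assumption, and `make_tool` is the hypothetical helper sketched earlier; the paper itself routes incoming requests to cached tools via a dispatcher model.

```python
# A minimal functional-cache sketch: verified tool source is cached per
# task type, so repeated requests of the same type skip the tool maker.
# `make_tool` and `llm_complete` are the hypothetical helpers from above.
tool_cache: dict[str, str] = {}

def get_tool(task_type: str, examples: list[str], llm_complete) -> str | None:
    """Return verified tool source for this task type, invoking the
    expensive tool-maker model only on a cache miss."""
    if task_type not in tool_cache:
        tool = make_tool(examples, llm_complete)
        if tool is None:
            return None  # verification never passed; caller can fall back
        tool_cache[task_type] = tool
    return tool_cache[task_type]
```

Unlike a conventional response cache, a hit here amortizes the tool maker's one-time cost across every later request of the same task type, since each hit replaces a strong-model inference with a cheap function call.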
Implications and Speculative Future Directions
The implications of LATM are significant, pointing toward more decentralized and resource-efficient problem-solving frameworks in AI. The framework marks a notable step in the autonomy of LLMs, paralleling how humans advance by creating tools to overcome challenges. Self-sustaining tool generation of this kind could become pivotal in scaling AI applications, reducing operating costs, and extending LLMs to more diverse and complex environments.
Future developments might focus on enhancing the reliability and adaptability of the tool-making process, including the ability to update and refine tools as new tasks emerge. Incorporating mechanisms that allow LLMs to self-diagnose and adapt tools autonomously could significantly advance the field, paving the way for more sophisticated and generalized AI systems.
In conclusion, LATM represents a meaningful step forward in AI research, illustrating how LLMs can transition from mere tool users to tool creators, thus broadening their functional capacity and efficiency. This approach not only reduces operational costs but also opens new avenues for the application of LLMs, aligning with the ongoing evolution of artificial general intelligence.