
DynaSaur: Large Language Agents Beyond Predefined Actions (2411.01747v1)

Published 4 Nov 2024 in cs.CL

Abstract: Existing LLM agent systems typically select actions from a fixed and predefined set at every step. While this approach is effective in closed, narrowly-scoped environments, we argue that it presents two major challenges when deploying LLM agents in real-world scenarios: (1) selecting from a fixed set of actions significantly restricts the planning and acting capabilities of LLM agents, and (2) this approach requires substantial human effort to enumerate and implement all possible actions, which becomes impractical in complex environments with a vast number of potential actions. In this work, we propose an LLM agent framework that enables the dynamic creation and composition of actions in an online manner. In this framework, the agent interacts with the environment by generating and executing programs written in a general-purpose programming language at each step. Furthermore, generated actions are accumulated over time for future reuse. Our extensive experiments on the GAIA benchmark demonstrate that this framework offers significantly greater flexibility and outperforms previous methods. Notably, it allows an LLM agent to recover in scenarios where no relevant action exists in the predefined set or when existing actions fail due to unforeseen edge cases. At the time of writing, we hold the top position on the GAIA public leaderboard. Our code can be found at https://github.com/adobe-research/dynasaur.

Overview of "DynaSaur: Large Language Agents Beyond Predefined Actions"

The paper "DynaSaur: Large Language Agents Beyond Predefined Actions" tackles a prevalent limitation in existing LLM agent systems, specifically the restriction imposed by relying on a fixed set of predefined actions. The authors propose a novel framework that allows LLM agents to dynamically create and accumulate actions, enhancing their flexibility and capability in handling complex and real-world tasks.

Motivation and Problem Addressed

The current paradigm for deploying LLM agents involves selecting actions from a static set, which constrains adaptability, especially in dynamic environments with numerous potential actions. The primary challenges identified are: (1) the limited action set significantly restricts the agent's ability to plan and act, and (2) manually enumerating and implementing all possible actions is impractical in complex environments. The authors present an alternative: enabling LLM agents to define and execute actions as programs written in a general-purpose programming language. This fundamentally shifts reliance away from predefined actions toward adaptable, on-demand action creation.

Methodology

The DynaSaur framework models each action as a Python function, capitalizing on Python's expressiveness and compatibility with a wide array of libraries and tools. At each decision-making step, the agent generates Python code that either defines new actions or reuses existing ones from its growing library of functions. Generated actions are accumulated over time into an annotated function library stored for future reference and composition. Because actions are ordinary Python programs, the agent can also leverage the existing ecosystem of Python packages to interact with diverse systems and tools. A minimal sketch of this loop follows.
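
To make the mechanism concrete, here is a minimal sketch of such a generate-execute-accumulate loop. All names in it (ActionLibrary, agent_step, the llm.generate call, the `result` convention) are illustrative assumptions, not the authors' actual implementation.

```python
# A minimal sketch of the dynamic-action loop, assuming hypothetical names
# (ActionLibrary, agent_step, llm.generate); this is not the authors' code.
import inspect


class ActionLibrary:
    """Accumulates generated actions (plain Python functions) for reuse."""

    def __init__(self):
        self.actions = {}  # function name -> callable

    def register(self, func):
        self.actions[func.__name__] = func

    def describe(self):
        """Render the library as text for inclusion in the agent's prompt."""
        return "\n".join(
            f"def {name}{inspect.signature(fn)}  # {fn.__doc__ or ''}"
            for name, fn in self.actions.items()
        )


def agent_step(llm, library, observation):
    """One step: prompt with the current library, execute the generated
    program, and harvest newly defined functions into the library."""
    prompt = (
        f"Available actions:\n{library.describe()}\n\n"
        f"Observation: {observation}\n"
        "Write a Python program; store your final answer in `result`."
    )
    snippet = llm.generate(prompt)      # hypothetical LLM call returning code
    namespace = dict(library.actions)   # expose accumulated actions to the code
    exec(snippet, namespace)            # run the generated program
    for obj in list(namespace.values()):
        if inspect.isfunction(obj) and obj.__name__ not in library.actions:
            library.register(obj)       # accumulate new actions for future steps
    return namespace.get("result")
```

In practice the execution step would run in a sandbox rather than a bare exec, but the key design choice survives the simplification: the library is both the agent's action space and part of its prompt, so every new function immediately expands what the agent can plan with.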

The framework is formalized as a Partially Observable Markov Decision Process (POMDP) whose action space evolves dynamically with the tasks the agent encounters. Representing actions in Python satisfies the dual requirements of generality and composability, which the authors deem essential for robust LLM agent architectures.
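
As a rough formalization (the notation below is our own and may differ from the paper's exact definitions), the decision process can be written as a POMDP whose action space is indexed by time and grows as the agent defines new functions:

```latex
% Illustrative POMDP with a time-varying action space; the notation is an
% assumption, not necessarily the paper's exact formalization.
\begin{align*}
  \mathcal{M} &= (\mathcal{S}, \{\mathcal{A}_t\}_{t \ge 0}, \mathcal{O}, T, \Omega, R) \\
  \mathcal{A}_t &= \mathcal{A}_u \cup \mathcal{A}_g^{t}
      && \text{human-designed tools plus functions generated so far} \\
  \mathcal{A}_g^{t+1} &= \mathcal{A}_g^{t} \cup F(a_t)
      && \text{where } F(a_t) \text{ is the set of new functions defined by program } a_t
\end{align*}
```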

Experimental Setup and Results

The paper reports extensive experiments on the GAIA benchmark, a suite designed to evaluate generality and adaptability in intelligent agents. The proposed framework not only improves the versatility of LLM agents but also achieves superior performance, holding the top position on the GAIA public leaderboard at the time of writing.

The empirical findings show significant performance gains over baseline methods across diverse GAIA tasks, even without predefined supporting functions. Moreover, incorporating human-developed tools into the library of LLM-generated functions further improves performance, showing that DynaSaur can complement existing tool-based methods; a brief illustration follows.
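
Concretely, and continuing the hypothetical ActionLibrary sketch from the Methodology section, seeding the library with human-written tools requires nothing more than registering them alongside generated functions:

```python
# Seeding the library with a human-developed tool (illustrative; this tool
# is a hypothetical example, not one of the paper's actual tools).
def read_file(path: str) -> str:
    """Human-written tool: return the contents of a text file."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


library = ActionLibrary()
library.register(read_file)  # human tools and generated actions share one library
```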

Implications and Future Directions

The introduction of DynaSaur marks a substantial step in the development of LLM agents, primarily by granting far greater flexibility in action selection and planning. Practically, this could translate to more capable AI systems in domains that require intricate interactions and decision-making, such as autonomous robotics, complex problem-solving in digital assistants, and adaptive learning systems.

Theoretically, the work contributes to the growing body of research on autonomous agent systems augmented by LLM capabilities. It raises interesting questions on how dynamically generated actions can be refined and shared across different tasks and environments, hinting at the emergence of a new form of LLM agent adaptability.

Future work could explore mechanisms to optimize the action library growth, ensuring that the accumulation process remains efficient and operations using the library remain computationally feasible. Further research might also delve into curriculum strategies for presenting tasks that facilitate the systematic and meaningful expansion of reusable actions.

In summary, the DynaSaur framework provides a significant step toward more adaptable and robust LLM agent systems, offering a promising outlook on their deployment in a vast array of real-world scenarios.

Authors (12)
  1. Dang Nguyen (49 papers)
  2. Viet Dac Lai (25 papers)
  3. Seunghyun Yoon (64 papers)
  4. Ryan A. Rossi (124 papers)
  5. Handong Zhao (38 papers)
  6. Ruiyi Zhang (98 papers)
  7. Puneet Mathur (22 papers)
  8. Nedim Lipka (49 papers)
  9. Yu Wang (939 papers)
  10. Trung Bui (79 papers)
  11. Franck Dernoncourt (161 papers)
  12. Tianyi Zhou (172 papers)