Executable Code Actions Elicit Better LLM Agents

Published 1 Feb 2024 in cs.CL and cs.AI | arXiv:2402.01030v4

Abstract: LLM agents, capable of performing a broad range of actions, such as invoking tools and controlling robots, show great potential in tackling real-world challenges. LLM agents are typically prompted to produce actions by generating JSON or text in a pre-defined format, which is usually limited by constrained action space (e.g., the scope of pre-defined tools) and restricted flexibility (e.g., inability to compose multiple tools). This work proposes to use executable Python code to consolidate LLM agents' actions into a unified action space (CodeAct). Integrated with a Python interpreter, CodeAct can execute code actions and dynamically revise prior actions or emit new actions upon new observations through multi-turn interactions. Our extensive analysis of 17 LLMs on API-Bank and a newly curated benchmark shows that CodeAct outperforms widely used alternatives (up to 20% higher success rate). The encouraging performance of CodeAct motivates us to build an open-source LLM agent that interacts with environments by executing interpretable code and collaborates with users using natural language. To this end, we collect an instruction-tuning dataset CodeActInstruct that consists of 7k multi-turn interactions using CodeAct. We show that it can be used with existing data to improve models in agent-oriented tasks without compromising their general capability. CodeActAgent, finetuned from Llama2 and Mistral, is integrated with Python interpreter and uniquely tailored to perform sophisticated tasks (e.g., model training) using existing libraries and autonomously self-debug.

Citations (72)

Summary

  • The paper introduces CodeAct, integrating executable Python code into LLM agents to expand action spaces and enable dynamic multi-turn interactions.
  • It demonstrates up to a 20% higher success rate and up to 30% fewer actions compared to traditional JSON or text-based action formats.
  • The approach paves the way for more autonomous agents with enhanced task-solving and self-debugging through integrated Python libraries.

In the paper "Executable Code Actions Elicit Better LLM Agents," the authors investigate the use of executable Python code to enhance the capabilities of LLM agents. LLM agents are traditionally prompted to produce actions as JSON or text in pre-defined formats, which constrains the action space to a fixed set of tools and limits flexibility, such as the ability to compose multiple tools in one step. The proposed solution, CodeAct, addresses these limitations by consolidating LLM agents' actions into a unified action space of executable Python code.

CodeAct Framework and Implementation

CodeAct integrates Python code directly into LLM agents, allowing them to execute code actions and dynamically adapt to new observations through multi-turn interactions. The framework relies on a Python interpreter for code execution, enabling the agent to revise prior actions and emit new ones as it interacts with its environment. This design significantly broadens the action space, letting LLMs draw on a diverse range of Python packages and on automated feedback, such as error messages for self-debugging, to optimize task-solving (Figure 1).

Figure 1: General agent multi-turn interaction framework highlighting the role of CodeAct in agent-environment and agent-user interaction.
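
To make the interaction loop concrete, here is a minimal sketch of how such an interpreter-in-the-loop agent could be wired up. The llm callable, the chat-message format, and the code-extraction helper are illustrative assumptions, not the paper's actual implementation:

import contextlib
import io
import re
import traceback

def extract_code_block(reply: str):
    """Pull the first ```python ...``` block out of a model reply, if any."""
    match = re.search(r"```python\n(.*?)```", reply, re.DOTALL)
    return match.group(1) if match else None

def execute_action(code: str, namespace: dict) -> str:
    """Run a code action in a persistent namespace and return the observation.

    Both stdout and tracebacks are captured so they can be fed back to the
    model, mirroring the interpreter feedback loop described above.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)  # shared namespace keeps state across turns
    except Exception:
        buffer.write(traceback.format_exc())  # errors become observations too
    return buffer.getvalue()

def codeact_loop(llm, task: str, max_turns: int = 10) -> list:
    """Multi-turn loop: the model emits code, the interpreter executes it,
    and the execution result is appended as the next observation."""
    history = [{"role": "user", "content": task}]
    namespace: dict = {}
    for _ in range(max_turns):
        reply = llm(history)  # assumed: callable returning the assistant's text
        history.append({"role": "assistant", "content": reply})
        code = extract_code_block(reply)
        if code is None:  # no code action: the agent is replying in natural language
            break
        observation = execute_action(code, namespace)
        history.append({"role": "user", "content": f"Observation:\n{observation}"})
    return history

Because the namespace persists across turns, variables defined in one code action remain available in the next, which is what lets the agent build on or revise its earlier work.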

Comparative Performance and Evaluation

An extensive evaluation of 17 LLMs on both the existing API-Bank benchmark and a newly curated benchmark, M³ToolEval, shows that CodeAct significantly outperforms traditional text or JSON actions. Notably, CodeAct achieves up to 20% higher success rates in complex scenarios requiring multiple tool invocations over multi-turn interactions (Figure 2).

Figure 2: Comparison between CodeAct and Text/JSON actions, showing CodeAct's superior performance on M³ToolEval.

The evaluation also examined the LLMs' ability to exploit Python's native control and data flow, which allows sophisticated logic such as loops and conditional statements within a single code block. This proved markedly more efficient, reducing the number of actions required by up to 30% compared to JSON or text formats.
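
As a hedged illustration (not an example taken from the paper's benchmark), consider a task that compares prices across several items. A JSON-action agent must emit one constrained tool call per turn, while a CodeAct agent can compose the same calls with a loop in a single action; get_price here is a hypothetical tool, stubbed so the sketch runs:

# JSON/text-style agents emit one constrained call per turn, e.g.:
#   {"tool": "get_price", "args": {"item": "apple"}}
# and must wait for the environment's reply before the next call.

def get_price(item: str) -> float:
    """Hypothetical stand-in for a pricing tool exposed to the agent."""
    return {"apple": 1.20, "banana": 0.50, "cherry": 3.10}[item]

# A CodeAct agent composes the same tool with native control and
# data flow in one executable action:
items = ["apple", "banana", "cherry"]
prices = {item: get_price(item) for item in items}  # one loop, one turn
cheapest = min(prices, key=prices.get)
print(f"Cheapest: {cheapest} at ${prices[cheapest]:.2f}")

Three round trips collapse into one, which is the mechanism behind the reported reduction in action steps.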

CodeActAgent Development

Motivated by CodeAct's strong results, the authors developed CodeActAgent, an open-source LLM agent that integrates seamlessly with a Python interpreter. The agent can autonomously handle complex tasks such as model training and data visualization by leveraging existing libraries. Finetuned from Llama2 and Mistral, it was trained on CodeActInstruct, a dataset of 7,000 multi-turn interactions designed to improve task-solving and self-debugging capabilities (Figure 3).

Figure 3: Example multi-turn interaction with Python packages using CodeActAgent.
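
A CodeActInstruct example is, in essence, a multi-turn conversation that interleaves code actions with execution feedback. The record below is a hypothetical illustration of that shape, including a failed attempt followed by a self-debugged retry; it is not an actual dataset entry:

# Hypothetical sketch of the multi-turn shape of a CodeActInstruct record:
# assistant code actions alternate with execution observations.
example = [
    {"role": "user", "content": "Compute the mean of the 'price' column in data.csv."},
    {"role": "assistant", "content": "```python\nimport pandas as pd\n"
                                     "df = pd.read_csv('data.csv')\n"
                                     "print(df['Price'].mean())\n```"},
    {"role": "user", "content": "Observation:\nKeyError: 'Price'\n"
                                "Columns: ['price', 'qty']"},
    {"role": "assistant", "content": "```python\nprint(df['price'].mean())\n```"},
    {"role": "user", "content": "Observation:\n12.5"},
    {"role": "assistant", "content": "The mean price is 12.5."},
]

Training on trajectories of this shape is what teaches the model to treat error messages as actionable feedback rather than terminal failures.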

Implications and Future Directions

The findings from the study suggest significant practical implications for the design of LLM agents. By directly integrating executable code actions, developers can construct more flexible and capable agents that require less human intervention for fine-tuning and can autonomously adapt to complex, dynamic environments. This approach opens pathways for more generalized task-solving abilities across diverse applications, potentially reducing the overhead required to tailor LLMs for specific use-cases.

Future research may explore further optimizations and extensions of the CodeAct framework to other programming environments, expanding its applicability to different domains and further enhancing the adaptability of LLM agents. Additionally, work could be aimed at improving the open-source LLM models to reduce the performance gap observed with proprietary models.

Conclusion

"Executable Code Actions Elicit Better LLM Agents" introduces a compelling approach to developing more versatile and powerful LLM agents. By harnessing the capabilities of executable Python code, CodeAct addresses existing limitations in LLM agent design, offering a scalable solution to enhance real-world task-solving through multi-turn interactions and dynamic action adjustments. The findings underscore the potential of this approach to significantly impact AI applications across various fields, setting the stage for future advancements in autonomous agent development.
