CoAct: A Global-Local Hierarchy for Autonomous Agent Collaboration (2406.13381v1)

Published 19 Jun 2024 in cs.CL

Abstract: Existing LLMs exhibit remarkable performance on various NLP tasks, but still struggle with complex real-world tasks, even when equipped with advanced strategies like CoT and ReAct. In this work, we propose the CoAct framework, which transfers the hierarchical planning and collaboration patterns in human society to LLM systems. Specifically, our CoAct framework involves two agents: (1) A global planning agent, to comprehend the problem scope, formulate macro-level plans, and provide detailed sub-task descriptions to local execution agents, which serve as the initial rendition of a global plan. (2) A local execution agent, to operate within the multi-tier task execution structure, focusing on detailed execution and implementation of specific tasks within the global plan. Experimental results on the WebArena benchmark show that CoAct can re-arrange the process trajectory when facing failures, and achieves superior performance over baseline methods on long-horizon web tasks. Code is available at https://github.com/xmhou2002/CoAct.

The paper presents CoAct, a framework intended to improve how LLMs handle complex, long-horizon tasks through hierarchical planning and multi-agent collaboration. Existing LLMs perform well on many NLP tasks but often fall short on intricate real-world problems, even with prompting strategies such as Chain-of-Thought (CoT) and ReAct. CoAct builds on these strategies by introducing a structured multi-agent system modeled on the hierarchical planning and collaboration patterns of human society.

CoAct Framework

CoAct uses two agents: a global planning agent and a local execution agent. The global planning agent comprehends the overall scope of a problem, formulates a macro-level plan, and writes detailed sub-task descriptions for the local execution agent. The local execution agent carries out each sub-task according to those descriptions, handling the specifics of execution while staying within the overall plan. This division of labor separates planning from execution, which may mitigate the weaknesses of approaches that rely on a single agent to do both.
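The division of labor described above can be sketched as a small control loop. This is a minimal illustration, not the paper's implementation: the names `SubTask`, `Result`, `run_hierarchy`, `toy_plan`, and `toy_execute` are hypothetical, and the toy planner and executor stand in for what would be LLM calls in a real system. The sketch assumes that a failed sub-task is reported back to the planner, which then produces a revised plan.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SubTask:
    description: str


@dataclass
class Result:
    ok: bool
    detail: str = ""


# Hypothetical interfaces: a planner maps (task, failure feedback) to sub-tasks,
# and an executor attempts a single sub-task.
Planner = Callable[[str, List[str]], List[SubTask]]
Executor = Callable[[SubTask], Result]


def run_hierarchy(task: str, plan: Planner, execute: Executor,
                  max_replans: int = 2) -> Tuple[bool, List[str]]:
    """Run plan -> execute, asking the planner to replan after a failure."""
    feedback: List[str] = []
    for _ in range(max_replans + 1):
        failed = False
        for subtask in plan(task, feedback):
            result = execute(subtask)
            if not result.ok:
                # Report the failure upward so the next plan can avoid it.
                feedback.append(f"{subtask.description}: {result.detail}")
                failed = True
                break
        if not failed:
            return True, feedback
    return False, feedback


def toy_plan(task: str, feedback: List[str]) -> List[SubTask]:
    # Stand-in for a global planning agent: switch strategy once a failure is reported.
    if feedback:
        return [SubTask("open page"), SubTask("use search box")]
    return [SubTask("open page"), SubTask("click missing button")]


def toy_execute(subtask: SubTask) -> Result:
    # Stand-in for a local execution agent: any "missing" element cannot be clicked.
    if "missing" in subtask.description:
        return Result(False, "element not found")
    return Result(True)


if __name__ == "__main__":
    print(run_hierarchy("find item", toy_plan, toy_execute))
```

Keeping the planner and executor behind plain callables means either side can be swapped out (for example, a different prompt or model) without touching the control loop.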

Empirical Evaluation

The authors evaluated CoAct on the WebArena benchmark, which simulates diverse web tasks to test the long-horizon capabilities of autonomous agents. CoAct adapted to failures by rearranging its task execution trajectory, and did so more effectively than existing methods like ReAct. It achieved higher task success rates, with improvements reported across the tested scenarios. When equipped with a force stop intervention, which limits unnecessary interaction exchanges, CoAct extended its advantage further.
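One simple way to picture a force stop is as a hard cap on the number of interaction steps an agent may take. The sketch below is an assumption about the general idea, not the mechanism used in the paper: `run_with_force_stop` and its `step` callback are hypothetical names, and `step` stands in for one round of agent interaction that returns an answer once the task is done.

```python
from typing import Callable, Optional, Tuple


def run_with_force_stop(step: Callable[[int], Optional[str]],
                        max_steps: int) -> Tuple[Optional[str], int, bool]:
    """Call step(i) until it returns an answer or the step budget runs out.

    Returns (answer, steps_used, force_stopped).
    """
    for i in range(max_steps):
        answer = step(i)
        if answer is not None:
            return answer, i + 1, False
    # Budget exhausted: stop instead of continuing unproductive exchanges.
    return None, max_steps, True


if __name__ == "__main__":
    # Hypothetical step function that finishes on its third round.
    print(run_with_force_stop(lambda i: "done" if i == 2 else None, max_steps=5))
```

Returning the `force_stopped` flag lets a caller distinguish a genuine completion from a cut-off run, which is the signal a planner would need in order to revise its plan.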

Implications and Future Directions

Practically, CoAct's hierarchical architecture offers a template for autonomous AI systems that must sustain long interactions and adapt to dynamic environments. Theoretically, the work supports the idea that integrating human cognitive patterns into AI system design is worthwhile, an approach that continues to gain traction. As real-world applications grow more complex, multi-agent systems like CoAct may become necessary for LLMs to manage diverse tasks effectively.

The paper outlines several avenues for future research, such as refining the integration of search engine data to improve planning accuracy and using memory mechanisms to reduce redundant actions. Further experiments in these areas could improve operational efficiency and task success rates.

Overall, CoAct offers a structured approach to the limitations LLMs face on real-world tasks, combining hierarchical planning with multi-agent collaboration.