Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration (2412.15701v2)

Published 20 Dec 2024 in cs.AI, cs.CL, and cs.HC

Abstract: Recent advancements in LMs have sparked growing interest in developing LM agents. While fully autonomous agents could excel in many scenarios, numerous use cases inherently require them to collaborate with humans due to humans' latent preferences, domain expertise, or need for control. To facilitate the study of human-agent collaboration, we present Collaborative Gym (Co-Gym), a general framework enabling asynchronous, tripartite interaction among agents, humans, and task environments. We instantiate Co-Gym with three representative tasks in both simulated and real-world conditions, and propose an evaluation framework that assesses both the collaboration outcomes and processes. Our findings reveal that collaborative agents consistently outperform their fully autonomous counterparts in task performance within those delivered cases, achieving win rates of 86% in Travel Planning, 74% in Tabular Analysis, and 66% in Related Work when evaluated by real users. However, our study also highlights significant challenges in developing collaborative agents, requiring advancements in core aspects of intelligence -- communication capabilities, situational awareness, and balancing autonomy and human control.

Summary

The paper presents a novel framework that structures human-agent collaboration as a Partially Observable Markov Decision Process for asynchronous interactions.
It details a comprehensive evaluation suite with metrics like Delivery Rate, Task Performance, and Initiative Entropy to assess collaborative dynamics.
Empirical results show that while collaborative agents improve task performance over autonomous ones, challenges in coordination and communication remain.

Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration

The paper presents Collaborative Gym (Co-Gym), a framework designed to facilitate the study and implementation of human-agent collaboration. As research on autonomous language model (LM) agents progresses, there is a pressing need to investigate scenarios that require human involvement because of humans' latent preferences, domain expertise, or need for control. Co-Gym addresses this by providing a structure for asynchronous interaction among humans, agents, and task environments, which permits a closer examination of collaborative dynamics.

In Co-Gym, human-agent collaboration is modeled as a Partially Observable Markov Decision Process (POMDP). An environment interface divides the observation space into public and private components, so agents and humans can interact within a shared workspace without unnecessary constraints on agent implementations. Co-Gym further distinguishes itself by enabling asynchronous interaction: either party may act at any time rather than following a strict turn order, which more closely reflects how people naturally collaborate.
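The public/private observation split and the absence of a turn order can be illustrated with a small toy environment. This is a minimal sketch, not Co-Gym's actual interface: the class `SharedWorkspace`, its methods `observe` and `step`, and the action fields `key`, `value`, and `visibility` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SharedWorkspace:
    """Toy environment (hypothetical, not the Co-Gym API)."""
    public: dict = field(default_factory=dict)    # visible to every party
    private: dict = field(default_factory=dict)   # party name -> private dict
    log: list = field(default_factory=list)       # (party, key) action history

    def observe(self, party: str) -> dict:
        # Each party sees the shared state plus only its own private part.
        return {
            "public": dict(self.public),
            "private": dict(self.private.get(party, {})),
        }

    def step(self, party: str, action: dict) -> dict:
        # No turn-order check: any party may act whenever it chooses.
        if action.get("visibility", "public") == "public":
            target = self.public
        else:
            target = self.private.setdefault(party, {})
        target[action["key"]] = action["value"]
        self.log.append((party, action["key"]))
        return self.observe(party)


env = SharedWorkspace()
# The human records a private preference the agent cannot observe.
env.step("human", {"key": "budget", "value": 1500, "visibility": "private"})
# The agent acts twice in a row; nothing forces it to wait for the human.
env.step("agent", {"key": "draft", "value": "Day 1: museum"})
env.step("agent", {"key": "draft", "value": "Day 1: museum, Day 2: hike"})
```

In this sketch the human's budget stays hidden from the agent until the human chooses to share it, which is one way latent preferences could make the problem partially observable from the agent's side.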

A significant contribution of Co-Gym lies in its comprehensive evaluation framework, which considers both outcomes and processes of collaboration. It defines metrics such as Delivery Rate, Task Performance, and Collaboration Score for outcomes, while using metrics like Initiative Entropy and Controlled Autonomy to audit processes. These metrics provide a detailed understanding of human-agent dynamics, highlighting the importance of communication and situational awareness in achieving effective collaboration.
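To make two of these metrics concrete, the sketch below computes them under stated assumptions, since the summary names the metrics without defining them. It assumes Delivery Rate is the fraction of sessions that end with a delivered outcome, and that Initiative Entropy is a Shannon entropy over each party's share of initiative-taking messages, normalized to the range 0 to 1. The function names and input formats are hypothetical, and the paper's exact definitions may differ.

```python
import math


def delivery_rate(delivered_flags):
    """Fraction of sessions that delivered an outcome (assumed definition)."""
    return sum(delivered_flags) / len(delivered_flags)


def initiative_entropy(initiative_counts):
    """Normalized Shannon entropy of initiative shares (assumed definition).

    initiative_counts maps each party to the number of messages in which
    it took initiative. Returns 1.0 when initiative is evenly shared and
    0.0 when any party never takes initiative.
    """
    n = len(initiative_counts)
    total = sum(initiative_counts.values())
    if n < 2 or total == 0 or any(c == 0 for c in initiative_counts.values()):
        return 0.0
    # Log base n normalizes the entropy so the maximum is 1.
    return -sum((c / total) * math.log(c / total, n)
                for c in initiative_counts.values())


rate = delivery_rate([True, True, False, True])
balanced = initiative_entropy({"human": 5, "agent": 5})
one_sided = initiative_entropy({"human": 0, "agent": 7})
skewed = initiative_entropy({"human": 1, "agent": 9})
```

Under these assumptions, a higher entropy indicates that initiative is shared more evenly between the human and the agent, while a value near zero indicates that one party drives the collaboration.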

Co-Gym is instantiated with three task environments (Travel Planning, Related Work Writing, and Tabular Analysis), each designed to test different collaborative requirements. In simulated conditions, the framework uses pre-collected data and simulated human behavior to study human-agent interactions in a controlled manner. In real-world conditions, Co-Gym supports dynamic interaction through a web interface where real human users work with agents.

The empirical results underscore both achievements and challenges. Among delivered cases, collaborative agents consistently outperform fully autonomous agents in task performance, with win rates of 86% in Travel Planning, 74% in Tabular Analysis, and 66% in Related Work when evaluated by real users. However, they display lower delivery rates, indicating difficulties in coordination and planning when incorporating human input. The paper also identifies persistent challenges in communication, situational awareness, and balancing autonomy with human control, which reflects how hard it is to build truly collaborative intelligent systems.

Overall, Co-Gym paves the way for future advancements in human-agent collaboration by providing a scalable, reusable, and insightful framework for research and development. It highlights the potential for collaborative agents to supplement human decision-making and work processes, though it also stresses the need for continued progress in overcoming the critical barriers identified in the paper.

In conclusion, while Co-Gym makes considerable progress toward understanding human-agent collaboration, it also points to directions for future work, including additional tasks, user studies in diverse contexts, and iterative development based on insights from simulated conditions. By aligning collaborative agent research with these needs and challenges, Co-Gym sets a clear agenda for advancing intelligent collaborative systems.