Collaborating with Humans without Human Data: An Exploration of Multi-Agent Reinforcement Learning Strategies
Building AI agents that collaborate effectively with human partners is a long-standing challenge, particularly when agents must adapt to the distinct preferences, strengths, and weaknesses of individual collaborators. The paper "Collaborating with Humans without Human Data" by Strouse et al. tackles this issue by proposing Fictitious Co-Play (FCP), a multi-agent reinforcement learning approach for training agents that collaborate effectively in common-payoff games without relying on human data.
Problem Statement and Context
Standard strategies in multi-agent reinforcement learning, notably self-play (SP) and population play (PP), often fail to generalize to human partners because they overfit to the conventions of the specific agents they train with. Alternative methods, such as behavioral cloning play (BCP), require extensive human data to train human-like partner models, which is resource-intensive and raises privacy concerns. The authors address the fundamental challenge of constructing diverse training partners so that agents generalize better to novel human partners in collaborative settings.
Fictitious Co-Play (FCP) Methodology
The paper introduces Fictitious Co-Play (FCP), drawing on the successful use of diversity in competitive domains. FCP is a simple yet effective two-stage, diversity-driven framework:
- Training Partners: A diverse pool of partners is built by training multiple self-play agents that differ only in their random initialization. Checkpoints are saved periodically throughout training, capturing a range of skill levels and strategies. This provides exposure to different conventions and skill levels at no additional training cost.
- Training Adaptation: A new agent is then trained as a best response to this population of partners and their checkpoints. Because the partners' parameters are frozen, the FCP agent must learn to adapt to its partners' conventions rather than drive them toward its own; a minimal sketch of both stages appears after this list.
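To make the two stages concrete, here is a minimal, self-contained sketch in a toy two-player common-payoff matrix game. Everything in it (ToyCoopGame, TabularAgent, train_self_play, train_fcp, and the hyperparameters) is illustrative rather than taken from the paper's implementation; it only mirrors the structure of the procedure: self-play runs with different seeds produce checkpointed partners, and a separate agent is then trained as a best response against that frozen pool.

```python
import copy
import random

import numpy as np

N_ACTIONS = 3  # each player picks one of three possible "conventions"

class ToyCoopGame:
    """Common-payoff game: both players score +1 only when their actions match."""
    def step(self, a1, a2):
        return 1.0 if a1 == a2 else 0.0

class TabularAgent:
    """Softmax policy over actions with a simple REINFORCE-style update."""
    def __init__(self, seed):
        self.rng = np.random.default_rng(seed)
        self.logits = self.rng.normal(size=N_ACTIONS)

    def policy(self):
        p = np.exp(self.logits - self.logits.max())
        return p / p.sum()

    def act(self):
        return int(self.rng.choice(N_ACTIONS, p=self.policy()))

    def update(self, action, reward, lr=0.5):
        grad = -self.policy()
        grad[action] += 1.0            # gradient of log pi(action) w.r.t. logits
        self.logits += lr * reward * grad

def train_self_play(seed, steps=500, checkpoint_every=100):
    """Stage 1: one self-play run; periodically store frozen copies (checkpoints)."""
    game, agent, checkpoints = ToyCoopGame(), TabularAgent(seed), []
    for t in range(1, steps + 1):
        a1, a2 = agent.act(), agent.act()   # the same policy controls both players
        r = game.step(a1, a2)
        agent.update(a1, r)
        agent.update(a2, r)
        if t % checkpoint_every == 0:
            checkpoints.append(copy.deepcopy(agent))  # varying skill levels
    return checkpoints

def train_fcp(partner_pool, steps=2000, seed=100):
    """Stage 2: train a best response against the frozen partner population."""
    game, fcp_agent = ToyCoopGame(), TabularAgent(seed)
    for _ in range(steps):
        partner = random.choice(partner_pool)  # partners are never updated
        a1, a2 = fcp_agent.act(), partner.act()
        r = game.step(a1, a2)
        fcp_agent.update(a1, r)                # only the FCP agent learns
    return fcp_agent

# Stage 1: several self-play runs that differ only in their random seed.
partner_pool = [ckpt for seed in range(4) for ckpt in train_self_play(seed)]
# Stage 2: the FCP agent must cope with whatever convention each partner settled on.
fcp_agent = train_fcp(partner_pool)
```

In the actual paper the self-play partners are deep RL agents trained in Overcooked rather than tabular policies, but the two-stage structure is the same.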
Experimental Validation
Using Overcooked, a collaborative cooking simulator, the authors evaluate their approach through extensive experiments:
- Agent-Agent Collaboration: FCP agents consistently outperformed SP, PP, and BCP agents when partnered with a diverse set of held-out agents. Notably, FCP also exceeded BCP's performance when paired with behaviorally cloned human-proxy agents, indicating stronger generalization; the evaluation protocol is sketched after this list.
- Human-Agent Collaboration: In a human-agent study involving 114 participants, FCP showed a clear advantage on both objective scores and subjective human preference. Participants preferred collaborating with FCP-trained agents over the baselines, affirming the adaptability of FCP agents.
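The agent-agent results come from a cross-play style evaluation: the trained agent is paired with partners it never trained with, and the common payoff is averaged over episodes. Below is a hedged sketch of that protocol, continuing the toy setup from the previous sketch; evaluate_pair, zero_shot_score, and the held-out pool construction are illustrative names of my own, not the authors' evaluation code.

```python
import statistics

def evaluate_pair(game, agent, partner, episodes=100):
    """Average common payoff when `agent` plays alongside one unseen `partner`."""
    return statistics.mean(
        game.step(agent.act(), partner.act()) for _ in range(episodes)
    )

def zero_shot_score(game, agent, held_out_partners):
    """Zero-shot coordination score: mean performance across all unseen partners."""
    return statistics.mean(
        evaluate_pair(game, agent, p) for p in held_out_partners
    )

# Held-out partners come from self-play runs (seeds 4-7) the FCP agent never saw.
held_out = [ckpt for seed in range(4, 8) for ckpt in train_self_play(seed)]
print("FCP zero-shot score:", zero_shot_score(ToyCoopGame(), fcp_agent, held_out))
```

In the paper, the held-out evaluation also includes a behaviorally cloned human-proxy model, which is what the proxy-human comparison with BCP refers to.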
Contributions and Implications
The authors outline several critical contributions:
- FCP provides an innovative mechanism to train agents without human data while achieving superior generalization abilities, both in agent-agent and human-agent collaborations.
- The inclusion of past checkpoints in the partner population emerges as a pivotal factor for successful zero-shot coordination, underscoring the importance of strategy and skill diversity (an ablation sketch follows this list).
- The experimental design, which combines quantitative and qualitative analyses, provides a useful template for future human-agent interaction studies.
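The checkpoint ablation can be expressed in the same toy setup: train one FCP agent against only the final self-play checkpoints and another against all checkpoints, then compare their zero-shot scores. This is an illustrative sketch of the ablation's structure, not a reproduction of the paper's result.

```python
# Partner pools with and without intermediate (lower-skill) checkpoints.
final_only = [train_self_play(seed)[-1] for seed in range(4)]
all_ckpts = [ckpt for seed in range(4) for ckpt in train_self_play(seed)]

fcp_no_past = train_fcp(final_only, seed=200)
fcp_with_past = train_fcp(all_ckpts, seed=200)

for name, agent in [("final checkpoints only", fcp_no_past),
                    ("all checkpoints", fcp_with_past)]:
    print(name, zero_shot_score(ToyCoopGame(), agent, held_out))
```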
From a theoretical perspective, FCP presents a compelling case for embracing diversity in agent training, potentially unlocking new pathways in AI research focused on human compatibility. Practical implications include efficient training regimes that circumvent the costly data collection processes associated with human-aware models.
Future Directions
While FCP shows significant promise, its implementation involves manual configuration of the training population and a potential for bias in how that diversity is constructed. Future work could focus on automating these choices, for example through adaptive population matchmaking or auxiliary diversity objectives.
Additionally, while FCP offers a streamlined route to collaborative agents without human data, its broader applicability to real-world settings still requires rigorous evaluation across diverse domains.
Conclusion
Fictitious Co-Play advances the state of AI collaboration, particularly in building agents capable of genuine, effective partnerships with humans in zero-shot settings. By grounding training in shared goals and cooperative interaction, FCP lays the groundwork for AI systems that integrate smoothly into human environments without requiring extensive human data, bringing cooperative AI closer to practical deployment.