Overview of Human-AI Coordination through Human Modeling
The paper under review, "On the Utility of Learning about Humans for Human-AI Coordination," investigates the challenges of, and potential strategies for, improving coordination between AI agents and humans. It highlights a critical limitation of existing coordination frameworks, particularly those based on self-play and population-based training: agents trained this way excel when paired with other AI agents yet coordinate poorly with humans.
Key Insights and Methodology
The paper employs a controlled environment, modeled on the game Overcooked, to explore coordination dynamics. Here, artificial agents interact with both simulated human models and real human participants to assess performance across different training paradigms. The authors introduce a simple behavior cloning model to mimic human play and benchmark AI performance against this model.
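To make the imitation step concrete, the following is a minimal behavior-cloning sketch in PyTorch. The state featurization, network sizes, and action count are illustrative assumptions rather than values taken from the paper; the essential idea is supervised learning of a policy on logged human (state, action) pairs.

```python
# Minimal behavior-cloning sketch (assumed setup, not the authors' exact code):
# fit a small policy network to imitate recorded human (state, action) pairs
# with a cross-entropy loss. STATE_DIM and NUM_ACTIONS are placeholders chosen
# for illustration, not values from the Overcooked environment.
import torch
import torch.nn as nn

STATE_DIM, NUM_ACTIONS = 64, 6  # hypothetical featurized state size / action count

class BCPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, NUM_ACTIONS),  # logits over discrete actions
        )

    def forward(self, states):
        return self.net(states)

def train_bc(states, actions, epochs=20, lr=1e-3):
    """Supervised imitation: maximize likelihood of human actions given states."""
    policy = BCPolicy()
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(policy(states), actions)
        loss.backward()
        opt.step()
    return policy

# Usage with synthetic stand-in data; real training would use logged human trajectories.
demo_states = torch.randn(1000, STATE_DIM)
demo_actions = torch.randint(0, NUM_ACTIONS, (1000,))
human_model = train_bc(demo_states, demo_actions)
```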
Methodological Highlights:
- Training Paradigms: The paper contrasts agents trained purely via self-play with agents trained alongside a behavior cloning model of human play. The self-play-based methods, Proximal Policy Optimization (PPO) and Population Based Training (PBT), are designed for robustness when interacting with other AI agents. By contrast, PPO trained with the behavior-cloned human model as its partner (PPO_BC) aims to enhance performance in human-AI settings (see the sketch after this list).
- Evaluation Metrics: Coordination effectiveness is measured through the cumulative reward earned by the agent-partner pair, reflecting both low- and high-level strategic coordination in the game environment.
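The sketch below illustrates, with a toy stand-in environment and placeholder policies (not the paper's Overcooked implementation), the structural difference between the two regimes and the cumulative-reward metric: under self-play the learner's partner is a copy of itself, while under PPO_BC the partner is the fixed behavior-cloned human model.

```python
# Toy illustration of the two pairing regimes and the cumulative-reward metric.
# Environment dynamics and policies are hypothetical stand-ins for exposition only.
import random

NUM_ACTIONS, EPISODE_LEN = 6, 400

class ToyTwoPlayerEnv:
    """Hypothetical stand-in for a two-player cooperative environment."""
    def reset(self):
        self.t = 0
        return 0, 0  # initial observations for player 0 and player 1

    def step(self, a0, a1):
        self.t += 1
        shared_reward = 1.0 if a0 == a1 else 0.0  # toy shared objective
        done = self.t >= EPISODE_LEN
        return (self.t, self.t), shared_reward, done

def random_policy(obs):
    """Placeholder policy; a real agent would map observations to actions."""
    return random.randrange(NUM_ACTIONS)

def rollout(env, agent, partner):
    """Cumulative shared reward for one episode of the agent paired with a partner."""
    obs0, obs1 = env.reset()
    total, done = 0.0, False
    while not done:
        a0, a1 = agent(obs0), partner(obs1)
        (obs0, obs1), reward, done = env.step(a0, a1)
        total += reward
    return total

env = ToyTwoPlayerEnv()
learning_agent = random_policy   # placeholder for the PPO learner's current policy
bc_human_model = random_policy   # placeholder for the fixed behavior-cloned human model

# Self-play / PBT regime: the partner is (a copy of) the learner itself.
selfplay_return = rollout(env, learning_agent, learning_agent)
# Human-aware regime (PPO_BC): the partner is the fixed BC human model.
ppo_bc_return = rollout(env, learning_agent, bc_human_model)
print(selfplay_return, ppo_bc_return)
```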
Empirical Findings
The empirical results reveal several important trends:
- Self-Play Limitations: Self-play and PBT agents perform strongly in AI-AI pairings but deteriorate sharply when paired with human models or real people. This decline is attributed to training exclusively against partners that behave like themselves, which yields coordination conventions that are opaque to humans and misaligned with their expectations.
- Benefits of Human Modeling: Incorporating a behavior cloning human model into training (PPO_BC) significantly improves coordination with humans. This approach allows agents to anticipate human actions more effectively, and it outperforms self-play agents in human-agent pairings.
- Adaptability and Role Versatility: Qualitatively, the PPO_BC agents exhibit greater adaptability and a capacity to assume both leader and follower roles, aligning better with the dynamic nature of human decision-making.
Implications and Future Directions
The paper provides compelling evidence that neglecting human behavior modeling constrains AI’s utility in collaborative scenarios. This suggests a pressing need for refined training methodologies that incorporate human data to enable effective human-AI partnerships.
Potential Research Directions:
- Enhanced Human Modeling: Future work could explore more sophisticated models that capture human learning and adaptation over time, for example through theory-of-mind approaches or recurrent architectures that model learning dynamics.
- Diverse Behavioral Paradigms: Developing environments that require agents to interact with a broader spectrum of human-like behaviors could further illuminate the necessity and efficiency of integrating human-centric models.
- Adaptive Agents: Training methodologies that facilitate online adaptation to changing human strategies present another avenue for improving real-time coordination and learning in human-AI interactions (a speculative sketch follows).
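As a purely speculative illustration of this last direction (not something evaluated in the paper), one could keep fine-tuning the behavior-cloned partner model on actions observed during live interaction, so that the agent's model of its human partner tracks a shifting strategy. The function below assumes a PyTorch policy that outputs action logits, such as the hypothetical BCPolicy sketched earlier.

```python
# Speculative sketch of online adaptation (not from the paper): a few imitation
# gradient steps on freshly observed human behavior, keeping the partner model
# roughly in sync with a changing human strategy.
import torch
import torch.nn as nn

def adapt_online(human_model, observed_states, observed_actions, lr=1e-4, steps=5):
    """Fine-tune the partner model on recently observed (state, action) pairs."""
    opt = torch.optim.Adam(human_model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(human_model(observed_states), observed_actions)
        loss.backward()
        opt.step()
    return human_model
```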
Conclusion
This research underscores how critical it is to incorporate human behavioral models into AI training regimens in order to improve human-AI coordination. The demonstrated gains when human models are used point toward more nuanced, human-aware systems that leverage behavior-centric data. Bridging the current gap between AI-AI interaction and human-AI collaboration promises AI systems that genuinely augment and complement human capabilities.