
Emergent Tool Use From Multi-Agent Autocurricula (1909.07528v2)

Published 17 Sep 2019 in cs.LG, cs.AI, cs.MA, and stat.ML

Abstract: Through multi-agent competition, the simple objective of hide-and-seek, and standard reinforcement learning algorithms at scale, we find that agents create a self-supervised autocurriculum inducing multiple distinct rounds of emergent strategy, many of which require sophisticated tool use and coordination. We find clear evidence of six emergent phases in agent strategy in our environment, each of which creates a new pressure for the opposing team to adapt; for instance, agents learn to build multi-object shelters using moveable boxes which in turn leads to agents discovering that they can overcome obstacles using ramps. We further provide evidence that multi-agent competition may scale better with increasing environment complexity and leads to behavior that centers around far more human-relevant skills than other self-supervised reinforcement learning methods such as intrinsic motivation. Finally, we propose transfer and fine-tuning as a way to quantitatively evaluate targeted capabilities, and we compare hide-and-seek agents to both intrinsic motivation and random initialization baselines in a suite of domain-specific intelligence tests.

Emergent Tool Use From Multi-Agent Autocurricula

The paper, "Emergent Tool Use From Multi-Agent Autocurricula," explores the development of complex, human-relevant skills through multi-agent reinforcement learning (MARL) in a competitive setting. The research demonstrates that agents can create sophisticated strategies, including tool use, through an iterative process of self-supervised learning within a hide-and-seek environment.

Research Context and Objectives

The paper recognizes the challenge of designing artificial agents capable of solving diverse, real-world tasks. Traditional reinforcement learning approaches necessitate manually specified tasks and reward functions, which can be limiting and costly. Instead, the paper investigates a method where agents develop skills autonomously through competition and adaptation, drawing inspiration from evolutionary processes.

Methodological Approach

The authors construct a physics-based hide-and-seek environment in which hiders are rewarded for staying out of the seekers' sight and seekers are rewarded for finding them. The environment's complexity arises from moveable objects, such as boxes and ramps, which agents can grab, move, and lock in place. This setup allows dynamic strategies to emerge as the teams compete: hiders build shelters, and seekers develop counter-strategies using ramps.
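
A minimal sketch of this visibility-based team reward, assuming the symmetric ±1 team rewards and zeroed preparation-phase rewards described in the paper (the out-of-bounds penalty and exact magnitudes are omitted):

```python
import numpy as np

def team_rewards(any_hider_seen: bool, in_prep_phase: bool,
                 n_hiders: int, n_seekers: int):
    """Per-step team reward for hide-and-seek (illustrative sketch).

    Hiders share +1 when every hider is hidden and -1 otherwise;
    seekers receive the opposite. During the preparation phase,
    while seekers are frozen, rewards are zeroed so hiders can set
    up without penalty.
    """
    if in_prep_phase:
        return np.zeros(n_hiders), np.zeros(n_seekers)
    hider_r = -1.0 if any_hider_seen else 1.0
    return np.full(n_hiders, hider_r), np.full(n_seekers, -hider_r)
```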

Training uses multi-agent self-play with Proximal Policy Optimization (PPO) and Generalized Advantage Estimation (GAE). During training, the value function receives privileged access to the full environment state (a centralized critic), while each policy acts only on its own agent's observations; agents are given no explicit coordination or communication channel.
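
A compact sketch of the two core training pieces, GAE and PPO's clipped surrogate objective. The hyperparameter values are illustrative defaults, not the paper's exact settings:

```python
import numpy as np

def gae_advantages(rewards, values, dones, gamma=0.998, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    `values` has len(rewards) + 1 entries: per-step estimates plus a
    bootstrap value for the state after the final step. In the paper's
    setup these come from a centralized critic that sees the full
    environment state during training, while each policy acts only on
    its own agent's observations.
    """
    T = len(rewards)
    adv = np.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        last = delta + gamma * lam * nonterminal * last
        adv[t] = last
    return adv  # value targets for the critic are adv + values[:-1]

def ppo_clip_loss(ratio, adv, clip_eps=0.2):
    """PPO's clipped surrogate objective (a quantity to maximize);
    `ratio` is pi_new(a|s) / pi_old(a|s) for each sampled action."""
    return np.minimum(ratio * adv,
                      np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * adv).mean()
```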

Emergent Behavior and Results

The research identifies six distinct phases of emergent strategy, demonstrating the capability of MARL to foster complex adaptive behavior: basic running and chasing, hiders building forts from boxes, seekers using ramps to breach those forts, hiders locking ramps away, seekers "surfing" on boxes to get over walls, and hiders locking boxes as well. These phases illustrate a progression from simple evasion tactics to sophisticated tool use.

Quantitatively, the authors document these behavioral shifts with statistics such as how far boxes and ramps are moved and how often agents lock objects in place. The paper further conducts ablation experiments to probe the robustness of the emergent behaviors, emphasizing the role of environment randomization and scale.
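
A sketch of how such statistics might be tallied from logged episodes. The field names and per-step logging format here are assumptions for illustration; the paper's actual instrumentation is internal to its codebase:

```python
def behavior_stats(episode):
    """Tally coarse behavioral statistics used as emergence probes:
    cumulative box/ramp movement and the number of lock actions.

    `episode` is assumed to be a list of per-timestep dicts with
    flattened 'box_pos' / 'ramp_pos' coordinate lists and a list of
    'lock_events'.
    """
    stats = {"box_movement": 0.0, "ramp_movement": 0.0, "locks": 0}
    for prev, cur in zip(episode, episode[1:]):
        stats["box_movement"] += sum(
            abs(a - b) for a, b in zip(cur["box_pos"], prev["box_pos"]))
        stats["ramp_movement"] += sum(
            abs(a - b) for a, b in zip(cur["ramp_pos"], prev["ramp_pos"]))
        stats["locks"] += len(cur["lock_events"])
    return stats
```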

Evaluation and Comparison

The paper compares MARL-driven exploration against intrinsic motivation methods, arguing that multi-agent competition can lead to qualitatively richer exploration in complex, high-dimensional spaces. To address the difficulty of evaluating open-ended training, the authors propose a suite of targeted intelligence tests measuring capabilities such as memory, navigation, and construction. Pretrained hide-and-seek agents generally outperform both intrinsic-motivation and randomly initialized baselines on these tasks, indicating beneficial skill transfer.

Implications and Future Directions

The work underscores the potential of self-supervised multi-agent systems to produce complex, human-relevant skills without direct task specification. This has implications for the development of autonomous systems capable of learning in unstructured, open-ended environments. Future work might focus on improving sample efficiency and mitigating undesired exploitative behaviors within the training environment.

The paper contributes to the field of AI by demonstrating the efficacy of leveraging multi-agent interactions to replicate aspects of natural evolution and strategy development, providing a promising pathway toward scalable and adaptable artificial intelligence systems.

Authors (7)
  1. Bowen Baker
  2. Ingmar Kanitscheider
  3. Todor Markov
  4. Yi Wu
  5. Glenn Powell
  6. Bob McGrew
  7. Igor Mordatch
Citations (611)