Human-level Performance in Multiplayer Games Through Population-based Deep RL
The paper "Human-level performance in first-person multiplayer games with population-based deep reinforcement learning" addresses the challenge of creating artificial agents capable of competing at human levels in complex, multi-agent environments. The research focuses on a variant of the game Quake III Arena, specifically the Capture the Flag (CTF) mode, where multiple agents must learn concurrently to cooperate and compete in randomly generated environments.
Methodology
The authors employ a novel two-tiered optimization process, using population-based deep reinforcement learning (RL) to train a population of agents: an inner loop optimizes each agent's policy with RL, while an outer loop evolves the population itself (a sketch of this outer loop follows the list). The setup involves:
- Concurrent Training: Agents are trained simultaneously across thousands of parallel game simulations, exposing each agent to a diverse array of interactions and experiences.
- Internal Reward Systems: Each agent learns an individual reward signal, derived from game-points events, that supplements the sparse win/loss reward and promotes reliable skill acquisition over time.
- Hierarchical Temporal Representation: A temporally hierarchical model lets agents make decisions over multiple timescales, aligning short-term actions with long-term strategy (sketched after the following paragraph).
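To make the two-tiered structure concrete, here is a minimal sketch of the outer, population-level loop. It is a toy illustration under assumed names: `evaluate` stands in for the paper's match-based Elo estimation, and the hyperparameter set, reward-weight keys, and perturbation factors are illustrative rather than the paper's exact configuration.

```python
# A minimal sketch of a population-based training (PBT) outer loop.
# evaluate() is a placeholder for match-based fitness (Elo in the paper).
import copy
import random
from dataclasses import dataclass

@dataclass
class Agent:
    # Hyperparameters evolved by the outer loop; names are illustrative.
    learning_rate: float
    reward_weights: dict   # per-game-event internal reward weights
    fitness: float = 0.0   # e.g., estimated win probability vs. the population

def evaluate(agent: Agent) -> float:
    """Stand-in for evaluating an agent via matches against the population."""
    return random.random()  # placeholder fitness signal

def exploit_and_explore(agent: Agent, population: list) -> Agent:
    """If an agent underperforms, copy a stronger agent and perturb it."""
    ranked = sorted(population, key=lambda a: a.fitness, reverse=True)
    if agent.fitness < ranked[len(ranked) // 2].fitness:       # bottom half
        parent = random.choice(ranked[: len(ranked) // 4 + 1])  # top quartile
        agent = copy.deepcopy(parent)                 # exploit: inherit weights
        agent.learning_rate *= random.choice([0.8, 1.2])  # explore: perturb
        agent.reward_weights = {
            k: w * random.choice([0.9, 1.1])
            for k, w in agent.reward_weights.items()
        }
    return agent

population = [
    Agent(learning_rate=10 ** random.uniform(-5, -3),
          reward_weights={"flag_capture": 1.0, "tag_opponent": 0.5})
    for _ in range(8)
]
for generation in range(10):
    for agent in population:
        agent.fitness = evaluate(agent)  # inner loop: RL training + matches
    population = [exploit_and_explore(a, population) for a in population]
```

In this scheme no single agent is tuned by hand: weak members of the population inherit both network weights and hyperparameters from strong ones, then explore nearby settings, so hyperparameter search happens alongside training rather than before it.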
These agents, drawing input solely from raw pixels and game points, as a human player would, eventually demonstrate strategic behaviors typically associated with human players, such as map navigation, teammate following, and base defense.
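The temporally hierarchical model can be pictured as a recurrent core operating at two timescales. Below is a simplified sketch assuming PyTorch; the paper's full FTW agent additionally couples the two timescales through a variational objective, which is omitted here, and the slow-tick period is an illustrative choice.

```python
# A minimal sketch of a two-timescale recurrent core, assuming PyTorch.
# The slow core ticks every `slow_period` steps and conditions the fast core.
import torch
import torch.nn as nn

class TwoTimescaleCore(nn.Module):
    def __init__(self, obs_dim: int, hidden: int = 128, slow_period: int = 10):
        super().__init__()
        self.slow_period = slow_period
        self.slow = nn.LSTMCell(obs_dim, hidden)           # strategic timescale
        self.fast = nn.LSTMCell(obs_dim + hidden, hidden)  # action timescale

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        T, B, _ = obs_seq.shape
        h_s = c_s = obs_seq.new_zeros(B, self.slow.hidden_size)
        h_f = c_f = obs_seq.new_zeros(B, self.fast.hidden_size)
        outputs = []
        for t in range(T):
            if t % self.slow_period == 0:   # slow core updates infrequently
                h_s, c_s = self.slow(obs_seq[t], (h_s, c_s))
            # Fast core conditions every step on the slow core's latest state.
            fast_in = torch.cat([obs_seq[t], h_s], dim=-1)
            h_f, c_f = self.fast(fast_in, (h_f, c_f))
            outputs.append(h_f)
        return torch.stack(outputs)  # features for policy and value heads

core = TwoTimescaleCore(obs_dim=64)
features = core(torch.randn(100, 4, 64))  # 100 timesteps, batch of 4
```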
Results
The paper reports strong performance: trained agents surpassed the win rates of human players in zero-shot generalization scenarios, on procedurally generated maps neither had seen during training. In controlled tournaments under diverse game conditions, agents achieved Elo ratings above those of human players, suggesting superior strategic play. Moreover, these agents proved compatible when paired with new teammates, including humans, showing an ability to adapt to unfamiliar strategies and playing styles.
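For context, an Elo rating gap translates into an expected head-to-head win probability via the standard logistic formula; the snippet below shows that formula with illustrative ratings, not figures from the paper.

```python
# Expected win probability under the standard Elo model; the ratings here
# are illustrative placeholders, not results reported in the paper.
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

print(f"{elo_expected_score(1600, 1400):.2f}")  # ~0.76 for a 200-point gap
```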
Implications
This research advances the understanding of RL in high-dimensional, multi-agent environments. By employing population-based training and internal reward structures, the authors provide a framework that addresses critical issues in multi-agent RL: stability, scalability, and generalization. This approach has implications for various domains where autonomous systems must operate collaboratively or competitively without predefined models or human guidance.
Future Developments
The findings prompt several areas for future exploration:
- Population Diversity: Developing techniques to maintain and enrich diversity within agent populations could enhance learning adaptability and robustness.
- Meta-Optimization: Refining meta-optimization strategies such as Population Based Training (PBT) for more efficient exploration-exploitation trade-offs.
- Temporal Credit Assignment: Improving methods for more precise temporal credit assignment could further increase learning speed and sample efficiency.
Overall, this work narrows the gap toward human-level intelligence in artificial agents by combining a sophisticated agent architecture with a population-based training approach in complex, multi-agent settings. The techniques and insights it develops could be applied to other competitive, cooperative, and dynamic domains beyond gaming.