Learning to Play No-Press Diplomacy with Best Response Policy Iteration (2006.04635v4)
Abstract: Recent advances in deep reinforcement learning (RL) have led to considerable progress in many 2-player zero-sum games, such as Go, Poker, and StarCraft. The purely adversarial nature of such games allows for conceptually simple and principled application of RL methods. However, real-world settings are many-agent, and agent interactions are complex mixtures of common-interest and competitive aspects. We consider Diplomacy, a 7-player board game designed to accentuate dilemmas resulting from many-agent interactions. It also features a large combinatorial action space and simultaneous moves, which are challenging for RL algorithms. We propose a simple yet effective approximate best response operator, designed to handle large combinatorial action spaces and simultaneous moves. We also introduce a family of policy iteration methods that approximate fictitious play. With these methods, we successfully apply RL to Diplomacy: we show that our agents convincingly outperform the previous state-of-the-art, and game-theoretic equilibrium analysis shows that the new process yields consistent improvements.
- Thomas Anthony (16 papers)
- Tom Eccles (18 papers)
- Andrea Tacchetti (26 papers)
- János Kramár (19 papers)
- Ian Gemp (36 papers)
- Thomas C. Hudson (1 paper)
- Nicolas Porcel (3 papers)
- Marc Lanctot (60 papers)
- Richard Everett (15 papers)
- Roman Werpachowski (4 papers)
- Satinder Singh (80 papers)
- Thore Graepel (48 papers)
- Yoram Bachrach (43 papers)
- Julien Pérolat (10 papers)
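
The abstract's central idea, repeatedly computing an (approximate) best response against an average of past policies in the style of fictitious play, can be illustrated on a toy simultaneous-move game. The sketch below is not the paper's best response policy iteration and does not model Diplomacy; it is plain fictitious play on a two-player zero-sum matrix game (matching pennies), with the function name `fictitious_play`, the payoff matrices, and the iteration count all chosen here purely for illustration.

```python
import numpy as np


def fictitious_play(A, B, iterations=5000):
    """Classic fictitious play on a two-player simultaneous-move matrix game.

    A, B: payoff matrices (shape n x m) for players 1 and 2.
    Each player best-responds to the opponent's empirical average policy;
    in zero-sum games the average strategies converge to a Nash equilibrium.
    """
    n, m = A.shape
    counts1 = np.ones(n)  # action counts, starting from a uniform prior
    counts2 = np.ones(m)

    for _ in range(iterations):
        avg1 = counts1 / counts1.sum()  # player 1's empirical average policy
        avg2 = counts2 / counts2.sum()  # player 2's empirical average policy
        # Exact best responses against the opponent's average policy.
        br1 = np.argmax(A @ avg2)
        br2 = np.argmax(avg1 @ B)
        counts1[br1] += 1
        counts2[br2] += 1

    return counts1 / counts1.sum(), counts2 / counts2.sum()


if __name__ == "__main__":
    # Matching pennies: the unique equilibrium is (0.5, 0.5) for both players.
    A = np.array([[1.0, -1.0], [-1.0, 1.0]])
    B = -A
    s1, s2 = fictitious_play(A, B)
    print("player 1 average strategy:", np.round(s1, 3))
    print("player 2 average strategy:", np.round(s2, 3))
```

In the paper's setting, the exact `argmax` best response above is intractable because Diplomacy's action space is combinatorial and moves are simultaneous; the approximate best response operator described in the abstract replaces this step with a learned, sampled approximation, while the policy iteration scheme plays the role of the averaging over past policies.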