A Generalized Training Approach for Multiagent Learning (1909.12823v2)
Abstract: This paper investigates a population-based training regime based on game-theoretic principles called Policy-Space Response Oracles (PSRO). PSRO is general in the sense that it (1) encompasses well-known algorithms such as fictitious play and double oracle as special cases, and (2) in principle applies to general-sum, many-player games. Despite this, prior studies of PSRO have focused on two-player zero-sum games, a regime wherein Nash equilibria are tractably computable. In moving from two-player zero-sum games to more general settings, computation of Nash equilibria quickly becomes infeasible. Here, we extend the theoretical underpinnings of PSRO by considering an alternative solution concept, $\alpha$-Rank, which is unique (thus faces no equilibrium selection issues, unlike Nash) and applies readily to general-sum, many-player settings. We establish convergence guarantees in several game classes, and identify links between Nash equilibria and $\alpha$-Rank. We demonstrate the competitive performance of $\alpha$-Rank-based PSRO against an exact Nash solver-based PSRO in two-player Kuhn and Leduc Poker. We then go beyond the reach of prior PSRO applications by considering 3- to 5-player poker games, yielding instances where $\alpha$-Rank achieves faster convergence than approximate Nash solvers, thus establishing it as a favorable solver for general games. We also carry out an initial empirical validation in MuJoCo soccer, illustrating the feasibility of the proposed approach in another complex domain.
- Paul Muller
- Shayegan Omidshafiei
- Mark Rowland
- Karl Tuyls
- Julien Perolat
- Siqi Liu
- Daniel Hennes
- Luke Marris
- Marc Lanctot
- Edward Hughes
- Zhe Wang
- Guy Lever
- Nicolas Heess
- Thore Graepel
- Remi Munos
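To make the PSRO training regime described in the abstract concrete, the following is a minimal, illustrative sketch (not the authors' implementation). It runs the PSRO loop on a toy rock-paper-scissors matrix game and uses a uniform meta-solver over each population, which recovers the fictitious-play special case mentioned in the abstract rather than the paper's $\alpha$-Rank or Nash meta-solvers; all function names and the payoff matrix are hypothetical choices for this sketch.

```python
import numpy as np

# Toy underlying game: rock-paper-scissors payoffs for player 0
# (player 1's payoff is the negative, since the game is zero-sum).
RPS = np.array([[ 0., -1.,  1.],
                [ 1.,  0., -1.],
                [-1.,  1.,  0.]])

def best_response(payoff, opponent_mix):
    """Pure-strategy best response to the opponent's mixed strategy."""
    return int(np.argmax(payoff @ opponent_mix))

def uniform_meta_solver(population):
    """Uniform distribution over a player's current policy set.
    A stand-in for the meta-solver; the paper studies alpha-Rank here."""
    return np.ones(len(population)) / len(population)

def psro(payoff, n_iters=20):
    # Each population is the list of pure strategies discovered so far.
    pops = [[0], [0]]
    for _ in range(n_iters):
        # 1. Solve the empirical meta-game over current populations.
        metas = [uniform_meta_solver(p) for p in pops]
        # 2. Induced mixed strategies over the underlying actions.
        opp_mix_for_0 = np.bincount(pops[1], weights=metas[1], minlength=3)
        opp_mix_for_1 = np.bincount(pops[0], weights=metas[0], minlength=3)
        # 3. Each player expands its population with a best response
        #    to the other player's meta-strategy.
        pops[0].append(best_response(payoff, opp_mix_for_0))
        pops[1].append(best_response(-payoff.T, opp_mix_for_1))
    return pops

if __name__ == "__main__":
    print(psro(RPS))
```

Swapping `uniform_meta_solver` for an $\alpha$-Rank (or Nash) computation over the empirical payoff tensor would yield the variant studied in the paper; the surrounding loop of simulate, solve the meta-game, add best responses is unchanged.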