
Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games (2006.08555v2)

Published 15 Jun 2020 in cs.GT, cs.AI, cs.LG, and cs.MA

Abstract: Finding approximate Nash equilibria in zero-sum imperfect-information games is challenging when the number of information states is large. Policy Space Response Oracles (PSRO) is a deep reinforcement learning algorithm grounded in game theory that is guaranteed to converge to an approximate Nash equilibrium. However, PSRO requires training a reinforcement learning policy at each iteration, making it too slow for large games. We show through counterexamples and experiments that DCH and Rectified PSRO, two existing approaches to scaling up PSRO, fail to converge even in small games. We introduce Pipeline PSRO (P2SRO), the first scalable general method for finding approximate Nash equilibria in large zero-sum imperfect-information games. P2SRO is able to parallelize PSRO with convergence guarantees by maintaining a hierarchical pipeline of reinforcement learning workers, each training against the policies generated by lower levels in the hierarchy. We show that unlike existing methods, P2SRO converges to an approximate Nash equilibrium, and does so faster as the number of parallel workers increases, across a variety of imperfect information games. We also introduce an open-source environment for Barrage Stratego, a variant of Stratego with an approximate game tree complexity of $10^{50}$. P2SRO is able to achieve state-of-the-art performance on Barrage Stratego and beats all existing bots. Experiment code is available at https://github.com/JBLanier/pipeline-psro.

Authors (4)
  1. Stephen McAleer (41 papers)
  2. John Lanier (5 papers)
  3. Roy Fox (39 papers)
  4. Pierre Baldi (89 papers)
Citations (69)

Summary

  • The paper introduces Pipeline PSRO, a novel parallel approach that accelerates approximate Nash equilibria computation in large imperfect-information games.
  • It leverages a pipeline architecture to enable concurrent reinforcement learning, proving effective in high-complexity games like Barrage Stratego.
  • Empirical results reveal a 71% win rate against top bots and faster convergence compared to prior PSRO variants, underscoring its practical impact.

Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games

The paper "Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games" addresses an enduring challenge in the fields of artificial intelligence and game theory: efficiently finding approximate Nash equilibria in large zero-sum imperfect-information games. The conventional Policy Space Response Oracles (PSRO) algorithm, while theoretically robust, is computationally intensive due to its sequential training nature, making it impractical for large-scale applications.

Core Contributions

  1. Introduction of Pipeline PSRO (P2SRO):
    • The authors propose Pipeline PSRO (P2SRO), a novel method for parallelizing the PSRO algorithm while preserving convergence guarantees. Its hierarchical pipeline architecture lets reinforcement learning workers train concurrently, each against the policies generated by lower levels of the hierarchy, which speeds up convergence (a minimal sketch follows this list). Unlike its predecessors DCH and Rectified PSRO, P2SRO maintains convergence to an approximate Nash equilibrium across diverse game settings.
  2. Scalability with Game Complexity:
    • The proposed P2SRO algorithm is specifically designed to handle games with substantial game tree complexity, such as Barrage Stratego, with an approximate game tree complexity of $10^{50}$.
  3. Empirical Validation Across Games:
    • In experiments, P2SRO converged to approximate Nash equilibria faster than existing methods in both structured games such as Leduc poker and random normal-form games, and its convergence speed increased with the number of parallel workers.
  4. Performance on Barrage Stratego:
    • The paper also documents P2SRO achieving state-of-the-art results in Barrage Stratego, beating all existing bots with an average win rate of 71% against the top-performing bots.
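
The pipeline mechanics can be made concrete with a short sketch. The Python below is a minimal illustration of the hierarchy described in item 1 above, not the authors' implementation: `new_policy`, `train_step`, and `lowest_has_plateaued` are hypothetical callables supplied by the caller, and opponents are sampled uniformly rather than from the meta-Nash mixture the paper uses.

```python
import random

def p2sro_sketch(new_policy, train_step, lowest_has_plateaued,
                 num_workers=3, num_iterations=1000):
    """Minimal sketch of the P2SRO hierarchy (not the authors' code)."""
    fixed = [new_policy()]                                # frozen population
    active = [new_policy() for _ in range(num_workers)]   # one policy per worker

    for _ in range(num_iterations):
        # Each active policy trains only against policies strictly below it
        # in the hierarchy: the fixed population plus lower active policies.
        # (In the paper, opponents come from a meta-Nash mixture over the
        # fixed policies; uniform sampling keeps this sketch short.)
        for level, policy in enumerate(active):
            opponent = random.choice(fixed + active[:level])
            train_step(policy, opponent)

        # When the lowest active policy stops improving, freeze it, append it
        # to the fixed population, and spawn a fresh policy at the top so
        # every worker stays busy.
        if lowest_has_plateaued(active[0], fixed):
            fixed.append(active.pop(0))
            active.append(new_policy())

    return fixed
```

In the actual algorithm the per-level training runs on separate parallel workers rather than in a sequential inner loop, which is what allows convergence to speed up as workers are added.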

Key Findings

  • Counterexamples for DCH and Rectified PSRO:
    • Through counterexamples and empirical demonstrations, the paper reveals inherent limitations of DCH and Rectified PSRO, showing that both can fail to converge even in small games.
  • Fixed and Active Policy Approach:
    • P2SRO divides the policy population into fixed policies, which no longer train, and active policies, which continue training against the levels below them. This setup ensures steady progress toward an approximate Nash equilibrium, with each active policy gaining a head start from the hierarchical learning layers beneath it (a sketch of this bookkeeping follows this list).
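
A hedged sketch of this fixed/active bookkeeping is shown below. The plateau criterion based on a trailing window of win rates is an assumption for illustration; the paper's actual promotion condition may differ, and `recent_win_rates` and `new_policy` are hypothetical inputs tracked elsewhere during training.

```python
def maybe_promote(fixed, active, recent_win_rates, new_policy,
                  window=5, tol=0.01):
    """Hedged sketch of promoting the lowest active policy to fixed."""
    # Only fixed (frozen) policies enter the restricted game whose meta-Nash
    # mixture new best responses train against; active policies are still
    # moving targets and are excluded from that computation.
    recent = recent_win_rates[-window:]
    plateaued = len(recent) == window and max(recent) - min(recent) < tol

    if plateaued:
        fixed.append(active.pop(0))     # freeze the lowest active policy
        active.append(new_policy())     # add a fresh policy at the top level
    return fixed, active
```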

Implications and Speculations

  • Implications for AI and Game Theory:
    • The scalability and efficiency brought by P2SRO have significant implications for developing AI systems that can tackle previously intractable problems in domains defined by intricate strategic interactions. The methodology makes it more practical to compute approximately optimal mixed strategies in complex decision-making environments beyond traditional game-playing benchmarks.
  • Future Developments:
    • The success of P2SRO invites further exploration into its application across different domains where finding Nash equilibria is critical, including economic modeling, strategic business simulations, and real-time strategy games. Additionally, the principles underlying P2SRO could inspire novel architectures for neural network models aiming at efficient equilibrium computation.

In conclusion, this research advances the field by offering a scalable method that harmonizes the theoretical rigor of PSRO with practical applicability in large-scale game environments, promising diverse applications beyond mere simulated gaming environments.