Neural Population Learning beyond Symmetric Zero-sum Games (2401.05133v1)
Abstract: We study computationally efficient methods for finding equilibria in n-player general-sum games, specifically games that afford complex visuomotor skills. We show that existing methods struggle in this setting, either computationally or in theory. We then introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated Equilibrium (CCE) of the game. We demonstrate empirical convergence across a suite of OpenSpiel games, validated rigorously by exact game solvers. We then deploy NeuPL-JPSRO in complex domains, where our approach enables adaptive coordination in a MuJoCo control domain and skill transfer in capture-the-flag. Our work shows that equilibrium-convergent population learning can be implemented at scale and in generality, paving the way towards solving real-world games between heterogeneous players with mixed motives.
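The CCE solution concept targeted by NeuPL-JPSRO can be illustrated with a minimal, stdlib-only sketch. This is not the paper's algorithm; it shows the classical result that no-regret self-play (here, regret matching) drives the empirical distribution of joint actions into the set of coarse correlated equilibria of a general-sum game. The 2x2 "traffic" payoff matrix and all names below are hypothetical.

```python
import random

# Hypothetical 2-player general-sum game: miscoordinating on the same route
# pays less than yielding; (action0, action1) -> (payoff0, payoff1).
PAYOFFS = {
    (0, 0): (1, 1), (0, 1): (4, 2),
    (1, 0): (2, 4), (1, 1): (0, 0),
}
N_ACTIONS = 2

def regret_matching(n_rounds=20000, seed=0):
    """Self-play with regret matching; returns the empirical joint distribution."""
    rng = random.Random(seed)
    regrets = [[0.0] * N_ACTIONS for _ in range(2)]  # cumulative external regrets
    counts = {}
    for _ in range(n_rounds):
        acts = []
        for p in range(2):
            pos = [max(r, 0.0) for r in regrets[p]]
            total = sum(pos)
            if total > 0.0:
                # play each action proportionally to its positive regret
                x = rng.random() * total
                acts.append(0 if x < pos[0] else 1)
            else:
                acts.append(rng.randrange(N_ACTIONS))
        joint = tuple(acts)
        counts[joint] = counts.get(joint, 0) + 1
        # external-regret update: fixed-action payoff minus realised payoff
        for p in range(2):
            realised = PAYOFFS[joint][p]
            for alt in range(N_ACTIONS):
                dev = (alt, joint[1]) if p == 0 else (joint[0], alt)
                regrets[p][alt] += PAYOFFS[dev][p] - realised
    return {a: c / n_rounds for a, c in counts.items()}

def cce_gap(dist):
    """Max gain any player gets by deviating to a fixed action: ~0 at a CCE."""
    gap = 0.0
    for p in range(2):
        on_path = sum(q * PAYOFFS[a][p] for a, q in dist.items())
        for alt in range(N_ACTIONS):
            dev = sum(q * PAYOFFS[(alt, a[1]) if p == 0 else (a[0], alt)][p]
                      for a, q in dist.items())
            gap = max(gap, dev - on_path)
    return gap

dist = regret_matching()
```

After 20,000 rounds the average external regret, and hence the CCE deviation gap of the empirical distribution, is small. NeuPL-JPSRO targets the same solution concept, but in large games with neural policies, iterated best responses, and a CCE meta-solver rather than tabular regret matching.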
Authors: Siqi Liu, Luke Marris, Marc Lanctot, Georgios Piliouras, Joel Z. Leibo, Nicolas Heess