
Neural Population Learning beyond Symmetric Zero-sum Games (2401.05133v1)

Published 10 Jan 2024 in cs.AI and cs.MA

Abstract: We study computationally efficient methods for finding equilibria in n-player general-sum games, specifically ones that afford complex visuomotor skills. We show how existing methods would struggle in this setting, either computationally or in theory. We then introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated Equilibrium (CCE) of the game. We show empirical convergence in a suite of OpenSpiel games, validated rigorously by exact game solvers. We then deploy NeuPL-JPSRO to complex domains, where our approach enables adaptive coordination in a MuJoCo control domain and skill transfer in capture-the-flag. Our work shows that equilibrium convergent population learning can be implemented at scale and in generality, paving the way towards solving real-world games between heterogeneous players with mixed motives.
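
The paper's central solution concept is the Coarse Correlated Equilibrium (CCE). As a minimal, self-contained illustration (not the paper's NeuPL-JPSRO implementation), the sketch below checks the CCE condition for a small normal-form game with NumPy: a joint distribution over action profiles is a CCE if no player can gain by committing to a fixed action before seeing its recommendation. The `cce_gap` helper and the "chicken" payoff matrices are illustrative choices, not taken from the paper.

```python
# Minimal, self-contained sketch (not the paper's implementation): check the
# Coarse Correlated Equilibrium (CCE) condition for a small normal-form game.
# A joint distribution over action profiles is a CCE if no player gains by
# committing to a fixed action before seeing its recommended action.
import numpy as np


def cce_gap(payoffs, joint_dist):
    """Largest unilateral deviation gain; <= 0 means joint_dist is a CCE.

    payoffs: list of n arrays, payoffs[i] indexed by all players' actions.
    joint_dist: array of the same shape, summing to 1.
    """
    gap = -np.inf
    for i, u_i in enumerate(payoffs):
        expected = np.sum(joint_dist * u_i)       # value of following recommendations
        marginal_others = joint_dist.sum(axis=i)  # opponents' joint marginal
        for a in range(u_i.shape[i]):
            # Payoff if player i always plays action a against that marginal.
            deviation = np.sum(np.take(u_i, a, axis=i) * marginal_others)
            gap = max(gap, deviation - expected)
    return gap


# Example (illustrative, not from the paper): the game of "chicken".
# Mixing uniformly over (swerve, swerve), (swerve, dare), (dare, swerve)
# is a classic correlated equilibrium, hence also a CCE.
u0 = np.array([[6.0, 2.0],   # row: player 0 swerves
               [7.0, 0.0]])  # row: player 0 dares
u1 = u0.T
sigma = np.array([[1 / 3, 1 / 3],
                  [1 / 3, 0.0]])
print(cce_gap([u0, u1], sigma))  # about -0.33 <= 0, so sigma is a CCE
```

In JPSRO-style population learning, the analogous condition is evaluated over joint distributions of population policies rather than primitive actions, with reinforcement-learning best responses playing the role of the fixed deviations.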

Authors (6)
  1. Siqi Liu
  2. Luke Marris
  3. Marc Lanctot
  4. Georgios Piliouras
  5. Joel Z. Leibo
  6. Nicolas Heess