Self-adaptive PSRO: Towards an Automatic Population-based Game Solver (2404.11144v1)
Abstract: Policy-Space Response Oracles (PSRO) is a general algorithmic framework that has achieved state-of-the-art performance in learning equilibrium policies of two-player zero-sum games. However, the hand-crafted selection of hyperparameter values in most existing works requires extensive domain knowledge, forming the main barrier to applying PSRO to different games. In this work, we make the first attempt to investigate the possibility of self-adaptively determining the optimal hyperparameter values in the PSRO framework. Our contributions are three-fold: (1) we propose a parametric PSRO that, through several hyperparameters, unifies gradient descent ascent (GDA) and different PSRO variants; (2) we propose self-adaptive PSRO (SPSRO) by casting the hyperparameter value selection of the parametric PSRO as a hyperparameter optimization (HPO) problem, where the objective is to learn an HPO policy that can self-adaptively determine the optimal hyperparameter values while the parametric PSRO runs; (3) to overcome the poor performance of online HPO methods, we propose a novel offline HPO approach, based on the Transformer architecture, to optimize the HPO policy. Experiments on various two-player zero-sum games demonstrate the superiority of SPSRO over different baselines.
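To make the underlying PSRO loop concrete, the following is a minimal, hypothetical sketch on rock-paper-scissors: maintain a population of strategies per player, solve the restricted empirical game with a meta-solver (here, fictitious play, which converges to a Nash equilibrium in two-player zero-sum matrix games), and expand each population with a best response to the opponent's meta-strategy. This is a toy illustration only; the paper's parametric PSRO, its hyperparameters, and the Transformer-based HPO policy are not shown, and all function names here are invented for illustration.

```python
# Row player's payoff matrix for a zero-sum toy game: rock, paper, scissors.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

def sub_payoffs(pop1, pop2):
    """Empirical payoff matrix restricted to the current populations."""
    return [[A[i][j] for j in pop2] for i in pop1]

def fictitious_play(M, steps=2000):
    """Meta-solver: fictitious play on the restricted zero-sum game."""
    n, m = len(M), len(M[0])
    c1, c2 = [1.0] * n, [1.0] * m   # empirical action counts
    for _ in range(steps):
        s1 = [c / sum(c1) for c in c1]
        s2 = [c / sum(c2) for c in c2]
        # Each player best-responds to the opponent's empirical mixture.
        u1 = [sum(M[i][j] * s2[j] for j in range(m)) for i in range(n)]
        u2 = [-sum(s1[i] * M[i][j] for i in range(n)) for j in range(m)]
        c1[u1.index(max(u1))] += 1
        c2[u2.index(max(u2))] += 1
    return [c / sum(c1) for c in c1], [c / sum(c2) for c in c2]

def psro(iterations=5):
    pop1, pop2 = [0], [0]           # start with one pure strategy each
    for _ in range(iterations):
        s1, s2 = fictitious_play(sub_payoffs(pop1, pop2))
        # Oracle step: exact best response over ALL pure strategies
        # against the opponent's meta-strategy over its population.
        u1 = [sum(A[i][pop2[j]] * s2[j] for j in range(len(pop2)))
              for i in range(3)]
        u2 = [-sum(s1[i] * A[pop1[i]][j] for i in range(len(pop1)))
              for j in range(3)]
        br1, br2 = u1.index(max(u1)), u2.index(max(u2))
        if br1 not in pop1:
            pop1.append(br1)
        if br2 not in pop2:
            pop2.append(br2)
    return pop1, pop2
```

Hyperparameters such as the number of meta-solver steps or PSRO iterations are fixed here by hand; the paper's point is precisely that such choices can instead be made self-adaptively by a learned HPO policy.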