Faster Game Solving via Hyperparameter Schedules (2404.09097v1)
Abstract: The counterfactual regret minimization (CFR) family of algorithms consists of iterative algorithms for imperfect-information games. In two-player zero-sum games, the time average of the iterates converges to a Nash equilibrium. The state-of-the-art prior variants, Discounted CFR (DCFR) and Predictive CFR$+$ (PCFR$+$), are the fastest known algorithms for solving two-player zero-sum games in practice, in both the extensive-form and the normal-form settings. They converge faster than vanilla CFR by discounting early iterations in various ways, using fixed weighting schemes. We introduce Hyperparameter Schedules (HSs), which are remarkably simple yet highly effective in expediting the rate of convergence. An HS dynamically adjusts the hyperparameter governing the discounting scheme of a CFR variant. Applying HSs on top of DCFR or PCFR$+$ yields the new state of the art in solving zero-sum games, with orders-of-magnitude speed improvements. The new algorithms are also easy to implement because (1) they are small modifications to the existing ones in terms of code and (2) they require no game-specific tuning.
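To make the idea concrete, here is a minimal sketch of a hyperparameter schedule in the normal-form setting. It runs regret matching with DCFR-style discounting on a small zero-sum matrix game and lets the averaging exponent `gamma` vary with the iteration instead of staying fixed. The abstract does not specify the schedule, so `gamma_schedule` and its constants are purely hypothetical, as are the function names `solve` and `exploitability` and the example payoff matrix. The defaults `alpha=1.5` and `beta=0` are the usual DCFR settings.

```python
# Sketch only: DCFR-style discounting on a zero-sum matrix game, with a
# HYPOTHETICAL schedule for the averaging exponent gamma (not the paper's).

def regret_matching(regrets):
    """Play in proportion to positive regret; uniform if none is positive."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    n = len(regrets)
    return [p / total for p in positive] if total > 0 else [1.0 / n] * n


def gamma_schedule(t):
    """Assumed schedule: gamma grows linearly from about 2 and is capped at 5."""
    return min(2.0 + 0.01 * t, 5.0)


def solve(payoff, iterations, alpha=1.5, beta=0.0, schedule=gamma_schedule):
    """payoff[i][j] is the row player's payoff; the column player gets its negative."""
    n, m = len(payoff), len(payoff[0])
    regrets = [[0.0] * n, [0.0] * m]
    average = [[0.0] * n, [0.0] * m]
    for t in range(1, iterations + 1):
        x = regret_matching(regrets[0])
        y = regret_matching(regrets[1])
        # Expected payoff of each pure action against the opponent's current strategy.
        u_row = [sum(payoff[i][j] * y[j] for j in range(m)) for i in range(n)]
        u_col = [-sum(payoff[i][j] * x[i] for i in range(n)) for j in range(m)]
        pos = t ** alpha / (t ** alpha + 1)      # discount for positive regrets
        neg = t ** beta / (t ** beta + 1)        # discount for negative regrets
        avg_discount = (t / (t + 1)) ** schedule(t)  # scheduled, not fixed
        for strat, utils, reg, mean in (
            (x, u_row, regrets[0], average[0]),
            (y, u_col, regrets[1], average[1]),
        ):
            value = sum(s * u for s, u in zip(strat, utils))
            for a in range(len(reg)):
                reg[a] += utils[a] - value
                reg[a] *= pos if reg[a] > 0 else neg
                mean[a] = (mean[a] + strat[a]) * avg_discount
    return tuple([v / sum(mean) for v in mean] for mean in average)


def exploitability(payoff, x, y):
    """Sum of both players' best-response gains; zero at a Nash equilibrium."""
    n, m = len(payoff), len(payoff[0])
    best_row = max(sum(payoff[i][j] * y[j] for j in range(m)) for i in range(n))
    best_col = max(-sum(payoff[i][j] * x[i] for i in range(n)) for j in range(m))
    return best_row + best_col
```

Passing `schedule=lambda t: 2.0` recovers the fixed weighting used by standard DCFR, which illustrates why the change is small in terms of code: only the line computing `avg_discount` depends on the schedule.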
Authors: Naifeng Zhang, Stephen McAleer, Tuomas Sandholm