Look-ahead Search on Top of Policy Networks in Imperfect Information Games (2312.15220v3)
Abstract: Test-time search is often used to improve the performance of reinforcement learning algorithms. Performing theoretically sound search in fully adversarial two-player games with imperfect information is notoriously difficult and requires a complicated training process. We present a method for adding test-time search to an arbitrary policy-gradient algorithm that learns from sampled trajectories. Besides the policy network, the algorithm trains an additional critic network, which estimates the expected values of players following various transformations of the policies given by the policy network. These values are then used for depth-limited search. We show how the values from this critic can be used to construct a value function for imperfect information games. Moreover, they can be used to compute the summary statistics necessary to start the search from an arbitrary decision point in the game. The presented algorithm is scalable to very large games since it does not require any search during training. We evaluate the algorithm's performance when trained with Regularized Nash Dynamics, and we evaluate the benefit of using search in the standard benchmark game of Leduc hold'em, multiple variants of imperfect information Goofspiel, and Battleship.
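To make the described architecture concrete, below is a minimal sketch, not the authors' implementation, of the two networks the abstract mentions: a policy network and a critic with one output per policy "transformation" (e.g., the trained policy plus several biased variants), whose outputs can serve as multi-valued leaf evaluations for depth-limited search. PyTorch, the layer sizes, and the pessimistic aggregation at leaves are all assumptions for illustration.

```python
# Minimal sketch of a policy network plus a "transformation critic" for
# depth-limited search in an imperfect-information game. Shapes, names,
# and the leaf-evaluation rule are hypothetical, not the paper's exact design.
import torch
import torch.nn as nn


class PolicyNet(nn.Module):
    """Maps an information-state encoding to action logits."""
    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


class TransformationCritic(nn.Module):
    """Estimates the expected value under each of `num_transforms` policy
    transformations, so a depth-limited leaf can be treated as a
    multi-valued state rather than a single scalar."""
    def __init__(self, obs_dim: int, num_transforms: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_transforms),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # shape: [batch, num_transforms]


# Usage sketch: at a leaf of the look-ahead tree, evaluate it by letting the
# opponent pick the transformation that is worst for the searching player,
# yielding a conservative leaf value for the depth-limited search.
obs_dim, num_actions, num_transforms = 16, 4, 3
policy = PolicyNet(obs_dim, num_actions)
critic = TransformationCritic(obs_dim, num_transforms)

leaf_obs = torch.randn(1, obs_dim)       # hypothetical leaf encoding
values = critic(leaf_obs)                # value under each transformation
leaf_value = values.min(dim=-1).values   # pessimistic leaf evaluation
```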