RL-CFR: Improving Action Abstraction for Imperfect Information Extensive-Form Games with Reinforcement Learning (2403.04344v1)
Abstract: Effective action abstraction is crucial for handling the large action spaces of Imperfect Information Extensive-Form Games (IIEFGs). However, because of the vast state space and computational complexity of IIEFGs, existing methods often rely on fixed abstractions, which leads to sub-optimal performance. In response, we introduce RL-CFR, a novel reinforcement learning (RL) approach for dynamic action abstraction. RL-CFR builds on our Markov Decision Process (MDP) formulation, in which states correspond to public information and actions are feature vectors that each specify an action abstraction. The reward is defined as the difference in expected payoff between the selected action abstraction and the default one. RL-CFR constructs a game tree with RL-guided action abstractions and uses counterfactual regret minimization (CFR) to derive the strategy. It can be trained from scratch and achieves a higher expected payoff without increasing CFR solving time. In experiments on Heads-up No-limit Texas Hold'em, RL-CFR outperforms a replication of ReBeL and Slumbot, with significant win-rate margins of $64\pm 11$ and $84\pm 17$ mbb/hand, respectively.
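The reward described in the abstract (expected payoff under the selected action abstraction minus expected payoff under the default one) can be illustrated with a toy sketch. This is not the paper's implementation: it assumes a one-shot zero-sum matrix game instead of a poker game tree, treats an "action abstraction" as a subset of the row player's actions, and uses regret-matching self-play (the per-decision update that CFR applies) as a stand-in for the CFR solver. The names `regret_matching_value`, `abstraction_reward`, and `FULL_PAYOFF` are hypothetical.

```python
import numpy as np


def regret_matching_value(payoff, iterations=20000):
    """Approximate the row player's value of a zero-sum matrix game.

    Both players run regret matching against each other; the value is
    evaluated at the average strategies, which approach an equilibrium.
    """
    n_rows, n_cols = payoff.shape
    regret_row = np.zeros(n_rows)
    regret_col = np.zeros(n_cols)
    strategy_sum_row = np.zeros(n_rows)
    strategy_sum_col = np.zeros(n_cols)

    def current_strategy(regret):
        # Play in proportion to positive regret; uniform if there is none.
        positive = np.maximum(regret, 0.0)
        total = positive.sum()
        if total > 0:
            return positive / total
        return np.full(len(regret), 1.0 / len(regret))

    for _ in range(iterations):
        row = current_strategy(regret_row)
        col = current_strategy(regret_col)
        strategy_sum_row += row
        strategy_sum_col += col
        row_action_values = payoff @ col        # payoff of each row action
        col_action_values = -(row @ payoff)     # column player gets the negation
        regret_row += row_action_values - row @ row_action_values
        regret_col += col_action_values - col @ col_action_values

    avg_row = strategy_sum_row / iterations
    avg_col = strategy_sum_col / iterations
    return float(avg_row @ payoff @ avg_col)


def abstraction_reward(full_payoff, selected_rows, default_rows):
    """Toy reward: value with the selected abstraction minus value with the default."""
    selected_value = regret_matching_value(full_payoff[np.array(selected_rows)])
    default_value = regret_matching_value(full_payoff[np.array(default_rows)])
    return selected_value - default_value


# Hypothetical payoffs: three row actions (think of three bet sizes), two replies.
FULL_PAYOFF = np.array([
    [1.0, -1.0],
    [-1.0, 1.0],
    [-1.0, 3.0],
])

# Default abstraction keeps actions {0, 1} (exact game value 0);
# the selected abstraction keeps {0, 2} (exact game value 1/3).
reward = abstraction_reward(FULL_PAYOFF, selected_rows=[0, 2], default_rows=[0, 1])
print(f"reward of the selected abstraction: {reward:.3f}")
```

In this toy setting a positive reward means the selected subset of actions gives the acting player a better equilibrium value than the default subset. That is the signal an RL agent would be trained on under the MDP formulation summarized above.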