Extensive-Form Game Solving via Blackwell Approachability on Treeplexes (2403.04680v1)
Abstract: In this paper, we introduce the first algorithmic framework for Blackwell approachability on the sequence-form polytope, the class of convex polytopes capturing the strategies of players in extensive-form games (EFGs). This leads to a new class of regret-minimization algorithms that are stepsize-invariant, in the same sense as the Regret Matching and Regret Matching$^+$ algorithms for the simplex. Our modular framework can be combined with any existing regret minimizer over cones to compute a Nash equilibrium in two-player zero-sum EFGs with perfect recall, through the self-play framework. Leveraging predictive online mirror descent, we introduce Predictive Treeplex Blackwell$^+$ (PTB$^+$), and show an $O(1/\sqrt{T})$ convergence rate to Nash equilibrium in self-play. We then show how to stabilize PTB$^+$ with a stepsize, resulting in an algorithm with a state-of-the-art $O(1/T)$ convergence rate. We provide an extensive set of experiments to compare our framework with several algorithmic benchmarks, including CFR$^+$ and its predictive variant, and we highlight interesting connections between practical performance and the stepsize-dependence or stepsize-invariance properties of classical algorithms.
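To make the stepsize-invariance property concrete, the following is a minimal sketch of the classical Regret Matching$^+$ algorithm on the simplex, run in self-play on a small matrix game. It is not the treeplex algorithm (PTB$^+$) introduced in the paper. The payoff matrix, the function names, and the `eta` parameter are assumptions made for illustration only. Because the regret vector starts at zero, multiplying every update by `eta` only rescales that vector, and the normalization step cancels the scaling, so the iterates should not depend on `eta`.

```python
def rm_plus_strategy(R):
    """Normalize the nonnegative regret vector R onto the simplex."""
    total = sum(R)
    n = len(R)
    if total <= 0.0:
        return [1.0 / n] * n  # uniform strategy when no positive regret exists
    return [r / total for r in R]


def rm_plus_update(R, x, utility, eta=1.0):
    """One RM+ step: add the scaled instantaneous regret, then clip at zero."""
    expected = sum(xi * ui for xi, ui in zip(x, utility))
    return [max(r + eta * (u - expected), 0.0) for r, u in zip(R, utility)]


def self_play(A, T, eta=1.0):
    """Simultaneous RM+ self-play on min_x max_y x^T A y.

    Returns the uniform averages of both players' strategies.
    """
    n, m = len(A), len(A[0])
    Rx, Ry = [0.0] * n, [0.0] * m
    avg_x, avg_y = [0.0] * n, [0.0] * m
    for _ in range(T):
        x, y = rm_plus_strategy(Rx), rm_plus_strategy(Ry)
        # The row player minimizes, so its utility is the negated loss A y.
        ux = [-sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        # The column player maximizes x^T A y.
        uy = [sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
        Rx = rm_plus_update(Rx, x, ux, eta)
        Ry = rm_plus_update(Ry, y, uy, eta)
        avg_x = [a + xi / T for a, xi in zip(avg_x, x)]
        avg_y = [a + yj / T for a, yj in zip(avg_y, y)]
    return avg_x, avg_y


def duality_gap(A, x, y):
    """Best-response value against x minus best-response value against y."""
    n, m = len(A), len(A[0])
    best_y = max(sum(A[i][j] * x[i] for i in range(n)) for j in range(m))
    best_x = min(sum(A[i][j] * y[j] for j in range(m)) for i in range(n))
    return best_y - best_x


if __name__ == "__main__":
    # Hypothetical 3x3 zero-sum game with payoffs in [-1, 1].
    A = [[0.5, -1.0, 0.25],
         [-0.5, 1.0, -1.0],
         [1.0, -0.25, 0.0]]
    for eta in (1.0, 8.0):
        x, y = self_play(A, 5000, eta=eta)
        print(eta, x, y, duality_gap(A, x, y))
```

Running the script with two different values of `eta` should print the same average strategies and the same duality gap. A stepsize-dependent method such as online mirror descent would generally produce different iterates for different stepsizes, which is the contrast the abstract draws.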