
Extensive-Form Game Solving via Blackwell Approachability on Treeplexes (2403.04680v1)

Published 7 Mar 2024 in cs.GT

Abstract: In this paper, we introduce the first algorithmic framework for Blackwell approachability on the sequence-form polytope, the class of convex polytopes capturing the strategies of players in extensive-form games (EFGs). This leads to a new class of regret-minimization algorithms that are stepsize-invariant, in the same sense as the Regret Matching and Regret Matching$+$ algorithms for the simplex. Our modular framework can be combined with any existing regret minimizer over cones to compute a Nash equilibrium in two-player zero-sum EFGs with perfect recall, through the self-play framework. Leveraging predictive online mirror descent, we introduce Predictive Treeplex Blackwell$+$ (PTB$+$), and show an $O(1/\sqrt{T})$ convergence rate to Nash equilibrium in self-play. We then show how to stabilize PTB$+$ with a stepsize, resulting in an algorithm with a state-of-the-art $O(1/T)$ convergence rate. We provide an extensive set of experiments to compare our framework with several algorithmic benchmarks, including CFR$+$ and its predictive variant, and we highlight interesting connections between practical performance and the stepsize-dependence or stepsize-invariance properties of classical algorithms.
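
The abstract rests on two ingredients that can be illustrated in isolation: a stepsize-invariant regret minimizer (Regret Matching$+$) and the self-play construction that turns per-player regret guarantees into an approximate Nash equilibrium in a two-player zero-sum game. The sketch below is a minimal, hypothetical illustration of those two ingredients on a 2x2 matrix game (a treeplex with a single decision point reduces to a simplex); it is not the paper's PTB$+$ algorithm and omits the sequence-form machinery, predictions, alternation, and weighted averaging that the paper and the CFR$+$ family rely on. The payoff matrix and variable names are chosen only for this example.

```python
import numpy as np

def regret_matching_plus(cum_regret):
    """Play proportionally to the positive part of the cumulative regret
    vector; fall back to uniform when all regrets are non-positive.
    Note there is no stepsize anywhere: this is the stepsize-invariance
    the abstract refers to, shown here on the simplex, not the treeplex."""
    pos = np.maximum(cum_regret, 0.0)
    total = pos.sum()
    return pos / total if total > 0 else np.full_like(pos, 1.0 / len(pos))

# Toy zero-sum game: matching pennies (row player's payoff matrix).
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

R_x = np.zeros(2)   # clipped cumulative regrets, row player
R_y = np.zeros(2)   # clipped cumulative regrets, column player
avg_x = np.zeros(2)
avg_y = np.zeros(2)

T = 10_000
for _ in range(T):
    x = regret_matching_plus(R_x)
    y = regret_matching_plus(R_y)

    # Utility of each pure action against the opponent's current mixed strategy.
    u_x = A @ y            # row player maximizes A
    u_y = -(A.T @ x)       # column player maximizes -A

    # RM+ update: accumulate instantaneous regrets, then clip at zero.
    R_x = np.maximum(R_x + (u_x - x @ u_x), 0.0)
    R_y = np.maximum(R_y + (u_y - y @ u_y), 0.0)

    # Self-play: in zero-sum games the averaged strategies approach a Nash
    # equilibrium (uniform averaging here; CFR+ uses linear weights).
    avg_x += x
    avg_y += y

print(avg_x / T, avg_y / T)   # both averages approach (0.5, 0.5)
```

In this toy example both average strategies converge to the unique equilibrium (0.5, 0.5); the paper's contribution is, roughly, how to carry this style of update from the simplex to the sequence-form polytope of an extensive-form game while keeping the stepsize-invariance and improving the convergence rate.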

