On the Complexity of Computing Sparse Equilibria and Lower Bounds for No-Regret Learning in Games (2311.14869v1)

Published 24 Nov 2023 in cs.GT

Abstract: Characterizing the performance of no-regret dynamics in multi-player games is a foundational problem at the interface of online learning and game theory. Recent results have revealed that when all players adopt specific learning algorithms, it is possible to improve exponentially over what is predicted by the overly pessimistic no-regret framework in the traditional adversarial regime, thereby leading to faster convergence to the set of coarse correlated equilibria (CCE). Yet, despite considerable recent progress, the fundamental complexity barriers for learning in normal- and extensive-form games are poorly understood. In this paper, we take a step toward closing this gap by first showing that, barring major complexity breakthroughs, any polynomial-time learning algorithm in extensive-form games needs at least $2^{\log^{1/2 - o(1)} |\mathcal{T}|}$ iterations for the average regret to reach below even an absolute constant, where $|\mathcal{T}|$ is the number of nodes in the game. This establishes a superpolynomial separation between no-regret learning in normal- and extensive-form games, as in the former class a logarithmic number of iterations suffices to achieve constant average regret. Furthermore, our results imply that algorithms such as multiplicative weights update, as well as its \emph{optimistic} counterpart, require at least $2^{(\log \log m)^{1/2 - o(1)}}$ iterations to attain an $O(1)$-CCE in $m$-action normal-form games. These are the first non-trivial, and dimension-dependent, lower bounds in that setting for the most well-studied algorithms in the literature. From a technical standpoint, we follow a beautiful connection recently made by Foster, Golowich, and Kakade (ICML '23) between sparse CCE and Nash equilibria in the context of Markov games. Consequently, our lower bounds rule out polynomial-time algorithms well beyond the traditional online learning framework.
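
To make the objects in the abstract concrete, the sketch below runs multiplicative weights update (Hedge) in self-play on a random two-player $m$-action normal-form game. It illustrates the standard reduction the abstract relies on: if every player's average external regret after $T$ rounds is at most $\epsilon$, the (expected) empirical distribution of joint play is an $\epsilon$-CCE. This is a minimal illustration, not the paper's construction; the random payoff matrices, step size eta, and horizon T are arbitrary choices made here for demonstration, and the paper's lower bound concerns how large $T$ must be before $\epsilon$ drops below an absolute constant.

import numpy as np

rng = np.random.default_rng(0)
m = 5                              # actions per player (illustrative choice)
A = rng.uniform(0.0, 1.0, (m, m))  # player 1's payoff matrix (random, for demo)
B = rng.uniform(0.0, 1.0, (m, m))  # player 2's payoff matrix
eta = 0.5                          # Hedge step size (illustrative choice)
T = 2000                           # number of rounds (illustrative choice)

w1, w2 = np.ones(m), np.ones(m)    # Hedge weights for each player
joint = np.zeros((m, m))           # running average of the joint play distribution
cum_u1, cum_u2 = np.zeros(m), np.zeros(m)  # cumulative utility of each fixed action
realized1 = realized2 = 0.0        # cumulative realized (expected) utility

for _ in range(T):
    x = w1 / w1.sum()              # current mixed strategies
    y = w2 / w2.sum()
    joint += np.outer(x, y)        # expected joint play this round
    u1 = A @ y                     # utility of each action for player 1 against y
    u2 = B.T @ x                   # utility of each action for player 2 against x
    realized1 += x @ u1
    realized2 += y @ u2
    cum_u1 += u1
    cum_u2 += u2
    w1 = w1 * np.exp(eta * u1)     # multiplicative weights (Hedge) update
    w2 = w2 * np.exp(eta * u2)
    w1 /= w1.sum()                 # renormalize to avoid numerical overflow
    w2 /= w2.sum()

joint /= T
# The max average external regret upper-bounds the CCE gap of `joint`:
# no player can gain more than eps by deviating to any fixed action.
eps = max((cum_u1.max() - realized1) / T, (cum_u2.max() - realized2) / T)
print(f"max average regret (CCE gap bound): {eps:.4f}")

In this folk reduction the regret decays like $O(\sqrt{\log m / T})$ in the adversarial worst case; the paper's contribution is on the opposite side, showing that even with all players running (optimistic) MWU, reaching an $O(1)$-CCE can require $2^{(\log \log m)^{1/2 - o(1)}}$ rounds.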

References (90)
  1. Perturbation techniques in online learning and optimization. Perturbations, Optimization, and Statistics, 233, 2016.
  2. R. Aumann. Subjectivity and correlation in randomized strategies. Journal of Mathematical Economics, 1:67–96, 1974.
  3. Y. Babichenko. Query complexity of approximate Nash equilibria. Journal of the ACM, 63(4):36:1–36:24, 2016.
  4. Can almost everybody be almost happy? In Proceedings of the Conference on Innovations in Theoretical Computer Science, pages 1–9. ACM, 2016.
  5. Efficient phi-regret minimization in extensive-form games via online mirror descent. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2022a.
  6. Near-optimal learning of extensive-form games with imperfect information. In International Conference on Machine Learning (ICML), pages 1337–1382. PMLR, 2022b.
  7. Human-level play in the game of diplomacy by combining language models with strategic reasoning. Science, 378(6624):1067–1074, 2022.
  8. Sampling equilibria: Fast no-regret learning in structured games. In Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 3817–3855. SIAM, 2023.
  9. Agnostic online learning. In Conference on Learning Theory (COLT), 2009.
  10. D. Blackwell. An analog of the minmax theorem for vector payoffs. Pacific Journal of Mathematics, 6:1–8, 1956.
  11. A. Blum and Y. Mansour. Learning, regret minimization, and equilibria. 2007.
  12. The myth of the folk theorem. Games and Economic Behavior, 70(1):34–43, 2010.
  13. Heads-up limit hold’em poker is solved. Science, 347(6218), January 2015.
  14. N. Brown and T. Sandholm. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science, pages 418–424, Dec. 2018.
  15. N. Brown and T. Sandholm. Solving imperfect-information games via discounted regret minimization. In AAAI Conference on Artificial Intelligence (AAAI), 2019a.
  16. N. Brown and T. Sandholm. Superhuman AI for multiplayer poker. Science, 365(6456):885–890, 2019b.
  17. N. Cesa-Bianchi and G. Lugosi. Prediction, learning, and games. Cambridge University Press, 2006.
  18. X. Chen and B. Peng. Hedging in games: Faster convergence of external and swap regrets. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2020.
  19. Settling the complexity of computing two-player Nash equilibria. Journal of the ACM, 2009.
  20. Multiplicative weight updates for extensive form games. In Autonomous Agents and Multi-Agent Systems, pages 1071–1078. ACM, 2023.
  21. F. Chu and J. Halpern. On the NP-completeness of finding an optimal strategy in games with common payoffs. International Journal of Game Theory, 2001.
  22. From external to swap regret 2.0: An efficient reduction and oblivious adversary for large action spaces, 2023.
  23. C. Daskalakis and N. Golowich. Fast rates for nonparametric online learning: from realizability to learning in games. In Proceedings of the Annual Symposium on Theory of Computing (STOC), pages 846–859. ACM, 2022.
  24. The complexity of computing a Nash equilibrium. SIAM Journal on Computing, 39(1), 2009.
  25. Near-optimal no-regret algorithms for zero-sum games. Games and Economic Behavior, 92:327–348, 2015.
  26. Near-optimal no-regret learning in general games. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), pages 27604–27616, 2021.
  27. M. Dudík and G. J. Gordon. A sampling-based approach to computing equilibria in succinct extensive-form games. In UAI 2009, Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, June 18-21, 2009, pages 151–160. AUAI Press, 2009.
  28. Regret minimization and convergence to equilibria in general-sum Markov games. In International Conference on Machine Learning (ICML), volume 202 of Proceedings of Machine Learning Research, pages 9343–9373. PMLR, 2023.
  29. Stable-predictive optimistic counterfactual regret minimization. In International Conference on Machine Learning (ICML), 2019a.
  30. Regret circuits: Composability of regret minimizers. In International Conference on Machine Learning, pages 1863–1872, 2019b.
  31. Coarse correlation in extensive-form games. In AAAI Conference on Artificial Intelligence (AAAI), volume 34, pages 1934–1941, 2020.
  32. Better regularization for sequential decision spaces: Fast convergence rates for Nash, correlated, and team equilibria. In Proceedings of the ACM Conference on Economics and Computation (EC), page 432. ACM, 2021a.
  33. Faster game solving via predictive blackwell approachability: Connecting regret matching and mirror descent. In AAAI Conference on Artificial Intelligence (AAAI), 2021b.
  34. Simple uncoupled no-regret learning dynamics for extensive-form correlated equilibrium. Journal of the ACM, 69(6):41:1–41:41, 2022a.
  35. Kernelized multiplicative weights for 0/1-polyhedral games: Bridging the gap between learning in extensive-form and normal-form games. In International Conference on Machine Learning (ICML), volume 162 of Proceedings of Machine Learning Research, pages 6337–6357. PMLR, 2022b.
  36. J. Fearnley and R. Savani. Finding approximate Nash equilibria of bimatrix games via payoff queries. ACM Transactions on Economics and Computation, 4(4):25:1–25:19, 2016.
  37. Learning equilibria of games via payoff queries. Journal of Machine Learning Research, 16:1305–1344, 2015.
  38. Adapting to game trees in zero-sum imperfect information games. In International Conference on Machine Learning (ICML), volume 202 of Proceedings of Machine Learning Research, pages 10093–10135. PMLR, 2023a.
  39. Local and adaptive mirror descents in extensive-form games, 2023b.
  40. D. Foster and R. Vohra. Calibrated learning and correlated equilibrium. Games and Economic Behavior, 21:40–55, 1997.
  41. Learning in games: Robustness of fast convergence. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pages 4727–4735, 2016.
  42. Hardness of independent learning and sparse equilibrium computation in Markov games. In International Conference on Machine Learning (ICML), volume 202 of Proceedings of Machine Learning Research, pages 10188–10221. PMLR, 2023.
  43. Lower bounds for the query complexity of equilibria in Lipschitz games. Theoretical Computer Science, 962:113931, 2023.
  44. No-regret learning in convex games. In Proceedings of the 25th international conference on Machine learning, pages 360–367. ACM, 2008.
  45. Towards characterizing the first-order query complexity of learning (approximate) Nash equilibria in zero-sum matrix games, 2023.
  46. S. Hart and A. Mas-Colell. A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68:1127–1150, 2000.
  47. E. Hazan. Introduction to online convex optimization. Foundations and Trends in Optimization, 2(3-4):157–325, 2016.
  48. Fictitious self-play in extensive-form games. In International Conference on Machine Learning (ICML), volume 37 of JMLR Workshop and Conference Proceedings, pages 805–813. JMLR.org, 2015.
  49. W. Hoeffding and J. Wolfowitz. Distinguishability of sets of distributions. The Annals of Mathematical Statistics, 29(3):700–718, 1958.
  50. Adaptive learning in continuous games: Optimal regret bounds and convergence to nash equilibrium. In Conference on Learning Theory (COLT), volume 134 of Proceedings of Machine Learning Research, pages 2388–2422. PMLR, 2021.
  51. No-regret learning in games with noisy feedback: Faster rates and adaptivity via learning rate separation. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2022.
  52. W. Huang and B. von Stengel. Computing an extensive-form correlated equilibrium in polynomial time. In Internet and Network Economics, 4th International Workshop, WINE 2008, volume 5385 of Lecture Notes in Computer Science, pages 506–513. Springer, 2008.
  53. A. X. Jiang and K. Leyton-Brown. Polynomial-time computation of exact correlated equilibrium in compact games. Games and Economic Behavior, 91:347–359, 2015.
  54. A. Kalai and S. Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71:291–307, 2005.
  55. Let’s be honest: An optimal no-regret framework for zero-sum games. In International Conference on Machine Learning (ICML), volume 80 of Proceedings of Machine Learning Research, pages 2493–2501. PMLR, 2018.
  56. Fast algorithms for finding randomized strategies in game trees. In Proceedings of the Annual Symposium on Theory of Computing (STOC), 1994.
  57. Learning in two-player zero-sum partially observable Markov games with perfect recall. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), pages 11987–11998, 2021.
  58. Playing large games using simple strategies. In Proceedings of the ACM Conference on Electronic Commerce (ACM-EC), pages 36–41, San Diego, CA, 2003. ACM.
  59. N. Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine learning, 2:285–318, 1988.
  60. M. Littman and P. Stone. A polynomial-time Nash equilibrium algorithm for repeated games. In Proceedings of the ACM Conference on Electronic Commerce (ACM-EC), pages 48–54, San Diego, CA, 2003.
  61. Query-efficient algorithms to find the unique Nash equilibrium in a two-player zero-sum matrix game, 2023.
  62. Efficient deviation types and learning for hindsight rationality in extensive-form games. In M. Meila and T. Zhang, editors, International Conference on Machine Learning (ICML), volume 139 of Proceedings of Machine Learning Research, pages 7818–7828. PMLR, 2021a.
  63. Hindsight and sequential rationality of correlated play. In AAAI Conference on Artificial Intelligence (AAAI), pages 5584–5594. AAAI Press, 2021b.
  64. H. Moulin and J.-P. Vial. Strategically zero-sum games: The class of games whose completely mixed equilibria cannot be improved upon. International Journal of Game Theory, 7(3-4):201–221, 1978.
  65. C. H. Papadimitriou. On the complexity of the parity argument and other inefficient proofs of existence. Journal of Computer and system Sciences, 48(3):498–532, 1994.
  66. C. H. Papadimitriou and T. Roughgarden. Computing correlated equilibria in multi-player games. Journal of the ACM, 55(3):14:1–14:29, 2008.
  67. B. Peng and A. Rubinstein. Fast swap regret minimization and applications to approximate correlated equilibria, 2023.
  68. Beyond time-average convergence: Near-optimal uncoupled online learning via clairvoyant multiplicative weights update. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2022.
  69. Pure Monte Carlo counterfactual regret minimization, 2023.
  70. A. Rakhlin and K. Sridharan. Online learning with predictable sequences. In Conference on Learning Theory, pages 993–1019, 2013a.
  71. A. Rakhlin and K. Sridharan. Optimization, learning, and games with predictable sequences. In Advances in Neural Information Processing Systems, pages 3066–3074, 2013b.
  72. J. Robinson. An iterative method of solving a game. Annals of Mathematics, 54:296–301, 1951.
  73. I. Romanovskii. Reduction of a game with complete memory to a matrix game. Soviet Mathematics, 3, 1962.
  74. A. Rubinstein. Inapproximability of Nash equilibrium. SIAM Journal on Computing, 47(3):917–959, 2018.
  75. Y. Shoham and K. Leyton-Brown. Multiagent systems: Algorithmic, game-theoretic, and logical foundations. Cambridge University Press, 2008.
  76. Sample-efficient learning of correlated equilibria in extensive-form games. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2022.
  77. Fast convergence of regularized learning in games. In Advances in Neural Information Processing Systems, pages 2989–2997, 2015.
  78. E. Takimoto and M. K. Warmuth. Path kernels and multiplicative updates. Journal of Machine Learning Research, 4:773–818, 2003.
  79. Solving heads-up limit Texas hold’em. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), 2015.
  80. Regret-minimizing double oracle for extensive-form games. In International Conference on Machine Learning (ICML), volume 202 of Proceedings of Machine Learning Research, pages 33599–33615. PMLR, 2023.
  81. The computational complexity of single-player imperfect-recall games. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 2878–2887, 2023.
  82. B. von Stengel. Efficient computation of behavior strategies. Games and Economic Behavior, 14(2):220–246, 1996.
  83. B. von Stengel and F. Forges. Extensive-form correlated equilibrium: Definition and computational complexity. Mathematics of Operations Research, 33(4):1002–1022, 2008.
  84. V. G. Vovk. Aggregating strategies. In Conference on Learning Theory (COLT), pages 371–386. Morgan Kaufmann, 1990.
  85. Alternating mirror descent for constrained min-max games. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2022.
  86. Y. Yang and C. Ma. $O(T^{-1})$ convergence of optimistic-follow-the-regularized-leader in two-player zero-sum Markov games. In The Eleventh International Conference on Learning Representations (ICLR 2023). OpenReview.net, 2023.
  87. B. H. Zhang and T. Sandholm. Finding and certifying (near-)optimal strategies in black-box extensive-form games. In AAAI Conference on Artificial Intelligence (AAAI), pages 5779–5788. AAAI Press, 2021.
  88. B. H. Zhang and T. Sandholm. Team correlated equilibria in zero-sum extensive-form games via tree decompositions. In AAAI Conference on Artificial Intelligence (AAAI), pages 5252–5259. AAAI Press, 2022.
  89. Policy optimization for Markov games: Unified framework and faster convergence. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2022.
  90. Regret minimization in games with incomplete information. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), 2007.
Authors (4)
  1. Ioannis Anagnostides (34 papers)
  2. Alkis Kalavasis (28 papers)
  3. Tuomas Sandholm (119 papers)
  4. Manolis Zampetakis (45 papers)
Citations (3)
