$\widetilde{O}(T^{-1})$ Convergence to (Coarse) Correlated Equilibria in Full-Information General-Sum Markov Games (2403.07890v2)

Published 2 Feb 2024 in cs.GT, cs.AI, and cs.LG

Abstract: No-regret learning has a long history of being closely connected to game theory. Recent works have devised uncoupled no-regret learning dynamics that, when adopted by all the players in normal-form games, converge to various equilibrium solutions at a near-optimal rate of $\widetilde{O}(T^{-1})$, a significant improvement over the $O(1/\sqrt{T})$ rate of classic no-regret learners. However, analogous convergence results are scarce in Markov games, a more generic setting that lays the foundation for multi-agent reinforcement learning. In this work, we close this gap by showing that the optimistic-follow-the-regularized-leader (OFTRL) algorithm, together with appropriate value update procedures, can find $\widetilde{O}(T^{-1})$-approximate (coarse) correlated equilibria in full-information general-sum Markov games within $T$ iterations. Numerical results are also included to corroborate our theoretical findings.

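The learning rule at the heart of the abstract is optimistic follow-the-regularized-leader (OFTRL): each player plays the regularized best response to its cumulative utilities plus an optimistic prediction (the most recent utility vector). Below is a minimal sketch of this update with a negative-entropy regularizer in a toy two-player normal-form game; the payoff matrices, learning rate, and the restriction to a one-shot game (rather than the paper's Markov-game setting with value updates) are assumptions made purely for illustration, not the paper's algorithm.

```python
import numpy as np

def oftrl_step(cum_utils, last_util, eta):
    """One optimistic-FTRL update with a negative-entropy regularizer.

    cum_utils: running sum of past utility vectors (one entry per action)
    last_util: most recent utility vector, used as the optimistic prediction
    eta:       learning rate
    Returns a mixed strategy (softmax of the predicted cumulative utilities).
    """
    logits = eta * (cum_utils + last_util)
    logits -= logits.max()                 # for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Toy two-player general-sum matrix game (payoffs are illustrative assumptions).
A = np.array([[1.0, 0.0], [0.0, 1.0]])     # row player's payoff matrix
B = np.array([[0.5, 1.0], [1.0, 0.5]])     # column player's payoff matrix

T, eta = 1000, 0.1
cum_x, cum_y = np.zeros(2), np.zeros(2)
x, y = np.ones(2) / 2, np.ones(2) / 2      # start from uniform strategies

for t in range(T):
    # Each player's expected utility per action against the opponent's strategy.
    u_x, u_y = A @ y, B.T @ x
    cum_x += u_x
    cum_y += u_y
    x = oftrl_step(cum_x, u_x, eta)
    y = oftrl_step(cum_y, u_y, eta)
```

In the paper's setting, per-state OFTRL updates of this kind are combined with a value-update procedure over the Markov game's stages; the sketch above only illustrates the normal-form building block.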
Authors (6)
  1. Weichao Mao (11 papers)
  2. Haoran Qiu (10 papers)
  3. Chen Wang (600 papers)
  4. Hubertus Franke (15 papers)
  5. Zbigniew Kalbarczyk (19 papers)
  6. Tamer Başar (200 papers)
