$\widetilde{O}(T^{-1})$ Convergence to (Coarse) Correlated Equilibria in Full-Information General-Sum Markov Games (2403.07890v2)
Abstract: No-regret learning has a long history of being closely connected to game theory. Recent works have devised uncoupled no-regret learning dynamics that, when adopted by all the players in normal-form games, converge to various equilibrium solutions at a near-optimal rate of $\widetilde{O}(T^{-1})$, a significant improvement over the $O(1/\sqrt{T})$ rate of classic no-regret learners. However, analogous convergence results are scarce in Markov games, a more general setting that lays the foundation for multi-agent reinforcement learning. In this work, we close this gap by showing that the optimistic-follow-the-regularized-leader (OFTRL) algorithm, together with appropriate value update procedures, can find $\widetilde{O}(T^{-1})$-approximate (coarse) correlated equilibria in full-information general-sum Markov games within $T$ iterations. Numerical results are also included to corroborate our theoretical findings.
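To make the learning dynamics concrete, the following is a minimal sketch of OFTRL with an entropy regularizer (i.e., optimistic Hedge) in the simplest relevant setting: a two-player zero-sum matrix game. This is only the normal-form building block, not the paper's full algorithm, which additionally interleaves value updates across the stages of a Markov game; the function and parameter names are illustrative.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def optimistic_hedge(A, T=2000, eta=0.1):
    """OFTRL with entropy regularizer (optimistic Hedge) for a two-player
    zero-sum matrix game. The row player maximizes x^T A y; the column
    player minimizes it. Returns the time-averaged strategies."""
    n, m = A.shape
    x = np.ones(n) / n          # start from uniform strategies
    y = np.ones(m) / m
    Gx = np.zeros(n)            # cumulative utility gradients, row player
    Gy = np.zeros(m)            # cumulative utility gradients, column player
    x_avg = np.zeros(n)
    y_avg = np.zeros(m)
    for _ in range(T):
        gx = A @ y              # row player's payoff gradient at current play
        gy = -(A.T @ x)         # column (minimizing) player's gradient
        Gx += gx
        Gy += gy
        # OFTRL update: follow the regularized leader on the cumulative
        # gradient plus an optimistic prediction (the most recent gradient).
        x = softmax(eta * (Gx + gx))
        y = softmax(eta * (Gy + gy))
        x_avg += x
        y_avg += y
    return x_avg / T, y_avg / T
```

For a matrix game like matching pennies, the duality gap of the averaged strategies, `(A @ y_bar).max() - (x_bar @ A).min()`, shrinks rapidly with `T`; the paper's contribution is establishing the analogous $\widetilde{O}(T^{-1})$ rate for (coarse) correlated equilibria in general-sum Markov games.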
Authors: Weichao Mao, Haoran Qiu, Chen Wang, Hubertus Franke, Zbigniew Kalbarczyk, Tamer Başar