Near-Optimal Policy Optimization for Correlated Equilibrium in General-Sum Markov Games (2401.15240v2)

Published 26 Jan 2024 in cs.LG, cs.GT, and math.OC

Abstract: We study policy optimization algorithms for computing correlated equilibria in multi-player general-sum Markov Games. Previous results achieve an $O(T^{-1/2})$ convergence rate to a correlated equilibrium and an accelerated $O(T^{-3/4})$ convergence rate to the weaker notion of coarse correlated equilibrium. In this paper, we improve both results significantly by providing an uncoupled policy optimization algorithm that attains a near-optimal $\tilde{O}(T^{-1})$ convergence rate for computing a correlated equilibrium. Our algorithm is constructed by combining two main elements: (i) smooth value updates and (ii) the optimistic follow-the-regularized-leader (OFTRL) algorithm with the log-barrier regularizer.
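For intuition, the sketch below shows one OFTRL update with a log-barrier regularizer on the probability simplex, the second ingredient named in the abstract. This is a minimal illustration, not the paper's algorithm: the step size `eta`, the choice of the last observed loss as the optimistic prediction, and the generic numerical solver are all assumptions made here for demonstration, and the full method additionally relies on smooth value updates and correlated-equilibrium (swap-regret) machinery not shown.

```python
import numpy as np
from scipy.optimize import minimize

def oftrl_log_barrier_step(loss_sum, prediction, eta):
    """One OFTRL update on the probability simplex with a log-barrier regularizer.

    Solves   x = argmin_x  eta * <x, loss_sum + prediction> - sum_i log(x_i)
    subject to x lying on the simplex. `prediction` is the optimistic guess of
    the next loss (here: the most recently observed loss).
    """
    d = len(loss_sum)
    g = loss_sum + prediction

    def objective(x):
        return eta * (x @ g) - np.sum(np.log(x))

    cons = ({"type": "eq", "fun": lambda x: np.sum(x) - 1.0},)
    bounds = [(1e-9, 1.0)] * d          # keep iterates strictly inside the simplex
    x0 = np.full(d, 1.0 / d)            # start from the uniform policy
    res = minimize(objective, x0, bounds=bounds, constraints=cons)
    return res.x

# Toy usage: two rounds of OFTRL over three actions with hypothetical losses.
if __name__ == "__main__":
    losses = [np.array([0.2, 0.5, 0.1]), np.array([0.4, 0.1, 0.3])]
    loss_sum = np.zeros(3)
    prediction = np.zeros(3)
    for ell in losses:
        x = oftrl_log_barrier_step(loss_sum, prediction, eta=0.5)
        print("policy:", np.round(x, 3))
        loss_sum += ell
        prediction = ell                # optimism: predict the next loss equals the last one
```

The log-barrier regularizer keeps every action's probability bounded away from zero, which is what enables the sharper regret guarantees the abstract refers to, compared with entropy-based regularizers.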

Authors (4)
  1. Yang Cai (64 papers)
  2. Haipeng Luo (99 papers)
  3. Chen-Yu Wei (46 papers)
  4. Weiqiang Zheng (16 papers)
Citations (4)
