Provably Learning Nash Policies in Constrained Markov Potential Games
Abstract: Multi-agent reinforcement learning (MARL) addresses sequential decision-making problems with multiple agents, where each agent optimizes its own objective. In many real-world instances, the agents may not only want to optimize their objectives but also need to ensure safe behavior. For example, in traffic routing, each car (agent) aims to reach its destination quickly (objective) while avoiding collisions (safety). Constrained Markov Games (CMGs) are a natural formalism for safe MARL problems, though they are generally intractable. In this work, we introduce and study Constrained Markov Potential Games (CMPGs), an important class of CMGs. We first show that a Nash policy for CMPGs can be found via constrained optimization. A tempting approach is to solve this problem with Lagrangian-based primal-dual methods. However, as we show, unlike in the single-agent setting, CMPGs do not satisfy strong duality, which renders such approaches inapplicable and potentially unsafe. To solve the CMPG problem, we propose our algorithm Coordinate-Ascent for CMPGs (CA-CMPG), which provably converges to a Nash policy in tabular, finite-horizon CMPGs. Furthermore, we provide the first sample complexity bounds for learning Nash policies in unknown CMPGs, which, under additional assumptions, guarantee safe exploration.
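The coordinate-ascent idea named above can be illustrated on a far simpler object than the paper's setting. The sketch below is a toy, single-state analogue under stated assumptions, not the authors' CA-CMPG (which operates on policies in tabular, finite-horizon CMPGs): two agents with finite pure actions, a hypothetical potential table `PHI`, and a hypothetical joint safety constraint `feasible`, where a unilateral deviation is allowed only if the resulting joint action stays feasible. Each agent in turn picks the feasible action that maximizes the potential; because the potential strictly increases on every change and there are finitely many joint actions, the loop stops at a point from which no agent has a feasible, utility-improving unilateral deviation.

```python
from typing import Callable, Dict, List, Sequence, Tuple

JointAction = Tuple[int, ...]


def coordinate_ascent(
    potential: Callable[[JointAction], float],
    feasible: Callable[[JointAction], bool],
    n_actions: Sequence[int],
    start: JointAction,
    max_sweeps: int = 100,
) -> JointAction:
    """Cyclic, constraint-respecting best responses on the potential.

    Agent i re-optimizes only its own action with the others held fixed,
    considering only deviations that keep the joint action feasible
    (an assumption of this toy model). The potential strictly increases on
    every change, so with finitely many joint actions the loop terminates.
    """
    if not feasible(start):
        raise ValueError("start must satisfy the constraints")
    joint = tuple(start)
    for _ in range(max_sweeps):
        changed = False
        for i, n_i in enumerate(n_actions):
            best = joint
            for a in range(n_i):
                cand = joint[:i] + (a,) + joint[i + 1:]
                if feasible(cand) and potential(cand) > potential(best):
                    best = cand
            if best != joint:
                joint, changed = best, True
        if not changed:  # a full sweep without improvement: stop
            return joint
    raise RuntimeError("no convergence within max_sweeps")


def is_constrained_nash(
    utilities: List[Callable[[JointAction], float]],
    feasible: Callable[[JointAction], bool],
    n_actions: Sequence[int],
    joint: JointAction,
    tol: float = 1e-12,
) -> bool:
    """True if no agent has a feasible unilateral deviation raising its utility."""
    if not feasible(joint):
        return False
    for i, n_i in enumerate(n_actions):
        for a in range(n_i):
            cand = joint[:i] + (a,) + joint[i + 1:]
            if feasible(cand) and utilities[i](cand) > utilities[i](joint) + tol:
                return False
    return True


# Hypothetical 2-agent, 2-action game (toy numbers, not from the paper).
PHI: Dict[JointAction, float] = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 2.0}


def phi(a: JointAction) -> float:
    return PHI[a]


# u_i = phi + (term independent of agent i's own action), so phi is an exact potential.
utilities = [lambda a: phi(a) + 5.0 * a[1], lambda a: phi(a) - 2.0 * a[0]]


def feasible(a: JointAction) -> bool:
    # Hypothetical safety constraint: both agents choosing action 0 "collide".
    return a != (0, 0)


eq = coordinate_ascent(phi, feasible, n_actions=[2, 2], start=(1, 0))
print(eq, is_constrained_nash(utilities, feasible, [2, 2], eq))
```

Running it prints `(1, 1) True`. Ignoring the constraint, the potential maximizer would be `(0, 0)`; the constraint rules it out, and the sweep settles at `(1, 1)` instead. Because each utility differs from `phi` only by a term that does not depend on that agent's own action, potential gains and individual utility gains coincide for unilateral deviations, which is why ascent on `phi` ends at a constrained Nash point in this toy model.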