Beyond Theorems: A Counterexample to Potential Markov Game Criteria (2405.08206v1)
Abstract: There are only a few classes of multi-player stochastic games in which independent learning is guaranteed to converge to a Nash equilibrium; Markov potential games are a key example. Prior work has outlined sets of sufficient conditions under which a stochastic game qualifies as a Markov potential game. However, these conditions often impose strict limitations on the game's structure and tend to be challenging to verify. To address these limitations, Mguni et al. [12] introduce a relaxed notion of Markov potential games and offer an alternative set of necessary conditions for categorizing stochastic games as potential games. Under these conditions, the authors claim that a deterministic Nash equilibrium can be computed efficiently by solving a dual Markov decision process. In this paper, we refute this claim by presenting a counterexample.
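For orientation, the notion at issue lifts potential games [15] to the dynamic setting: a stochastic game is a Markov potential game when a single state-dependent potential function tracks each player's change in value under any unilateral policy deviation. Below is a minimal sketch of this standard condition, with assumed notation: $V_i^{\pi}$ denotes player $i$'s value function under the joint policy $\pi = (\pi_i, \pi_{-i})$, and $\Phi$ denotes the assumed potential.

```latex
% Markov potential game condition (a sketch; notation as assumed above).
% For every player i, every state s, and every unilateral deviation from
% \pi_i to \pi_i' while the other players' policies \pi_{-i} stay fixed:
\[
  V_i^{(\pi_i,\,\pi_{-i})}(s) \;-\; V_i^{(\pi_i',\,\pi_{-i})}(s)
  \;=\;
  \Phi^{(\pi_i,\,\pi_{-i})}(s) \;-\; \Phi^{(\pi_i',\,\pi_{-i})}(s).
\]
% Because unilateral gains in value are mirrored exactly by \Phi, a joint
% policy maximizing \Phi is a Nash equilibrium, and maximizing \Phi reduces
% to solving a single (dual) Markov decision process. The relaxed conditions
% of Mguni et al. [12] weaken this exact-matching requirement; the paper's
% counterexample shows the weakened conditions no longer support the
% dual-MDP equilibrium claim.
```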
- Lawrence E Blume. 1995. The statistical mechanics of best-response strategy revision. Games and Economic Behavior 11, 2 (1995), 111–145.
- Vivek S Borkar. 2002. Reinforcement learning in Markovian evolutionary games. Advances in Complex Systems 5, 1 (2002), 55–72.
- Constantinos Daskalakis, Noah Golowich, and Kaiqing Zhang. 2023. The complexity of Markov equilibrium in stochastic games. In Proceedings of the 36th Annual Conference on Learning Theory (COLT). 4180–4234.
- Arlington M Fink. 1964. Equilibrium in a stochastic n-person game. Journal of Science of the Hiroshima University, Series A-I (Mathematics) 28, 1 (1964), 89–93.
- Jakob Foerster, Richard Y Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, and Igor Mordatch. 2018. Learning with Opponent-Learning Awareness. In Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems (AAMAS). 122–130.
- Jakob Foerster, Nantas Nardelli, Gregory Farquhar, Triantafyllos Afouras, Philip HS Torr, Pushmeet Kohli, and Shimon Whiteson. 2017. Stabilising experience replay for deep multi-agent reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning (ICML). 1146–1155.
- Roy Fox, Stephen McAleer, Will Overman, and Ioannis Panageas. 2022. Independent natural policy gradient always converges in Markov potential games. In Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS). 4414–4425.
- Hongyi Guo, Zuyue Fu, Zhuoran Yang, and Zhaoran Wang. 2021. Decentralized single-timescale actor-critic on zero-sum two-player stochastic games. In Proceedings of the 38th International Conference on Machine Learning (ICML). 3899–3909.
- Junling Hu and Michael P Wellman. 2003. Nash Q-learning for general-sum stochastic games. Journal of Machine Learning Research 4 (2003), 1039–1069.
- Stefanos Leonardos, Will Overman, Ioannis Panageas, and Georgios Piliouras. 2021. Global convergence of multi-agent policy gradient in Markov potential games. arXiv preprint arXiv:2106.01969 (2021).
- Sergio Valcarcel Macua, Javier Zazo, and Santiago Zazo. 2018. Learning parametric closed-loop policies for Markov potential games. arXiv preprint arXiv:1802.00899 (2018).
- David Mguni, Yutong Wu, Yali Du, Yaodong Yang, Ziyi Wang, Minne Li, Ying Wen, Joel Jennings, and Jun Wang. 2021. Learning in nonzero-sum stochastic games with potentials. In Proceedings of the 38th International Conference on Machine Learning (ICML). 7688–7699.
- Dov Monderer and Lloyd S Shapley. 1996. Potential games. Games and Economic Behavior 14, 1 (1996), 124–143.
- Julien Pérolat, Florian Strub, Bilal Piot, and Olivier Pietquin. 2017. Learning Nash equilibrium for general-sum Markov games from batch data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS). 232–241.
- Richard S Sutton and Andrew G Barto. 2018. Reinforcement learning: An introduction. MIT Press.
- Christopher JCH Watkins and Peter Dayan. 1992. Q-learning. Machine Learning 8 (1992), 279–292.
- Christopher John Cornish Hellaby Watkins. 1989. Learning from delayed rewards. Ph.D. Dissertation. King's College, University of Cambridge.