A Meta-Game Evaluation Framework for Deep Multiagent Reinforcement Learning (2405.00243v1)
Abstract: Evaluating deep multiagent reinforcement learning (MARL) algorithms is complicated by stochasticity in training and sensitivity of agent performance to the behavior of other agents. We propose a meta-game evaluation framework for deep MARL, by framing each MARL algorithm as a meta-strategy, and repeatedly sampling normal-form empirical games over combinations of meta-strategies resulting from different random seeds. Each empirical game captures both self-play and cross-play factors across seeds. These empirical games provide the basis for constructing a sampling distribution, using bootstrapping, over a variety of game analysis statistics. We use this approach to evaluate state-of-the-art deep MARL algorithms on a class of negotiation games. From statistics on individual payoffs, social welfare, and empirical best-response graphs, we uncover strategic relationships among self-play, population-based, model-free, and model-based MARL methods. We also investigate the effect of run-time search as a meta-strategy operator, and find via meta-game analysis that the search version of a meta-strategy generally leads to improved performance.
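The sampling-and-bootstrapping idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not specify how seeds are combined, so the two-player setting, the payoff table layout `payoffs[a][s][b][t]` (payoff to algorithm `a`'s seed-`s` policy against algorithm `b`'s seed-`t` policy), the percentile interval, and the function names `sample_meta_game` and `bootstrap_interval` are all assumptions made for illustration.

```python
import random
from statistics import mean


def sample_meta_game(payoffs, rng):
    """Draw one empirical meta-game by resampling seeds with replacement.

    payoffs[a][s][b][t] is a hypothetical table of average payoffs to
    algorithm a (seed s) when paired with algorithm b (seed t).
    Returns a matrix M where M[a][b] is the row player's mean payoff
    over the resampled seed pairs (self-play and cross-play alike).
    """
    n_algs = len(payoffs)
    n_seeds = len(payoffs[0])
    # Resample seed indices independently for each algorithm.
    idx = [[rng.randrange(n_seeds) for _ in range(n_seeds)]
           for _ in range(n_algs)]
    return [
        [mean(payoffs[a][s][b][t] for s in idx[a] for t in idx[b])
         for b in range(n_algs)]
        for a in range(n_algs)
    ]


def bootstrap_interval(payoffs, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap of a scalar statistic over sampled meta-games."""
    rng = random.Random(seed)
    draws = sorted(statistic(sample_meta_game(payoffs, rng))
                   for _ in range(n_boot))
    lo = draws[int((alpha / 2) * (n_boot - 1))]
    hi = draws[int((1 - alpha / 2) * (n_boot - 1))]
    return mean(draws), lo, hi


def uniform_score(alg):
    """Statistic: mean payoff of `alg` against a uniform mix of opponents."""
    return lambda meta: mean(meta[alg])


def social_welfare(a, b):
    """Statistic: summed payoffs when algorithm a meets algorithm b."""
    return lambda meta: meta[a][b] + meta[b][a]
```

For example, `bootstrap_interval(payoffs, uniform_score(0))` would return a point estimate and an approximate 95% interval for algorithm 0's average payoff against a uniform mixture of the evaluated algorithms. Other game-analysis statistics, such as best-response relations, could be passed as the `statistic` argument in the same way.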