Multi-Agent Diagnostics for Robustness via Illuminated Diversity (2401.13460v3)
Abstract: In the rapidly advancing field of multi-agent systems, ensuring robustness in unfamiliar and adversarial settings is crucial. Despite their outstanding performance in familiar environments, these systems often falter in new situations due to overfitting during training. This is especially pronounced in settings where cooperative and competitive behaviours coexist, compounding the challenges of overfitting and generalisation. To address this issue, we present Multi-Agent Diagnostics for Robustness via Illuminated Diversity (MADRID), a novel approach for generating diverse adversarial scenarios that expose strategic vulnerabilities in pre-trained multi-agent policies. Leveraging concepts from open-ended learning, MADRID navigates the vast space of adversarial settings, using a target policy's regret to gauge how strongly each setting exposes its vulnerabilities. We evaluate the effectiveness of MADRID on the 11vs11 version of Google Research Football, one of the most complex environments for multi-agent reinforcement learning. Specifically, we employ MADRID to generate a diverse array of adversarial settings for TiZero, the state-of-the-art approach which "masters" the game through 45 days of training on a large-scale distributed infrastructure. We expose key shortcomings in TiZero's tactical decision-making, underlining the crucial importance of rigorous evaluation in multi-agent systems.
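The abstract only outlines how MADRID searches for adversarial settings. The following is a minimal sketch of the general idea under explicit assumptions, not the paper's implementation. It assumes a MAP-Elites-style quality-diversity archive (our reading of "illuminated diversity"), with cells keyed by a discretised ball position and the index of a reference (opponent) policy. Each cell keeps the scenario with the highest estimated regret, where regret is approximated as a reference policy's return minus the target policy's return in the same scenario. `target_return` and `reference_return` are hypothetical toy functions standing in for simulated Google Research Football rollouts. The pitch bounds, grid size, number of opponents and mutation scale are all assumed values.

```python
import math
import random

# Assumed scenario space: ball position (x, y) plus the index of a
# reference (opponent) policy. Bounds and sizes are illustrative guesses.
X_RANGE = (-1.0, 1.0)
Y_RANGE = (-0.42, 0.42)
GRID = 8            # cells per spatial axis (assumed)
NUM_OPPONENTS = 3   # number of reference policies (hypothetical)


def target_return(scenario):
    """Hypothetical toy stand-in for the target policy's estimated return."""
    x, y, _ = scenario
    # Toy weakness: the return dips near the left-hand corners of the pitch.
    return 1.0 - 0.8 * math.exp(-((x + 1.0) ** 2 + (abs(y) - 0.4) ** 2) / 0.05)


def reference_return(scenario):
    """Hypothetical toy stand-in for a reference policy's estimated return."""
    _, _, opp = scenario
    return 0.7 + 0.05 * opp


def regret(scenario):
    # Approximate regret: how much better the reference policy does
    # than the target policy in the same scenario.
    return reference_return(scenario) - target_return(scenario)


def cell_of(scenario):
    # Discretise the ball position into a GRID x GRID grid, one grid per opponent.
    x, y, opp = scenario
    i = min(int((x - X_RANGE[0]) / (X_RANGE[1] - X_RANGE[0]) * GRID), GRID - 1)
    j = min(int((y - Y_RANGE[0]) / (Y_RANGE[1] - Y_RANGE[0]) * GRID), GRID - 1)
    return (i, j, opp)


def random_scenario(rng):
    return (rng.uniform(*X_RANGE), rng.uniform(*Y_RANGE), rng.randrange(NUM_OPPONENTS))


def mutate(scenario, rng, sigma=0.1):
    # Gaussian perturbation of the ball position, clipped to the pitch;
    # the opponent index is kept fixed.
    x, y, opp = scenario
    x = min(max(x + rng.gauss(0.0, sigma), X_RANGE[0]), X_RANGE[1])
    y = min(max(y + rng.gauss(0.0, sigma), Y_RANGE[0]), Y_RANGE[1])
    return (x, y, opp)


def map_elites(iterations=2000, n_init=100, seed=0):
    rng = random.Random(seed)
    archive = {}  # cell -> (regret, scenario): the highest-regret elite per cell
    for t in range(iterations):
        if t < n_init or not archive:
            candidate = random_scenario(rng)
        else:
            # Pick a random elite and perturb it to explore nearby scenarios.
            _, parent = rng.choice(list(archive.values()))
            candidate = mutate(parent, rng)
        score = regret(candidate)
        cell = cell_of(candidate)
        # Insert if the cell is empty or the candidate exposes a larger regret.
        if cell not in archive or score > archive[cell][0]:
            archive[cell] = (score, candidate)
    return archive


archive = map_elites()
best_score, best_scenario = max(archive.values())
print(f"{len(archive)} cells filled; highest regret {best_score:.3f} at {best_scenario}")
```

A grid archive keeps one elite per cell rather than a single global optimum, so the output is a map of where on the pitch, and against which reference policy, the target appears most exploitable. This matches the diverse coverage of adversarial settings that the abstract emphasises.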