Multi-Agent Diagnostics for Robustness via Illuminated Diversity (2401.13460v3)

Published 24 Jan 2024 in cs.LG, cs.AI, and cs.MA

Abstract: In the rapidly advancing field of multi-agent systems, ensuring robustness in unfamiliar and adversarial settings is crucial. Notwithstanding their outstanding performance in familiar environments, these systems often falter in new situations due to overfitting during the training phase. This is especially pronounced in settings where both cooperative and competitive behaviours are present, encapsulating a dual nature of overfitting and generalisation challenges. To address this issue, we present Multi-Agent Diagnostics for Robustness via Illuminated Diversity (MADRID), a novel approach for generating diverse adversarial scenarios that expose strategic vulnerabilities in pre-trained multi-agent policies. Leveraging the concepts from open-ended learning, MADRID navigates the vast space of adversarial settings, employing a target policy's regret to gauge the vulnerabilities of these settings. We evaluate the effectiveness of MADRID on the 11vs11 version of Google Research Football, one of the most complex environments for multi-agent reinforcement learning. Specifically, we employ MADRID for generating a diverse array of adversarial settings for TiZero, the state-of-the-art approach which "masters" the game through 45 days of training on a large-scale distributed infrastructure. We expose key shortcomings in TiZero's tactical decision-making, underlining the crucial importance of rigorous evaluation in multi-agent systems.

Summary

  • The paper introduces MADRID, a novel approach using quality-diversity exploration to diagnose robustness issues in multi-agent systems.
  • The methodology combines MAP-Elites with a regret metric to systematically uncover adversarial scenarios that expose strategic errors.
  • Experimental results on Google Research Football show that MADRID surfaces concrete flaws in TiZero, such as mishandling of the offside rule and a tendency to score own goals.

Introduction

Multi-agent systems are pivotal for a variety of AI applications, notably those involving interactions with humans. However, their robustness often breaks down in unfamiliar or adversarial situations because of overfitting during training. This paper introduces Multi-Agent Diagnostics for Robustness via Illuminated Diversity (MADRID), which generates diverse adversarial scenarios to diagnose strategic errors in pre-trained multi-agent policies. Drawing on concepts from open-ended learning, MADRID identifies vulnerabilities via a target policy's regret and demonstrates its efficacy in the complex Google Research Football environment.

Methodology

MADRID builds on quality-diversity (QD) optimization, which seeks a collection of high-performing solutions that each occupy a distinct niche. Concretely, it adapts MAP-Elites: the space of adversarial settings is discretized into a grid of cells, candidate scenarios are produced by mutating existing elites, and each candidate is evaluated and retained only if it beats the incumbent in its cell. The objective is the target policy's regret, estimated as the gap between the return a reference policy achieves on a scenario and the return the target achieves on the same scenario; high-regret cells mark situations where reference policies outperform the target, illuminating potential strategic flaws.
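
To make the search loop concrete, here is a minimal, self-contained MAP-Elites sketch in the spirit of MADRID. It is not the paper's implementation: the `Level` class, the mutation operator, and the policy stand-ins below are toy assumptions, and a real run would roll out the target and reference policies in Google Research Football rather than call these placeholders.

```python
import random

class Level:
    """Toy adversarial scenario: just an initial ball position on a unit pitch."""
    def __init__(self, x, y):
        self.x, self.y = x, y

def random_level():
    return Level(random.uniform(0, 1), random.uniform(0, 1))

def mutate(level):
    # Perturb the scenario slightly; real GRF levels would perturb player
    # and ball placements rather than a bare (x, y) pair.
    clamp = lambda v: min(max(v, 0.0), 1.0)
    return Level(clamp(level.x + random.gauss(0, 0.05)),
                 clamp(level.y + random.gauss(0, 0.05)))

# Placeholder policies: callables mapping a level to an episodic return.
# In the paper, the target is TiZero and the references are stronger or
# complementary pre-trained policies.
target = lambda lvl: 1.0 - abs(lvl.x - 0.5)
references = [lambda lvl: lvl.x, lambda lvl: 1.0 - lvl.y]

def regret(level):
    """Estimated regret: best reference return minus the target's return."""
    best_ref = max(ref(level) for ref in references)
    return best_ref - target(level)

def cell_of(level, bins=10):
    """Discretize level features into a grid cell so diversity is preserved."""
    return (min(int(level.x * bins), bins - 1),
            min(int(level.y * bins), bins - 1))

archive = {}  # cell -> (level, regret): the "illuminated" map of elites

for _ in range(5000):
    # Sample a parent from the archive (or start from scratch) and mutate it.
    parent = random.choice(list(archive.values()))[0] if archive else random_level()
    child = mutate(parent)
    cell, r = cell_of(child), regret(child)
    # Keep the child only if it attains higher regret than its cell's incumbent.
    if cell not in archive or r > archive[cell][1]:
        archive[cell] = (child, r)

print(f"{len(archive)} cells filled; max regret = "
      f"{max(r for _, r in archive.values()):.3f}")
```

The design choice mirrored here is that the output is the whole archive, not a single worst-case scenario: each filled cell is a distinct, high-regret diagnostic level, which is what makes the resulting failure analysis diverse rather than a single adversarial example.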

Experimental Validation

The empirical evaluation concentrates on the full 11 vs 11 version of Google Research Football, applying MADRID to TiZero, the state-of-the-art multi-agent RL policy. The results show that MADRID uncovers critical weaknesses in TiZero's tactical decision-making: the generated scenarios reveal failures in specific adversarial settings, such as mishandling of the offside rule and a tendency to score own goals. The paper underscores the necessity of rigorous evaluation for improving the robustness of multi-agent systems.

Analysis and Insights

The paper offers a qualitative analysis of the adversarial levels identified by MADRID, exploring the nuanced shortcomings of the TiZero policy. High-regret levels are often associated with poor strategic choices, such as misplaced passes or poorly chosen shooting positions. The findings confirm that multi-agent systems harbor latent vulnerabilities even after extensive training. MADRID not only exposes these vulnerabilities but also provides a means of refining multi-agent strategies, underscoring the importance of diagnosing and addressing strategic errors in building resilient multi-agent systems.
