Heterogeneous Multi-Agent Reinforcement Learning for Zero-Shot Scalable Collaboration (2404.03869v2)

Published 5 Apr 2024 in cs.LG, cs.AI, cs.MA, cs.RO, cs.SY, and eess.SY

Abstract: The emergence of multi-agent reinforcement learning (MARL) is significantly transforming fields such as autonomous vehicle networks. However, real-world multi-agent systems typically contain multiple roles, and their scale fluctuates dynamically. Consequently, achieving zero-shot scalable collaboration requires that strategies for different roles can be updated flexibly according to scale, which remains a challenge for current MARL frameworks. To address this, we propose a novel MARL framework named Scalable and Heterogeneous Proximal Policy Optimization (SHPPO), which integrates heterogeneity into parameter-shared PPO-based MARL networks. We first leverage a latent network to adaptively learn strategy patterns for each agent. Second, we introduce a heterogeneous layer, inserted into the decision-making networks, whose parameters are generated from the learned latent variables. Our approach is scalable because all parameters are shared except those of the heterogeneous layer, and it gains both inter-individual and temporal heterogeneity, allowing SHPPO to adapt effectively to varying scales. SHPPO exhibits superior performance in classic MARL environments such as the StarCraft Multi-Agent Challenge (SMAC) and Google Research Football (GRF), showcasing enhanced zero-shot scalability and offering insights, through visualization, into the learned latent variables' impact on team performance.
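The abstract describes a hypernetwork-style design: every parameter of the PPO actor is shared across agents except one heterogeneous layer, whose weights are generated per agent and per timestep from learned latent variables, so the same network handles any team size. Below is a minimal PyTorch sketch of that idea, based only on the abstract rather than the authors' implementation; the class names (LatentNet, HeteroLayer, SharedActor), layer sizes, and latent dimension are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class LatentNet(nn.Module):
    """Maps each agent's observation to a latent strategy vector (sketch)."""

    def __init__(self, obs_dim: int, latent_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, latent_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.encoder(obs)


class HeteroLayer(nn.Module):
    """Linear layer whose weights and biases are generated per agent from its
    latent variable (hypernetwork style), giving inter-individual and, since
    latents change each step, temporal heterogeneity."""

    def __init__(self, latent_dim: int, in_dim: int, out_dim: int):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        self.weight_gen = nn.Linear(latent_dim, in_dim * out_dim)
        self.bias_gen = nn.Linear(latent_dim, out_dim)

    def forward(self, x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        # x: (n_agents, in_dim), z: (n_agents, latent_dim)
        w = self.weight_gen(z).view(-1, self.out_dim, self.in_dim)
        b = self.bias_gen(z)
        return torch.bmm(w, x.unsqueeze(-1)).squeeze(-1) + b


class SharedActor(nn.Module):
    """Parameter-shared actor with one heterogeneous layer inserted. Because
    that layer's parameters are generated from latents rather than stored per
    agent, the actor runs unchanged when the number of agents varies."""

    def __init__(self, obs_dim: int, act_dim: int,
                 latent_dim: int = 8, hidden: int = 64):
        super().__init__()
        self.latent = LatentNet(obs_dim, latent_dim)
        self.base = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.hetero = HeteroLayer(latent_dim, hidden, hidden)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        z = self.latent(obs)                # per-agent, per-step latents
        h = self.hetero(self.base(obs), z)  # latent-conditioned layer
        return self.head(torch.relu(h))     # action logits for PPO


# Usage: the same actor evaluates any team size (zero-shot scaling).
actor = SharedActor(obs_dim=30, act_dim=5)
logits = actor(torch.randn(8, 30))  # 8 agents here; could be any number
```

In this reading, scalability comes from the fact that the generated layer is a function of the latent variable, not a per-agent parameter table, so adding or removing agents requires no architectural change; how SHPPO trains the latent network within PPO is detailed in the paper itself.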

Authors (4)
  1. Xudong Guo (7 papers)
  2. Daming Shi (5 papers)
  3. Junjie Yu (11 papers)
  4. Wenhui Fan (9 papers)
Citations (1)
