
Self-Clustering Hierarchical Multi-Agent Reinforcement Learning with Extensible Cooperation Graph (2403.18056v1)

Published 26 Mar 2024 in cs.AI

Abstract: Multi-Agent Reinforcement Learning (MARL) has been successful in solving many cooperative challenges. However, classic non-hierarchical MARL algorithms still cannot address various complex multi-agent problems that require hierarchical cooperative behaviors. The cooperative knowledge and policies learned in non-hierarchical algorithms are implicit and not interpretable, thereby restricting the integration of existing knowledge. This paper proposes a novel hierarchical MARL model called Hierarchical Cooperation Graph Learning (HCGL) for solving general multi-agent problems. HCGL has three components: a dynamic Extensible Cooperation Graph (ECG) for achieving self-clustering cooperation; a group of graph operators for adjusting the topology of ECG; and an MARL optimizer for training these graph operators. HCGL's key distinction from other MARL models is that the behaviors of agents are guided by the topology of ECG instead of policy neural networks. ECG is a three-layer graph consisting of an agent node layer, a cluster node layer, and a target node layer. To manipulate the ECG topology in response to changing environmental conditions, four graph operators are trained to adjust the edge connections of ECG dynamically. The hierarchical feature of ECG provides a unique approach to merge primitive actions (actions executed by the agents) and cooperative actions (actions executed by the clusters) into a unified action space, allowing us to integrate fundamental cooperative knowledge into an extensible interface. In our experiments, the HCGL model has shown outstanding performance in multi-agent benchmarks with sparse rewards. We also verify that HCGL can easily be transferred to large-scale scenarios with high zero-shot transfer success rates.
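
To make the abstract's description of the Extensible Cooperation Graph more concrete, here is a minimal Python sketch of a three-layer graph whose topology, rather than a policy network, determines which target each agent pursues. This is not the authors' implementation; all class, method, and field names (e.g. ExtensibleCooperationGraph, move_agent, retarget_cluster) are illustrative assumptions, and the "graph operators" here are plain methods standing in for the learned operators trained by the MARL optimizer.

```python
# Illustrative sketch (not the authors' code): a three-layer Extensible
# Cooperation Graph with agent, cluster, and target node layers, where graph
# operators rewire agent->cluster and cluster->target edges.
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ExtensibleCooperationGraph:
    n_agents: int
    n_clusters: int
    n_targets: int
    # Edge maps: each agent attaches to one cluster; each cluster to one target.
    agent_to_cluster: Dict[int, int] = field(default_factory=dict)
    cluster_to_target: Dict[int, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Start from a trivial topology: everything attached to node 0.
        self.agent_to_cluster = {a: 0 for a in range(self.n_agents)}
        self.cluster_to_target = {c: 0 for c in range(self.n_clusters)}

    def move_agent(self, agent: int, cluster: int) -> None:
        """Graph operator: re-attach an agent to a different cluster."""
        self.agent_to_cluster[agent] = cluster

    def retarget_cluster(self, cluster: int, target: int) -> None:
        """Graph operator: point a cluster at a different target node."""
        self.cluster_to_target[cluster] = target

    def agent_targets(self) -> Dict[int, int]:
        """Resolve each agent's target through its cluster; in HCGL this
        resolved topology is what guides the agents' behavior."""
        return {a: self.cluster_to_target[c]
                for a, c in self.agent_to_cluster.items()}


# Usage: a learned operator policy (trained with MARL) would emit rewiring
# actions each step; here we rewire by hand to show the effect on behavior.
ecg = ExtensibleCooperationGraph(n_agents=4, n_clusters=2, n_targets=3)
ecg.move_agent(agent=3, cluster=1)
ecg.retarget_cluster(cluster=1, target=2)
print(ecg.agent_targets())  # {0: 0, 1: 0, 2: 0, 3: 2}
```

In this reading, "cooperative actions" correspond to cluster-level rewiring (retarget_cluster) while "primitive actions" remain at the agent level, which is how the hierarchical graph can merge both into one action space for the optimizer.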

Authors (5)
  1. Qingxu Fu (8 papers)
  2. Tenghai Qiu (10 papers)
  3. Jianqiang Yi (9 papers)
  4. Zhiqiang Pu (17 papers)
  5. Xiaolin Ai (7 papers)