Prioritized League Reinforcement Learning for Large-Scale Heterogeneous Multiagent Systems (2403.18057v1)
Abstract: Large-scale heterogeneous multiagent systems capture realistic factors of real-world applications, such as agents with diverse abilities and overall system cost. Compared with homogeneous systems, heterogeneous systems offer significant practical advantages, but they also pose challenges for multiagent reinforcement learning, including non-stationarity and an imbalance in the number of agents of each type. We propose Prioritized Heterogeneous League Reinforcement Learning (PHLRL) to address large-scale heterogeneous cooperation problems. PHLRL records the diverse policies that agents explore during training and builds a heterogeneous league of these policies to aid future policy optimization. Furthermore, we design a prioritized policy gradient approach to compensate for the gap caused by differences in the number of agents of each type. We also use Unreal Engine to build a large-scale heterogeneous cooperation benchmark named Large-Scale Multiagent Operation (LSMO), a complex two-team competition scenario that requires collaboration between ground and airborne agents. Experiments on LSMO show that PHLRL outperforms state-of-the-art methods, including QTRAN and QPLEX.
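The abstract names two mechanisms: a heterogeneous league that stores previously explored policies, and a prioritized policy gradient that compensates for the imbalance between agent types. The Python sketch below is only an illustration of how such components might look; the class `HeterogeneousLeague`, the function `prioritized_pg_loss`, and the inverse-frequency weighting are assumptions made for this example, not the paper's actual formulation.

```python
# Illustrative sketch (not the paper's implementation): a league of past
# policy snapshots, plus a policy-gradient loss that reweights each agent's
# contribution by the inverse frequency of its type.
import copy
import random
import torch


class HeterogeneousLeague:
    """Keeps snapshots of previously explored policies, one pool per agent type."""

    def __init__(self, max_size=20):
        self.pools = {}          # agent_type -> list of frozen policy snapshots
        self.max_size = max_size

    def add(self, agent_type, policy):
        # Store a frozen copy so later training does not mutate the snapshot.
        pool = self.pools.setdefault(agent_type, [])
        pool.append(copy.deepcopy(policy))
        if len(pool) > self.max_size:
            pool.pop(0)          # discard the oldest snapshot

    def sample(self, agent_type):
        # Draw a past policy of the requested type to mix into training.
        pool = self.pools.get(agent_type, [])
        return random.choice(pool) if pool else None


def prioritized_pg_loss(log_probs, advantages, agent_types):
    """Policy-gradient loss with per-type inverse-frequency weights.

    log_probs, advantages: 1-D tensors with one entry per agent.
    agent_types: list of type ids, one per agent.
    """
    counts = torch.tensor([agent_types.count(t) for t in agent_types],
                          dtype=torch.float32)
    weights = 1.0 / counts             # rarer types receive larger weights
    weights = weights / weights.sum()  # normalize so the weights sum to one
    return -(weights * log_probs * advantages.detach()).sum()


if __name__ == "__main__":
    # Toy usage: five ground agents and one airborne agent.
    types = ["ground"] * 5 + ["air"]
    log_probs = torch.randn(6, requires_grad=True)
    advantages = torch.randn(6)
    loss = prioritized_pg_loss(log_probs, advantages, types)
    loss.backward()
    print(float(loss))
```

In this toy weighting, the single airborne agent receives the same aggregate weight as the five ground agents combined, which is one simple way to keep a minority agent type from being drowned out of the gradient.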
- P. Hernandez-Leal, M. Kaisers, T. Baarslag, and E. M. De Cote, “A survey of learning in multiagent environments: Dealing with non-stationarity,” arXiv preprint arXiv:1707.09183, 2017.
- Q. Fu, T. Qiu, and J. Yi, “Learning heterogeneous agent cooperation via multiagent league training,” arXiv preprint arXiv:2208.08496, 2022.
- P. Sunehag, G. Lever, A. Gruslys, W. M. Czarnecki, V. Zambaldi, M. Jaderberg, M. Lanctot, N. Sonnerat, J. Z. Leibo, K. Tuyls et al., “Value-decomposition networks for cooperative multi-agent learning,” arXiv preprint arXiv:1706.05296, 2017.
- B. J. A. Kröse, “Learning from delayed rewards,” Robotics and Autonomous Systems, vol. 15, no. 4, pp. 233–235, 1995.
- T. Rashid, M. Samvelyan, C. Schroeder, G. Farquhar, J. Foerster, and S. Whiteson, “Qmix: Monotonic value function factorisation for deep multi-agent reinforcement learning,” in International Conference on Machine Learning. PMLR, 2018, pp. 4295–4304.
- T. Rashid, G. Farquhar, B. Peng, and S. Whiteson, “Weighted qmix: Expanding monotonic value function factorisation,” arXiv preprint arXiv:2006.10800, 2020.
- J. Wang, Z. Ren, T. Liu, Y. Yu, and C. Zhang, “Qplex: Duplex dueling multi-agent q-learning,” arXiv preprint arXiv:2008.01062, 2020.
- K. Son, D. Kim, W. J. Kang, D. E. Hostallero, and Y. Yi, “Qtran: Learning to factorize with transformation for cooperative multi-agent reinforcement learning,” in International Conference on Machine Learning. PMLR, 2019, pp. 5887–5896.
- Y. Yu, T. Wang, and S. C. Liew, “Deep-reinforcement learning multiple access for heterogeneous wireless networks,” IEEE Journal on Selected Areas in Communications, vol. 37, no. 6, pp. 1277–1290, 2019.
- Y. Ishiwaka, T. Sato, and Y. Kakazu, “An approach to the pursuit problem on a heterogeneous multiagent system using reinforcement learning,” Robotics and Autonomous Systems, vol. 43, no. 4, pp. 245–256, 2003.
- N. Zhao, Y.-C. Liang, D. Niyato, Y. Pei, M. Wu, and Y. Jiang, “Deep reinforcement learning for user association and resource allocation in heterogeneous cellular networks,” IEEE Transactions on Wireless Communications, vol. 18, no. 11, pp. 5141–5152, 2019.
- A. I. Orhean, F. Pop, and I. Raicu, “New scheduling approach using reinforcement learning for heterogeneous distributed systems,” Journal of Parallel and Distributed Computing, vol. 117, pp. 292–302, 2018.
- L. Zhang, Y. Sun, A. Barth, and O. Ma, “Decentralized control of multi-robot system in cooperative object transportation using deep reinforcement learning,” IEEE Access, vol. 8, pp. 184109–184119, 2020.
- F. Cordes, I. Ahrns, S. Bartsch, T. Birnschein, A. Dettmann, S. Estable, S. Haase, J. Hilljegerdes, D. Koebel, S. Planthaber et al., “Lunares: Lunar crater exploration with heterogeneous multi robot systems,” Intelligent Service Robotics, vol. 4, pp. 61–89, 2011.
- M. J. Schuster, M. G. Müller, S. G. Brunner, H. Lehner, P. Lehner, R. Sakagami, A. Dömel, L. Meyer, B. Vodermayer, R. Giubilato et al., “The arches space-analogue demonstration mission: Towards heterogeneous teams of autonomous robots for collaborative scientific sampling in planetary exploration,” IEEE Robotics and Automation Letters, vol. 5, no. 4, pp. 5315–5322, 2020.
- M. Irfan and A. Farooq, “Auction-based task allocation scheme for dynamic coalition formations in limited robotic swarms with heterogeneous capabilities,” in 2016 International Conference on Intelligent Systems Engineering (ICISE). IEEE, 2016, pp. 210–215.
- J. Li, Y. Ma, R. Gao, Z. Cao, A. Lim, W. Song, and J. Zhang, “Deep reinforcement learning for solving the heterogeneous capacitated vehicle routing problem,” IEEE Transactions on Cybernetics, vol. 52, no. 12, pp. 13572–13585, 2021.
- W. Qin, Z. Zhuang, Z. Huang, and H. Huang, “A novel reinforcement learning-based hyper-heuristic for heterogeneous vehicle routing problem,” Computers & Industrial Engineering, vol. 156, p. 107252, 2021.
- J. Li, L. Xin, Z. Cao, A. Lim, W. Song, and J. Zhang, “Heterogeneous attentions for solving pickup and delivery problem via deep reinforcement learning,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 3, pp. 2306–2315, 2021.
- B. Deng, C. Jiang, H. Yao, S. Guo, and S. Zhao, “The next generation heterogeneous satellite communication networks: Integration of resource management and deep reinforcement learning,” IEEE Wireless Communications, vol. 27, no. 2, pp. 105–111, 2019.
- C. Jiang and X. Zhu, “Reinforcement learning based capacity management in multi-layer satellite networks,” IEEE Transactions on Wireless Communications, vol. 19, no. 7, pp. 4685–4699, 2020.
- G. Bao, L. Ma, and X. Yi, “Recent advances on cooperative control of heterogeneous multi-agent systems subject to constraints: A survey,” Systems Science & Control Engineering, vol. 10, no. 1, pp. 539–551, 2022.
- J. Gong, D. Ning, X. Wu, and G. He, “Bounded leader-following consensus of heterogeneous directed delayed multi-agent systems via asynchronous impulsive control,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 68, no. 7, pp. 2680–2684, 2021.
- T. Yan, X. Xu, Z. Li, and E. Li, “Flocking of multi-agent systems with unknown nonlinear dynamics and heterogeneous virtual leader,” International Journal of Control, Automation and Systems, vol. 19, no. 9, pp. 2931–2939, 2021.
- C. Liu, B. Jiang, R. J. Patton, and K. Zhang, “Decentralized output sliding-mode fault-tolerant control for heterogeneous multiagent systems,” IEEE Transactions on Cybernetics, vol. 50, no. 12, pp. 4934–4945, 2019.
- H. Ye, M. Li, and W. Luo, “Consensus protocols for heterogeneous multiagent systems with disturbances via integral sliding mode control,” Mathematical Problems in Engineering, vol. 2018, 2018.
- S. Kapetanakis and D. Kudenko, “Reinforcement learning of coordination in heterogeneous cooperative multi-agent systems,” in AAMAS, vol. 4, 2004, pp. 1258–1259.
- C. Wakilpoor, P. J. Martin, C. Rebhuhn, and A. Vu, “Heterogeneous multi-agent reinforcement learning for unknown environment mapping,” arXiv preprint arXiv:2010.02663, 2020.
- W. Du, S. Ding, C. Zhang, and Z. Shi, “Multiagent reinforcement learning with heterogeneous graph attention network,” IEEE Transactions on Neural Networks and Learning Systems, 2022.
- H. Zheng, P. Wei, J. Jiang, G. Long, Q. Lu, and C. Zhang, “Cooperative heterogeneous deep reinforcement learning,” Advances in Neural Information Processing Systems, vol. 33, pp. 17455–17465, 2020.
- M. Bettini, A. Shankar, and A. Prorok, “Heterogeneous multi-robot reinforcement learning,” arXiv preprint arXiv:2301.07137, 2023.
- M. J. Matarić, “Reinforcement learning in the multi-robot domain,” in Robot Colonies. Springer, 1997, pp. 73–83.
- Z. Lin, C. Li, L. Tian, and B. Zhang, “A scheduling algorithm based on reinforcement learning for heterogeneous environments,” Applied Soft Computing, vol. 130, p. 109707, 2022.
- W. U. Mondal, M. Agarwal, V. Aggarwal, and S. V. Ukkusuri, “On the approximation of cooperative heterogeneous multi-agent reinforcement learning (marl) using mean field control (mfc),” The Journal of Machine Learning Research, vol. 23, no. 1, pp. 5614–5659, 2022.
- M. Wen, J. Kuba, R. Lin, W. Zhang, Y. Wen, J. Wang, and Y. Yang, “Multi-agent reinforcement learning is a sequence modeling problem,” Advances in Neural Information Processing Systems, vol. 35, pp. 16509–16521, 2022.
- E. Seraj, Z. Wang, R. Paleja, D. Martin, M. Sklar, A. Patel, and M. Gombolay, “Learning efficient diverse communication for cooperative heterogeneous teaming,” in Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, 2022, pp. 1173–1182.
- A. Deka and K. Sycara, “Natural emergence of heterogeneous strategies in artificially intelligent competitive teams,” in International Conference on Swarm Intelligence. Springer, 2021, pp. 13–25.
- L. Zheng, J. Yang, H. Cai, M. Zhou, W. Zhang, J. Wang, and Y. Yu, “Magent: A many-agent reinforcement learning platform for artificial collective intelligence,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, 2018.
- Q. Fu, T. Qiu, J. Yi, Z. Pu, and S. Wu, “Concentration network for reinforcement learning of large-scale multi-agent systems,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 9, pp. 9341–9349, Jun. 2022.
- R. Lowe, Y. Wu, A. Tamar, J. Harb, P. Abbeel, and I. Mordatch, “Multi-agent actor-critic for mixed cooperative-competitive environments,” arXiv preprint arXiv:1706.02275, 2017.
- M. Samvelyan, T. Rashid, C. S. De Witt, G. Farquhar, N. Nardelli, T. G. Rudner, C.-M. Hung, P. H. Torr, J. Foerster, and S. Whiteson, “The starcraft multi-agent challenge,” arXiv preprint arXiv:1902.04043, 2019.
- O. Vinyals, I. Babuschkin, W. M. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. H. Choi, R. Powell, T. Ewalds, P. Georgiev et al., “Grandmaster level in starcraft ii using multi-agent reinforcement learning,” Nature, vol. 575, no. 7782, pp. 350–354, 2019.
- X. Wang, J. Song, P. Qi, P. Peng, Z. Tang, W. Zhang, W. Li, X. Pi, J. He, C. Gao, H. Long, and Q. Yuan, “Scc: An efficient deep reinforcement learning agent mastering the game of starcraft ii,” in Proceedings of the 38th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, M. Meila and T. Zhang, Eds., vol. 139. PMLR, 18–24 Jul 2021, pp. 10905–10915.
- J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel, “High-dimensional continuous control using generalized advantage estimation,” arXiv preprint arXiv:1506.02438, 2015.
- J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.
- Q. Fu and T. Hu, “U-map: Developing complex multi-agent reinforcement learning benchmarks with unreal engine,” https://github.com/binary-husky/unreal-map/, 2023.
- D. Ye, Z. Liu, M. Sun, B. Shi, P. Zhao, H. Wu, H. Yu, S. Yang, X. Wu, Q. Guo et al., “Mastering complex control in moba games with deep reinforcement learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, 2020, pp. 6672–6679.
- D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
- J. Su, S. Adams, and P. A. Beling, “Value-decomposition multi-agent actor-critics,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 13, 2021, pp. 11352–11360.