Attention-Guided Contrastive Role Representations for Multi-Agent Reinforcement Learning (2312.04819v2)
Abstract: Real-world multi-agent tasks usually involve dynamic team composition with the emergence of roles, and these roles should likewise be a key to efficient cooperation in multi-agent reinforcement learning (MARL). Drawing inspiration from the correlation between roles and agents' behavior patterns, we propose Attention-guided COntrastive Role representation learning for MARL (ACORM), a novel framework that promotes behavior heterogeneity, knowledge transfer, and skillful coordination across agents. First, we introduce mutual information maximization to formalize role representation learning, derive a contrastive learning objective, and compactly approximate the distribution of negative pairs. Second, we leverage an attention mechanism to prompt the global state to attend to learned role representations during value decomposition, implicitly guiding agent coordination in a skillful role space to yield more expressive credit assignment. Experiments on challenging StarCraft II micromanagement and Google Research Football tasks demonstrate the state-of-the-art performance of our method and its advantages over existing approaches. Our code is available at https://github.com/NJU-RL/ACORM.
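To make the first step concrete, here is a minimal PyTorch sketch of an InfoNCE-style contrastive objective over role embeddings, matching the mutual-information formulation described in the abstract. Every name here (`RoleEncoder`, `infonce_role_loss`, the trajectory-summary input, drawing negatives from other role clusters) is an illustrative assumption, not the authors' implementation (see the linked repository for that):

```python
# Hedged sketch only: names and shapes are assumptions, not ACORM's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RoleEncoder(nn.Module):
    """Maps an agent's trajectory summary to a role embedding."""
    def __init__(self, input_dim: int, embed_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 64), nn.ReLU(), nn.Linear(64, embed_dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def infonce_role_loss(anchor: torch.Tensor, positive: torch.Tensor,
                      negatives: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE lower bound on the mutual information between role embeddings.

    anchor:    (B, D) role embeddings
    positive:  (B, D) embeddings of behaviorally similar agents
    negatives: (B, K, D) embeddings from other role clusters, a sampled
               approximation of the negative-pair distribution
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    pos = (anchor * positive).sum(-1, keepdim=True) / temperature      # (B, 1)
    neg = torch.einsum("bd,bkd->bk", anchor, negatives) / temperature  # (B, K)
    logits = torch.cat([pos, neg], dim=1)  # the positive pair is class 0
    labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, labels)
```

One plausible way to form the pairs, consistent with "approximate the distribution of negative pairs," is to cluster agents' behavior embeddings (e.g., with K-means) and treat same-cluster agents as positives and other clusters as negatives; the exact pairing scheme above is an assumption.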
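The second step, in which the global state attends to role representations during value decomposition, can be pictured with a similarly hedged sketch. The QMIX-style monotonic hypernetwork and all dimension choices are illustrative assumptions (note that `role_dim` must be divisible by `n_heads` for `nn.MultiheadAttention`):

```python
# Hedged sketch only: the mixing architecture is an assumption, not ACORM's.
import torch
import torch.nn as nn

class RoleAttentionMixer(nn.Module):
    def __init__(self, state_dim: int, role_dim: int, n_agents: int, n_heads: int = 4):
        super().__init__()
        self.state_proj = nn.Linear(state_dim, role_dim)
        self.attn = nn.MultiheadAttention(role_dim, n_heads, batch_first=True)
        # Hypernetwork: the attended role context conditions the mixing weights.
        self.hyper_w = nn.Linear(state_dim + role_dim, n_agents)
        self.hyper_b = nn.Linear(state_dim + role_dim, 1)

    def forward(self, agent_qs, state, roles):
        """agent_qs: (B, n_agents) per-agent utilities
        state:    (B, state_dim) global state
        roles:    (B, n_agents, role_dim) learned role embeddings
        """
        query = self.state_proj(state).unsqueeze(1)   # (B, 1, role_dim)
        context, _ = self.attn(query, roles, roles)   # global state attends to roles
        h = torch.cat([state, context.squeeze(1)], dim=-1)
        w = torch.abs(self.hyper_w(h))                # non-negative => monotonic mixing
        b = self.hyper_b(h)
        return (w * agent_qs).sum(dim=-1, keepdim=True) + b  # (B, 1) joint value

# Example usage with made-up sizes:
mixer = RoleAttentionMixer(state_dim=48, role_dim=32, n_agents=5)
q_tot = mixer(torch.randn(8, 5), torch.randn(8, 48), torch.randn(8, 5, 32))
```

Taking the absolute value of the hypernetwork output mirrors the monotonicity constraint of QMIX-style value decomposition, so the joint value is non-decreasing in each agent's utility while the attended role context shapes the credit assigned to each agent.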