Inferring Latent Temporal Sparse Coordination Graph for Multi-Agent Reinforcement Learning (2403.19253v2)
Abstract: Effective agent coordination is crucial in cooperative Multi-Agent Reinforcement Learning (MARL). Although agent cooperation can be represented by graph structures, prevailing graph learning methods in MARL have two limitations. First, they rely solely on one-step observations and neglect historical experience, which yields deficient graphs that encourage redundant or detrimental information exchange. Second, the high computational cost of action-pair calculations in dense graphs impedes scalability. To address these challenges, we propose inferring a Latent Temporal Sparse Coordination Graph (LTS-CG) for MARL. LTS-CG uses agents' historical observations to compute an agent-pair probability matrix, from which a sparse graph is sampled and used for knowledge exchange between agents, thereby capturing both agent dependencies and relation uncertainty. The computational complexity of this procedure depends only on the number of agents. The graph learning process is further augmented by two characteristics: Predict-Future, which enables agents to foresee upcoming observations, and Infer-Present, which ensures a thorough grasp of the environmental context from limited data. These features allow LTS-CG to construct temporal graphs from historical and real-time information, promoting knowledge exchange during policy learning and effective collaboration. Graph learning and agent training occur simultaneously in an end-to-end manner. Results on the StarCraft II benchmark demonstrate LTS-CG's superior performance.
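The pipeline described in the abstract (historical observations, then an agent-pair probability matrix, then a sampled sparse graph, then knowledge exchange) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the dot-product similarity score, the independent Bernoulli sampling, and the neighbour averaging are all assumptions chosen for brevity, standing in for the learned and differentiable components the abstract refers to.

```python
import math
import random


def pair_probabilities(histories):
    """Map every ordered agent pair (i, j) to an edge probability in (0, 1).

    Assumption: the score is a sigmoid of the scaled dot product of the
    flattened observation histories. A learned network would replace this.
    """
    n = len(histories)
    flat = [[x for obs in h for x in obs] for h in histories]
    dim = len(flat[0])
    probs = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                score = sum(a * b for a, b in zip(flat[i], flat[j])) / dim
                probs[i][j] = 1.0 / (1.0 + math.exp(-score))
    return probs


def sample_graph(probs, rng):
    """Sample a binary adjacency matrix with one Bernoulli draw per pair.

    Assumption: plain sampling, which is not differentiable; end-to-end
    training would need a relaxed sampler instead.
    """
    n = len(probs)
    return [
        [1 if i != j and rng.random() < probs[i][j] else 0 for j in range(n)]
        for i in range(n)
    ]


def exchange(features, adj):
    """Average each agent's feature vector with its sampled neighbours'."""
    n = len(features)
    mixed = []
    for i in range(n):
        group = [features[i]] + [features[j] for j in range(n) if adj[i][j]]
        mixed.append([sum(col) / len(group) for col in zip(*group)])
    return mixed


# Hypothetical data: 4 agents, 5 time steps, 3-dimensional observations.
rng = random.Random(0)
histories = [
    [[rng.gauss(0, 1) for _ in range(3)] for _ in range(5)] for _ in range(4)
]
probs = pair_probabilities(histories)
adj = sample_graph(probs, rng)
mixed = exchange([h[-1] for h in histories], adj)
```

In this sketch the work is one score per agent pair, so the cost grows with the number of agents rather than with the size of the joint action space, which is consistent with the scalability claim in the abstract.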
- Wei Duan
- Jie Lu
- Junyu Xuan