Scalable Communication for Multi-Agent Reinforcement Learning via Transformer-Based Email Mechanism (2301.01919v2)
Abstract: Communication can substantially improve cooperation in multi-agent reinforcement learning (MARL), especially in partially observable tasks. However, existing methods either broadcast messages, which leads to information redundancy, or learn targeted communication by modeling all other agents as potential targets, which does not scale when the number of agents varies. To tackle this scalability problem for partially observable tasks, we propose a novel framework, the Transformer-based Email Mechanism (TEM). Agents communicate locally, sending messages only to agents they can observe, without modeling all agents. Inspired by email forwarding in human cooperation, we design message chains that forward information so that agents can cooperate with others outside their observation range. We use a Transformer to encode and decode the message chain and to selectively choose the next receiver. Empirically, TEM outperforms the baselines on multiple cooperative MARL benchmarks. When the number of agents varies, TEM maintains superior performance without further training.
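The forwarding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 2D positions, the `obs_range` radius, the function names, the `max_hops` limit, the rule that an agent already in the chain is skipped, and the convention that returning `None` ends the chain are all assumptions made for illustration. In particular, `farthest_from_start` is a hand-written placeholder standing in for the learned Transformer that selects the next receiver.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Position = Tuple[float, float]
# Hypothetical selector signature: (chain so far, observable candidates) -> next receiver or None.
Selector = Callable[[List[int], List[int]], Optional[int]]


def observed_agents(sender: int, positions: Dict[int, Position], obs_range: float) -> List[int]:
    """Return the ids of agents within obs_range of the sender (local communication only)."""
    sx, sy = positions[sender]
    return sorted(
        other
        for other, (x, y) in positions.items()
        if other != sender and math.hypot(x - sx, y - sy) <= obs_range
    )


def forward_chain(
    start: int,
    positions: Dict[int, Position],
    obs_range: float,
    choose_next: Selector,
    max_hops: int = 4,
) -> List[int]:
    """Forward a message hop by hop, each time to one agent the current holder can observe."""
    chain = [start]
    current = start
    for _ in range(max_hops):
        # Assumption: agents already in the chain are not chosen again.
        candidates = [a for a in observed_agents(current, positions, obs_range) if a not in chain]
        nxt = choose_next(chain, candidates)
        if nxt is None:  # Assumption: None means the chain ends here.
            break
        chain.append(nxt)
        current = nxt
    return chain


def farthest_from_start(positions: Dict[int, Position]) -> Selector:
    """Placeholder selector (NOT the learned model): pick the candidate farthest from the chain's origin."""
    def choose(chain: List[int], candidates: List[int]) -> Optional[int]:
        if not candidates:
            return None
        sx, sy = positions[chain[0]]
        return max(candidates, key=lambda a: math.hypot(positions[a][0] - sx, positions[a][1] - sy))
    return choose


# Agent 0 cannot observe agent 2 directly, but the message reaches it through agent 1.
positions = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (5.0, 0.0)}
chain = forward_chain(0, positions, obs_range=1.5, choose_next=farthest_from_start(positions))
print(chain)  # [0, 1, 2]
```

In this toy layout, agent 3 is outside every other agent's observation range, so the chain stops at agent 2. Because each hop only considers the agents currently in view, the candidate set has no fixed size, which is the property the abstract ties to handling a varying number of agents.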
Authors: Xudong Guo, Daming Shi, Wenhui Fan