
RGMComm: Return Gap Minimization via Discrete Communications in Multi-Agent Reinforcement Learning (2308.03358v5)

Published 7 Aug 2023 in cs.AI

Abstract: Communication is crucial for solving cooperative Multi-Agent Reinforcement Learning tasks in partially observable Markov Decision Processes. Existing works often rely on black-box methods to encode local information/features into messages shared with other agents, producing continuous messages with high communication overhead and poor interpretability. Prior attempts at discrete communication generate one-hot vectors trained as part of agents' actions and use the Gumbel-softmax operation to compute message gradients; these are heuristic designs that provide no quantitative guarantees on the expected return. This paper establishes an upper bound on the return gap between an ideal policy with full observability and an optimal partially observable policy with discrete communication. This result lets us recast multi-agent communication as a novel online clustering problem over each agent's local observations, with messages as cluster labels and the upper bound on the return gap as the clustering loss. To minimize the return gap, we propose the Return-Gap-Minimization Communication (RGMComm) algorithm, a surprisingly simple design of discrete message generation functions that is integrated with reinforcement learning through a novel Regularized Information Maximization loss function using cosine distance as the clustering metric. Evaluations show that RGMComm significantly outperforms state-of-the-art multi-agent communication baselines and achieves nearly optimal returns with few-bit messages that are naturally interpretable.
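
To make the clustering view of communication concrete, below is a minimal sketch (not the authors' implementation) of a discrete message function trained with a Regularized-Information-Maximization-style objective that uses cosine distance as the clustering term. The network names, shapes, loss weighting, and the exact form of the objective are illustrative assumptions; the paper's actual loss and training integration with the RL agents may differ.

```python
# Sketch: discrete message generation as online clustering of local observations,
# trained with a RIM-style loss plus a cosine-distance clustering term.
# All hyperparameters and module names here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MessageEncoder(nn.Module):
    """Maps a local observation to logits over K discrete messages (cluster labels)."""
    def __init__(self, obs_dim: int, num_messages: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_messages),
        )
        # Learnable cluster centroids in observation space, used by the
        # cosine-distance clustering term below.
        self.centroids = nn.Parameter(torch.randn(num_messages, obs_dim))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # logits, shape [batch, num_messages]

def rim_cosine_loss(logits: torch.Tensor, obs: torch.Tensor,
                    centroids: torch.Tensor, reg_weight: float = 1.0) -> torch.Tensor:
    """RIM-style objective with a cosine-distance clustering term (illustrative)."""
    probs = F.softmax(logits, dim=-1)                       # p(message | obs)
    marginal = probs.mean(dim=0)                            # p(message)
    h_marginal = -(marginal * (marginal + 1e-8).log()).sum()            # balanced labels
    h_conditional = -(probs * (probs + 1e-8).log()).sum(dim=-1).mean()  # confident labels
    mutual_info = h_marginal - h_conditional                # I(obs; message) surrogate

    # Cosine-distance clustering: pull each observation toward its assigned centroid.
    cos_sim = F.cosine_similarity(obs.unsqueeze(1), centroids.unsqueeze(0), dim=-1)
    cos_dist = 1.0 - cos_sim                                # [batch, num_messages]
    cluster_term = (probs * cos_dist).sum(dim=-1).mean()

    return -mutual_info + reg_weight * cluster_term

# Usage sketch: one gradient step on a batch of an agent's local observations.
encoder = MessageEncoder(obs_dim=16, num_messages=4)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
obs = torch.randn(32, 16)
loss = rim_cosine_loss(encoder(obs), obs, encoder.centroids)
opt.zero_grad(); loss.backward(); opt.step()
messages = encoder(obs).argmax(dim=-1)  # few-bit discrete messages sent to other agents
```

In this view, each discrete message is simply the cluster label of the sender's local observation, so a small number of labels (few bits) can be communicated and later inspected, which is the source of the interpretability the abstract refers to.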

Authors (3)
  1. Jingdi Chen (7 papers)
  2. Tian Lan (162 papers)
  3. Carlee Joe-Wong (69 papers)
Citations (13)