An Introduction to Centralized Training for Decentralized Execution in Cooperative Multi-Agent Reinforcement Learning (2409.03052v1)
Abstract: Multi-agent reinforcement learning (MARL) has exploded in popularity in recent years. Many approaches have been developed, but they can be divided into three main types: centralized training and execution (CTE), centralized training for decentralized execution (CTDE), and decentralized training and execution (DTE). CTDE methods are the most common, as they can use centralized information during training but execute in a decentralized manner, with each agent using only the information available to it during execution. CTDE is the only paradigm that requires a separate training phase where any available information (e.g., other agents' policies, underlying states) can be used. As a result, CTDE methods can be more scalable than CTE methods, do not require communication during execution, and can often perform well. CTDE fits most naturally with the cooperative case, but can potentially be applied in competitive or mixed settings depending on what information is assumed to be observed. This text is an introduction to CTDE in cooperative MARL. It is meant to explain the setting, basic concepts, and common methods. It does not cover all work in CTDE MARL, as the subarea is quite extensive. I have included work that I believe is important for understanding the main concepts in the subarea and apologize to those whom I have omitted.