Fully Decentralized Cooperative Multi-Agent Reinforcement Learning: A Survey (2401.04934v1)
Published 10 Jan 2024 in cs.MA, cs.AI, and cs.LG
Abstract: Cooperative multi-agent reinforcement learning is a powerful tool to solve many real-world cooperative tasks, but restrictions of real-world applications may require training the agents in a fully decentralized manner. Due to the lack of information about other agents, it is challenging to derive algorithms that can converge to the optimal joint policy in a fully decentralized setting. Thus, this research area has not been thoroughly studied. In this paper, we seek to systematically review the fully decentralized methods in two settings: maximizing a shared reward of all agents and maximizing the sum of individual rewards of all agents, and discuss open questions and future research directions.