Fully Decentralized Cooperative Multi-Agent Reinforcement Learning: A Survey (2401.04934v1)

Published 10 Jan 2024 in cs.MA, cs.AI, and cs.LG

Abstract: Cooperative multi-agent reinforcement learning is a powerful tool for solving many real-world cooperative tasks, but restrictions in real-world applications may require training the agents in a fully decentralized manner. Because each agent lacks information about the others, it is challenging to derive algorithms that converge to the optimal joint policy in a fully decentralized setting, and as a result this research area has not been thoroughly studied. In this paper, we systematically review fully decentralized methods in two settings, maximizing a shared reward of all agents and maximizing the sum of individual rewards of all agents, and discuss open questions and future research directions.
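
To make the two settings concrete, a minimal formalization under standard discounted-return assumptions is given below. The notation ($\pi_i$ for agent policies, $\gamma$ for the discount factor, $r_t$ and $r_t^i$ for rewards) is introduced here for illustration and may differ from the paper's own:

```latex
% Shared-reward setting: all n agents maximize one common return.
\max_{\pi_1,\dots,\pi_n} \;
  \mathbb{E}\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t}\, r_t\Big]

% Individual-reward setting: maximize the sum of per-agent returns.
\max_{\pi_1,\dots,\pi_n} \;
  \mathbb{E}\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t}
  \textstyle\sum_{i=1}^{n} r_t^{i}\Big]
```

In both cases the decentralization constraint is that each policy $\pi_i$ conditions only on agent $i$'s local observation history, with no access to other agents' observations, actions, or parameters during training.

As a sketch of what fully decentralized training looks like in code, the snippet below implements tabular independent Q-learning, the classical baseline for this setting (Tan, 1993). All sizes, hyperparameters, and the environment interface are placeholder assumptions; the survey covers more refined decentralized methods that this sketch does not represent.

```python
import numpy as np

n_agents, n_obs, n_actions = 2, 10, 4   # toy sizes, assumptions for illustration
alpha, gamma, eps = 0.1, 0.99, 0.1      # placeholder hyperparameters

# Each agent keeps its OWN Q-table over its LOCAL observation only:
# no shared parameters and no access to other agents' actions or values.
Q = [np.zeros((n_obs, n_actions)) for _ in range(n_agents)]

def act(i, obs, rng):
    """Epsilon-greedy action from agent i's local Q-table."""
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[i][obs]))

def update(i, obs, action, reward, next_obs):
    """One-step Q-learning update for agent i. Other agents influence
    agent i only through the environment dynamics and the reward signal,
    which makes the environment non-stationary from i's point of view."""
    td_target = reward + gamma * Q[i][next_obs].max()
    Q[i][obs, action] += alpha * (td_target - Q[i][obs, action])
```

Because each agent treats its peers as part of the environment, the transition dynamics it experiences change as the other agents learn; this non-stationarity is precisely why convergence to the optimal joint policy is hard in the fully decentralized setting, as the abstract notes.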
