AgentMixer: Multi-Agent Correlated Policy Factorization (2401.08728v3)

Published 16 Jan 2024 in cs.MA and cs.AI

Abstract: In multi-agent reinforcement learning, centralized training with decentralized execution (CTDE) methods typically assume that agents make decisions independently based on their local observations, which may not yield a correlated, coordinated joint policy. Coordination can be encouraged explicitly during training, with individual policies trained to imitate the correlated joint policy; however, this can cause asymmetric learning failure due to the observation mismatch between the joint and individual policies. Inspired by the concept of correlated equilibrium, we introduce a strategy modification called AgentMixer that allows agents to correlate their policies. AgentMixer non-linearly combines individual partially observable policies into a joint fully observable policy. To enable decentralized execution, we introduce Individual-Global-Consistency, which guarantees mode consistency during joint training of the centralized and decentralized policies, and we prove that AgentMixer converges to an ε-approximate Correlated Equilibrium. On the Multi-Agent MuJoCo, SMAC-v2, Matrix Game, and Predator-Prey benchmarks, AgentMixer matches or outperforms state-of-the-art methods.
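The abstract's terms "strategy modification" and "ε-approximate Correlated Equilibrium" follow the standard game-theoretic definitions. As orientation (stated here with our own notation, not the paper's), a correlated joint policy π is an ε-approximate correlated equilibrium when no agent can gain more than ε by unilaterally remapping its own recommended actions:

```latex
% Standard definition, included for orientation; notation is ours, not the paper's.
% \pi is the correlated joint policy, a = (a_i, a_{-i}) a joint action,
% r_i is agent i's reward, and \phi_i : \mathcal{A}_i \to \mathcal{A}_i a strategy modification.
\mathbb{E}_{a \sim \pi}\!\left[ r_i\big(\phi_i(a_i), a_{-i}\big) \right]
\;-\;
\mathbb{E}_{a \sim \pi}\!\left[ r_i(a) \right]
\;\le\; \epsilon
\qquad \text{for all agents } i \text{ and all } \phi_i .
```

To make the policy-factorization idea concrete, below is a minimal PyTorch-style sketch of non-linearly mixing individual partially observable policies into a state-conditioned joint policy, in the spirit of the abstract. The module name, architecture, and tensor shapes are illustrative assumptions, not the authors' implementation.

```python
# Conceptual sketch only: names, shapes, and the mixing network are assumptions.
import torch
import torch.nn as nn

class PolicyMixer(nn.Module):
    """Mixes per-agent action logits into a joint, state-conditioned policy."""

    def __init__(self, n_agents: int, n_actions: int, state_dim: int, hidden: int = 64):
        super().__init__()
        self.n_agents, self.n_actions = n_agents, n_actions
        # Non-linear mixer conditioned on the global state (available during training only).
        self.mixer = nn.Sequential(
            nn.Linear(n_agents * n_actions + state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_agents * n_actions),
        )

    def forward(self, individual_logits: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # individual_logits: (batch, n_agents, n_actions) from the decentralized policies.
        # state: (batch, state_dim) global state used for centralized training.
        x = torch.cat([individual_logits.flatten(start_dim=1), state], dim=-1)
        joint_logits = self.mixer(x).view(-1, self.n_agents, self.n_actions)
        # Correlated joint policy; the decentralized policies are then kept
        # consistent with its modes so that execution can remain decentralized.
        return torch.softmax(joint_logits, dim=-1)

# Example usage with toy dimensions.
mixer = PolicyMixer(n_agents=3, n_actions=5, state_dim=12)
logits = torch.randn(8, 3, 5)
state = torch.randn(8, 12)
joint_policy = mixer(logits, state)  # (8, 3, 5); each row sums to 1 over actions
```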

Authors (4)
  1. Zhiyuan Li (304 papers)
  2. Wenshuai Zhao (14 papers)
  3. Lijun Wu (113 papers)
  4. Joni Pajarinen (68 papers)
