MADiff: Offline Multi-agent Learning with Diffusion Models (2305.17330v4)
Abstract: Diffusion models (DMs) have recently achieved great success in various scenarios, including offline reinforcement learning, where diffusion planners learn to generate desired trajectories during online evaluation. However, despite their effectiveness in single-agent learning, it remains unclear how DMs can operate in multi-agent problems, where agents can hardly accomplish teamwork if each agent's trajectory is modeled independently, without coordination. In this paper, we propose MADiff, a novel generative multi-agent learning framework that tackles this problem. MADiff is realized with an attention-based diffusion model that captures the complex coordination among the behaviors of multiple agents. To the best of our knowledge, MADiff is the first diffusion-based multi-agent learning framework, acting as both a decentralized policy and a centralized controller. During decentralized execution, MADiff simultaneously performs teammate modeling, and the centralized controller can also be applied to multi-agent trajectory prediction. Our experiments demonstrate that MADiff outperforms baseline algorithms on a wide range of multi-agent learning tasks, highlighting its effectiveness in modeling complex multi-agent interactions. Our code is available at https://github.com/zbzhu99/madiff.
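The abstract's central design is an attention mechanism inside the diffusion denoiser, so that agents' trajectories are denoised jointly rather than independently. Below is a minimal PyTorch sketch of that idea, not the paper's actual implementation: the class name `MADiffDenoiser`, the single attention layer, and all dimensions are illustrative assumptions (the repository linked above is the authoritative version).

```python
import torch
import torch.nn as nn


class MADiffDenoiser(nn.Module):
    """Toy attention-based denoiser for joint multi-agent trajectories."""

    def __init__(self, traj_dim: int, hidden_dim: int = 128, n_heads: int = 4):
        super().__init__()
        # Per-agent trajectory encoder (weights shared across agents).
        self.encode = nn.Sequential(nn.Linear(traj_dim, hidden_dim), nn.SiLU())
        # Embedding of the diffusion timestep, added to every token.
        self.t_embed = nn.Sequential(
            nn.Linear(1, hidden_dim), nn.SiLU(), nn.Linear(hidden_dim, hidden_dim)
        )
        # Inter-agent attention: at each environment step, every agent attends
        # to the other agents' features -- this is where coordination among
        # agents' behaviors is modeled.
        self.agent_attn = nn.MultiheadAttention(hidden_dim, n_heads, batch_first=True)
        self.decode = nn.Linear(hidden_dim, traj_dim)

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # x: noisy joint trajectories, shape (B, N agents, H steps, traj_dim)
        # t: diffusion timesteps, shape (B,)
        B, N, H, _ = x.shape
        t_feat = self.t_embed(t.float().view(B, 1, 1, 1).expand(B, N, H, 1))
        h = self.encode(x) + t_feat                      # (B, N, H, hidden)
        h = h.permute(0, 2, 1, 3).reshape(B * H, N, -1)  # tokens = agents
        h, _ = self.agent_attn(h, h, h)                  # mix across agents
        h = h.reshape(B, H, N, -1).permute(0, 2, 1, 3)
        return self.decode(h)                            # predicted noise


# Example: batch of 2, 3 agents, horizon 16, 10-dim observation-action vectors.
model = MADiffDenoiser(traj_dim=10)
eps_hat = model(torch.randn(2, 3, 16, 10), torch.randint(0, 1000, (2,)))
assert eps_hat.shape == (2, 3, 16, 10)
```

Attending across agents at every step, instead of flattening all agents into one input, is the kind of structure that can support both centralized generation of joint trajectories and decentralized execution with teammate modeling, as the abstract describes.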
Authors: Zhengbang Zhu, Minghuan Liu, Liyuan Mao, Bingyi Kang, Minkai Xu, Yong Yu, Stefano Ermon, Weinan Zhang