Attention-Based Recurrence for Multi-Agent Reinforcement Learning under Stochastic Partial Observability (arXiv:2301.01649v6)
Abstract: Stochastic partial observability poses a major challenge for decentralized coordination in multi-agent reinforcement learning, but it is largely neglected in state-of-the-art research due to a strong focus on state-based centralized training for decentralized execution (CTDE) and on benchmarks, such as the StarCraft Multi-Agent Challenge (SMAC), that lack sufficient stochasticity. In this paper, we propose Attention-based Embeddings of Recurrence In multi-Agent Learning (AERIAL) to approximate value functions under stochastic partial observability. AERIAL replaces the true state with a learned representation of multi-agent recurrence, which captures more accurate information about decentralized agent decisions than state-based CTDE. We then introduce MessySMAC, a modified version of SMAC with stochastic observations and higher variance in initial states, to provide a more general and configurable benchmark for stochastic partial observability. We evaluate AERIAL in Dec-Tiger as well as in a variety of SMAC and MessySMAC maps, and compare the results with state-based CTDE. Furthermore, we evaluate the robustness of AERIAL and state-based CTDE against various stochasticity configurations in MessySMAC.
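To make the core idea concrete, the sketch below shows how a centralized value function can attend over per-agent recurrent hidden states instead of consuming the true state. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: AERIAL's actual architecture uses trained recurrent networks and learned attention projections, whereas here the hidden states are given directly and queries, keys, and values are the states themselves.

```python
import math

def self_attention(hidden_states):
    """Scaled dot-product self-attention over per-agent recurrent
    hidden states. For illustration, queries, keys, and values are
    all the raw hidden states (no learned projections).
    Returns one attended embedding per agent."""
    d = len(hidden_states[0])
    scale = math.sqrt(d)
    attended = []
    for q in hidden_states:
        # Similarity of this agent's state to every agent's state.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale
                  for k in hidden_states]
        # Numerically stable softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted combination of all agents' hidden states.
        attended.append([sum(w * k[j] for w, k in zip(weights, hidden_states))
                         for j in range(d)])
    return attended

def joint_embedding(hidden_states):
    """Mean-pool the attended per-agent embeddings into one vector
    that stands in for the true state during centralized training."""
    attended = self_attention(hidden_states)
    n, d = len(attended), len(attended[0])
    return [sum(a[j] for a in attended) / n for j in range(d)]
```

Because the embedding is built from the agents' own recurrent states rather than the environment state, it reflects exactly the information the decentralized policies condition on, which is the property the abstract highlights.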
Authors: Thomy Phan, Fabian Ritz, Philipp Altmann, Maximilian Zorn, Jonas Nüßlein, Michael Kölle, Thomas Gabor, Claudia Linnhoff-Popien