
N-Agent Ad Hoc Teamwork (2404.10740v3)

Published 16 Apr 2024 in cs.AI

Abstract: Current approaches to learning cooperative multi-agent behaviors assume relatively restrictive settings. In standard fully cooperative multi-agent reinforcement learning, the learning algorithm controls $\textit{all}$ agents in the scenario, while in ad hoc teamwork, the learning algorithm usually assumes control over only a $\textit{single}$ agent in the scenario. However, many cooperative settings in the real world are much less restrictive. For example, in an autonomous driving scenario, a company might train its cars with the same learning algorithm, yet once on the road, these cars must cooperate with cars from another company. Towards expanding the class of scenarios that cooperative learning methods may optimally address, we introduce $N$-agent ad hoc teamwork (NAHT), where a set of autonomous agents must interact and cooperate with dynamically varying numbers and types of teammates. This paper formalizes the problem, and proposes the Policy Optimization with Agent Modelling (POAM) algorithm. POAM is a policy gradient, multi-agent reinforcement learning approach to the NAHT problem that enables adaptation to diverse teammate behaviors by learning representations of teammate behaviors. Empirical evaluation on tasks from the multi-agent particle environment and StarCraft II shows that POAM improves cooperative task returns compared to baseline approaches, and enables out-of-distribution generalization to unseen teammates.
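To make the agent-modelling idea concrete, the following is a minimal PyTorch sketch of one way a teammate-behavior encoder could condition a controlled agent's policy: an encoder summarizes the recent observation-action history into an embedding (standing in for the "representations of teammate behaviors" mentioned in the abstract), and the policy conditions on that embedding when choosing actions. The class names, network sizes, and the GRU/MLP architecture are illustrative assumptions, not the paper's published POAM implementation.

# Hypothetical sketch (not the paper's implementation): an encoder embeds the
# observed interaction history into a teammate-behavior vector, and the
# controlled agent's policy conditions on that embedding.
import torch
import torch.nn as nn


class TeammateEncoder(nn.Module):
    # Summarizes a controlled agent's recent observation-action history into an
    # embedding intended to capture how the current teammates behave.
    def __init__(self, obs_dim: int, act_dim: int, embed_dim: int = 32):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + act_dim, embed_dim, batch_first=True)

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        # history: (batch, time, obs_dim + act_dim)
        _, h = self.rnn(history)
        return h.squeeze(0)  # (batch, embed_dim)


class ConditionedPolicy(nn.Module):
    # The action distribution depends on both the local observation and the
    # inferred teammate embedding, so behavior can adapt to different teammates.
    def __init__(self, obs_dim: int, n_actions: int, embed_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + embed_dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs: torch.Tensor, z: torch.Tensor):
        logits = self.net(torch.cat([obs, z], dim=-1))
        return torch.distributions.Categorical(logits=logits)


# Usage: embed the recent history, then sample an action conditioned on it.
obs_dim, act_dim, n_actions = 10, 5, 5
encoder = TeammateEncoder(obs_dim, act_dim)
policy = ConditionedPolicy(obs_dim, n_actions)
history = torch.zeros(1, 8, obs_dim + act_dim)  # 8 most recent timesteps
obs = torch.zeros(1, obs_dim)
action = policy(obs, encoder(history)).sample()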

Authors (5)
  1. Caroline Wang (8 papers)
  2. Arrasy Rahman (17 papers)
  3. Ishan Durugkar (13 papers)
  4. Elad Liebman (9 papers)
  5. Peter Stone (184 papers)

