Curriculum Learning for Cooperation in Multi-Agent Reinforcement Learning (2312.11768v1)

Published 19 Dec 2023 in cs.AI, cs.LG, and cs.MA

Abstract: While there has been significant progress in curriculum learning and continual learning for training agents to generalize across a wide variety of environments in the context of single-agent reinforcement learning, it is unclear whether these algorithms remain valid in a multi-agent setting. In a competitive setting, a learning agent can be trained by making it compete with a curriculum of increasingly skilled opponents. However, a generally intelligent agent should also be able to learn to act around other agents and cooperate with them to achieve common goals. When cooperating with other agents, the learning agent must (a) learn how to perform the task (or subtask), and (b) increase the overall team reward. In this paper, we aim to answer the question of what kind of cooperative teammate, and what curriculum of teammates, a learning agent should be trained with to achieve these two objectives. Our results on the game Overcooked show that a pre-trained teammate who is less skilled is the best teammate for overall team reward but the worst for the learning of the agent. Moreover, somewhat surprisingly, a curriculum of teammates with decreasing skill levels performs better than other types of curricula.
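The paper's experiments are run on Overcooked with deep RL agents; the toy sketch below is only a hypothetical illustration of the core idea, reducing cooperation to a two-action coordination bandit. A "teammate" of skill s picks the cooperative action with probability s, and the learner is trained against a sequence of such teammates ordered by decreasing skill, as in the curriculum the abstract reports works best. All names (`make_teammate`, `train_with_curriculum`) and hyperparameters are invented for this example, not taken from the paper.

```python
import random

def make_teammate(skill):
    """Hypothetical teammate: picks the cooperative action (1) with probability `skill`."""
    def act():
        return 1 if random.random() < skill else 0
    return act

def train_with_curriculum(skills, episodes_per_stage=500, lr=0.1, eps=0.1, seed=0):
    """Train a simple epsilon-greedy bandit learner against a teammate curriculum.

    Team reward is 1 only when both agents pick the cooperative action,
    so the learner must discover that action 1 is worth taking.
    """
    random.seed(seed)
    q = [0.0, 0.0]  # learner's value estimates for actions 0 and 1
    for skill in skills:  # one curriculum stage per teammate
        teammate = make_teammate(skill)
        for _ in range(episodes_per_stage):
            # epsilon-greedy action selection
            if random.random() < eps:
                a = random.randrange(2)
            else:
                a = max((0, 1), key=lambda i: q[i])
            reward = 1.0 if (a == 1 and teammate() == 1) else 0.0
            q[a] += lr * (reward - q[a])  # incremental value update
    return q

# Decreasing-skill curriculum: a skilled teammate scaffolds early learning,
# then weaker teammates force the learner to carry more of the task itself.
q_decreasing = train_with_curriculum([0.9, 0.6, 0.3])
```

After training, the learner's estimate for the cooperative action exceeds that of the non-cooperative one, i.e. the curriculum has taught it to cooperate; the real paper measures the analogous effect via the agent's individual skill and overall team reward in Overcooked.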

Authors (4)
  1. Rupali Bhati
  2. Sai Krishna Gottipati
  3. Clodéric Mars
  4. Matthew E. Taylor
