Open Ad Hoc Teamwork with Cooperative Game Theory (2402.15259v5)
Abstract: Ad hoc teamwork poses a challenging problem: designing an agent that can collaborate with teammates without prior coordination or joint training. Open ad hoc teamwork (OAHT) further complicates this challenge by considering environments in which the number of teammates changes, referred to as open teams. One promising practical solution is graph-based policy learning (GPL), which leverages the generalizability of graph neural networks to handle an unrestricted number of agents of various agent types. However, its joint Q-value representation over a coordination graph lacks a convincing explanation. In this paper, we establish a new theory, through the lens of cooperative game theory, to understand the representation of the joint Q-value for OAHT and its learning paradigm. Building on this theory, we propose a novel algorithm named CIAO, based on GPL's framework, with additional provable implementation tricks that facilitate learning. Demos of the experimental results are available at https://sites.google.com/view/ciao2024, and the experiment code is published at https://github.com/hsvgbkhgbv/CIAO.