Quantifying Agent Interaction in Multi-agent Reinforcement Learning for Cost-efficient Generalization (2310.07218v1)
Abstract: Generalization poses a significant challenge in Multi-agent Reinforcement Learning (MARL). The extent to which an agent is influenced by unseen co-players depends on the agent's policy and the specific scenario. A quantitative examination of this relationship sheds light on how to effectively train agents for diverse scenarios. In this study, we present the Level of Influence (LoI), a metric quantifying the interaction intensity among agents within a given scenario and environment. We observe that, in general, a more diverse set of co-play agents during training enhances the generalization performance of the ego agent; however, this improvement varies across scenarios and environments. LoI proves effective at predicting these differences in improvement within specific scenarios. Furthermore, we introduce an LoI-guided resource allocation method tailored to training a set of policies for diverse scenarios under a constrained budget. Our results demonstrate that strategic resource allocation based on LoI achieves higher performance than uniform allocation under the same computation budget.
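The core idea of LoI-guided allocation is to split a fixed training budget across scenarios in proportion to each scenario's measured interaction intensity, rather than uniformly. The sketch below illustrates that proportional split; the function name, scenario names, and numbers are illustrative assumptions, not taken from the paper.

```python
def allocate_budget(loi_scores, total_budget):
    """Split total_budget across scenarios proportionally to their LoI scores.

    loi_scores: dict mapping scenario name -> LoI value (non-negative).
    total_budget: total training resources (e.g., environment steps).
    Returns a dict mapping scenario name -> allocated budget.
    """
    total_loi = sum(loi_scores.values())
    if total_loi == 0:
        # Fall back to uniform allocation when no interaction is measured.
        share = total_budget / len(loi_scores)
        return {s: share for s in loi_scores}
    return {s: total_budget * loi / total_loi for s, loi in loi_scores.items()}

# Example: a high-interaction scenario receives most of the budget.
loi = {"scenario_a": 0.8, "scenario_b": 0.1, "scenario_c": 0.1}
budget = allocate_budget(loi, total_budget=1_000_000)
```

Under uniform allocation each scenario would receive one third of the budget; proportional allocation instead concentrates training where co-player diversity matters most.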