Quantifying Agent Interaction in Multi-agent Reinforcement Learning for Cost-efficient Generalization (2310.07218v1)

Published 11 Oct 2023 in cs.MA and cs.AI

Abstract: Generalization poses a significant challenge in Multi-agent Reinforcement Learning (MARL). The extent to which an agent is influenced by unseen co-players depends on the agent's policy and the specific scenario. A quantitative examination of this relationship sheds light on effectively training agents for diverse scenarios. In this study, we present the Level of Influence (LoI), a metric quantifying the interaction intensity among agents within a given scenario and environment. We observe that, generally, a more diverse set of co-play agents during training enhances the generalization performance of the ego agent; however, this improvement varies across distinct scenarios and environments. LoI proves effective in predicting these improvement disparities within specific scenarios. Furthermore, we introduce a LoI-guided resource allocation method tailored to train a set of policies for diverse scenarios under a constrained budget. Our results demonstrate that strategic resource allocation based on LoI can achieve higher performance than uniform allocation under the same computation budget.
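
To make the budget-allocation idea concrete, below is a minimal sketch, in Python, of LoI-guided resource allocation under the simplest plausible rule: each scenario receives a share of the total training budget proportional to its LoI score. The abstract does not specify the paper's exact allocation rule, and the function name, scenario names, and LoI values here are hypothetical illustrations, not the authors' implementation.

# A minimal sketch of LoI-guided budget allocation, assuming the budget
# is split in proportion to each scenario's precomputed LoI score.
# The exact allocation rule is not given in the abstract; everything
# named here (scenarios, scores, function) is hypothetical.

def allocate_budget(loi_scores: dict[str, float], total_steps: int) -> dict[str, int]:
    """Split a fixed training budget across scenarios in proportion to LoI."""
    total_loi = sum(loi_scores.values())
    return {
        scenario: round(total_steps * score / total_loi)
        for scenario, score in loi_scores.items()
    }

if __name__ == "__main__":
    # Hypothetical LoI values: a higher LoI means co-player behavior
    # influences the ego agent more in that scenario, so that scenario
    # receives a larger share of the training budget.
    loi = {"chicken": 0.8, "stag_hunt": 0.5, "pure_coordination": 0.1}
    print(allocate_budget(loi, total_steps=1_000_000))
    # -> {'chicken': 571429, 'stag_hunt': 357143, 'pure_coordination': 71429}

The design intuition matches the abstract's claim: scenarios with intense agent interaction (high LoI) benefit most from diverse co-play training, so concentrating compute there should outperform uniform allocation at the same total budget. Note that naive rounding can miss the exact budget by a few steps; a largest-remainder scheme would fix this if exactness matters.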

