A Review of Cooperation in Multi-agent Learning (2312.05162v1)

Published 8 Dec 2023 in cs.MA, cs.AI, cs.GT, and cs.LG

Abstract: Cooperation in multi-agent learning (MAL) is a topic at the intersection of numerous disciplines, including game theory, economics, social sciences, and evolutionary biology. Research in this area aims to understand both how agents can coordinate effectively when goals are aligned and how they may cooperate in settings where gains from working together are possible but possibilities for conflict abound. In this paper we provide an overview of the fundamental concepts, problem settings and algorithms of multi-agent learning. This encompasses reinforcement learning, multi-agent sequential decision-making, challenges associated with multi-agent cooperation, and a comprehensive review of recent progress, along with an evaluation of relevant metrics. Finally we discuss open challenges in the field with the aim of inspiring new avenues for research.

A Review of Cooperation in Multi-Agent Learning

The paper "A Review of Cooperation in Multi-agent Learning" by Yali Du, Joel Z. Leibo, Usman Islam, Richard Willis, and Peter Sunehag offers an extensive survey of the multi-agent learning (MAL) landscape, with a particular focus on cooperative strategies. This essay assesses the paper's key topics, touching on multi-agent reinforcement learning (MARL), the main problem settings, and the challenges inherent in coordinating multiple agents whose objectives may be aligned or in conflict.

Overview of Multi-Agent Learning

Multi-agent learning sits at the intersection of several academic fields, extending core concepts from game theory and reinforcement learning to settings with many interacting agents. The ultimate aim is to equip multiple agents with the capacity to learn, adapt, and cooperate in dynamic, shared environments, where the confluence of agents' actions creates both cooperative opportunities and conflicts, and therefore demands algorithms that manage such complexity effectively.
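
Formally, most of the settings surveyed are instances of a Markov (stochastic) game, the multi-agent generalization of an MDP; a standard formulation (with illustrative notation) is

$$\mathcal{G} = \langle \mathcal{N}, \mathcal{S}, \{\mathcal{A}^i\}_{i \in \mathcal{N}}, P, \{r^i\}_{i \in \mathcal{N}}, \gamma \rangle,$$

where $\mathcal{N} = \{1, \dots, N\}$ is the set of agents, $\mathcal{S}$ the state space, $\mathcal{A}^i$ agent $i$'s action space, $P(s' \mid s, \mathbf{a})$ the transition kernel over joint actions $\mathbf{a} = (a^1, \dots, a^N)$, $r^i(s, \mathbf{a})$ agent $i$'s reward, and $\gamma \in [0, 1)$ a discount factor. Each agent seeks a policy $\pi^i$ maximizing its expected discounted return $\mathbb{E}\big[\sum_{t \ge 0} \gamma^t r^i(s_t, \mathbf{a}_t)\big]$; team games are the special case $r^1 = \dots = r^N$, while mixed-motive settings allow the $r^i$ to diverge.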

Challenges in Multi-Agent Systems

The paper identifies two major branches of cooperative multi-agent learning: team-based multi-agent learning and mixed-motive multi-agent learning. The former assumes a unified objective across agents, typically the maximization of a shared utility function, while the latter covers settings where agents have differing incentives, often formalized as social dilemmas in which individual rationality is at odds with collective well-being.
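
The canonical two-player example of such a dilemma is the Prisoner's Dilemma; with the illustrative payoffs below (row player's payoff listed first, C = cooperate, D = defect), defection strictly dominates for each player, yet mutual defection (1, 1) leaves both worse off than mutual cooperation (3, 3):

$$\begin{array}{c|cc} & \text{C} & \text{D} \\ \hline \text{C} & (3, 3) & (0, 5) \\ \text{D} & (5, 0) & (1, 1) \end{array}$$

Sequential social dilemmas extend this tension over time, with cooperation and defection implemented by temporally extended policies rather than single atomic actions.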

Efficient learning in these settings is hindered by several challenges:

  • Non-stationarity: Each agent's evolving policy changes the environment from the perspective of the other agents, introducing instability (see the sketch following this list).
  • Exploration and Scalability: Finding effective strategies in large joint action spaces, and scaling methods to varying numbers of agents, are both non-trivial.
  • Credit Assignment: Allocating credit among agents for their contributions to a collective task is intrinsically difficult under shared rewards.
  • Generalisation to Novel Partners: The ability to coordinate with previously unseen partners is crucial for the effective deployment of MAL methods in real-world applications.
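
To make the non-stationarity point concrete, the toy sketch below (illustrative only, not from the paper) runs two independent Q-learners on a repeated Prisoner's Dilemma. Each learner updates as if it faced a stationary bandit, but the reward distribution it experiences drifts whenever its partner's policy changes; with these payoffs both learners typically drift toward mutual defection.

```python
import numpy as np

# Prisoner's Dilemma payoffs for the row player; actions: 0 = cooperate, 1 = defect.
PAYOFF = np.array([[3.0, 0.0],
                   [5.0, 1.0]])

rng = np.random.default_rng(0)
q = [np.zeros(2), np.zeros(2)]   # one stateless Q-table per independent learner
alpha, eps = 0.1, 0.1

for step in range(5000):
    # Epsilon-greedy action selection for both agents.
    a = [rng.integers(2) if rng.random() < eps else int(np.argmax(q[i]))
         for i in range(2)]
    r = [PAYOFF[a[0], a[1]], PAYOFF[a[1], a[0]]]
    # Each agent updates as if the other were part of a fixed environment,
    # but that "environment" shifts as the partner's Q-values (and hence
    # policy) change -- the non-stationarity problem in independent learning.
    for i in range(2):
        q[i][a[i]] += alpha * (r[i] - q[i][a[i]])

print("Agent 0 Q-values (C, D):", q[0])
print("Agent 1 Q-values (C, D):", q[1])
```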

Approaches in Team-Based and Mixed-Motive Contexts

Team-based cooperative learning primarily addresses settings such as team games characterized by a shared objective. Dominant techniques include centralized-training-decentralized-execution (CTDE) frameworks and value-decomposition architectures built around the individual-global-max (IGM) principle (e.g., VDN, QMIX, QTRAN). These methods support collaborative learning through efficient credit assignment and scalable coordination among decentralized agents.
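
As a hedged illustration of CTDE with value decomposition, the sketch below implements the simplest member of this family, a VDN-style additive factorization in which the team value is the sum of per-agent utilities trained against the shared team reward; all tensor shapes and hyperparameters here are hypothetical.

```python
import torch
import torch.nn as nn

class AgentQNet(nn.Module):
    """Per-agent utility network Q_i(o_i, a_i), executed decentrally."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)

def vdn_td_loss(agent_nets, batch, gamma=0.99):
    """Centralized training: Q_tot = sum_i Q_i, regressed onto the team reward."""
    obs, actions, reward, next_obs, done = batch   # [B, N, obs_dim], [B, N], [B], ...
    q_tot, q_tot_next = 0.0, 0.0
    for i, net in enumerate(agent_nets):
        q_i = net(obs[:, i])                                    # [B, n_actions]
        q_tot = q_tot + q_i.gather(1, actions[:, i:i + 1]).squeeze(1)
        with torch.no_grad():                                   # bootstrapped target
            q_tot_next = q_tot_next + net(next_obs[:, i]).max(dim=1).values
    target = reward + gamma * (1.0 - done) * q_tot_next
    return nn.functional.mse_loss(q_tot, target)

# Toy usage with random data (hypothetical dimensions).
B, N, obs_dim, n_actions = 32, 3, 10, 5
nets = [AgentQNet(obs_dim, n_actions) for _ in range(N)]
batch = (
    torch.randn(B, N, obs_dim),                 # observations
    torch.randint(0, n_actions, (B, N)),        # chosen actions
    torch.randn(B),                             # shared team reward
    torch.randn(B, N, obs_dim),                 # next observations
    torch.zeros(B),                             # done flags
)
vdn_td_loss(nets, batch).backward()
```

QMIX replaces the sum with a monotonic, state-conditioned mixing network and QTRAN relaxes the factorization further, but all three are designed so that per-agent greedy actions jointly maximize the team value (the IGM property).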

For mixed-motive contexts, where agents might be self-interested, methods often employ mechanisms like social influence, reputation systems, and contracts. These mechanisms aim to mitigate the conflicts between short-term individual gains and long-term collective benefits in social dilemmas.
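
As a hedged worked example of how a contract can realign incentives, reuse the Prisoner's Dilemma payoffs above and suppose a binding agreement obliges a unilateral defector to transfer $\tau$ to the cooperating partner, so that

$$u_{\text{defector}}(D, C) = 5 - \tau, \qquad u_{\text{cooperator}}(C, D) = 0 + \tau.$$

For any $\tau \ge 2$, a unilateral deviation from mutual cooperation yields $5 - \tau \le 3$, so $(C, C)$ becomes a Nash equilibrium under the contract. Reputation systems and social-influence rewards pursue a similar realignment of short-term incentives without an explicit enforcement mechanism.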

Evaluating Methods and Metrics

The evaluation of multi-agent learning methods is multifaceted, often involving specialized environments such as the StarCraft Multi-Agent Challenge (SMAC) and Overcooked to test both scalability and coordination effectiveness. Metrics range from reward-centric evaluations like collective return to broader social measures like sustainability and equality, each providing unique insight into the degree of cooperative behavior exhibited by agents.
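
These social metrics can be operationalized in a few lines; the sketch below uses one common set of definitions from the sequential-social-dilemma literature (not necessarily the paper's exact formulas): collective return as the summed per-agent return, equality as one minus the Gini coefficient of per-agent returns, and sustainability as the average timestep at which reward is collected.

```python
import numpy as np

def cooperation_metrics(rewards):
    """rewards: array of shape [T, N] -- per-timestep reward for each of N agents."""
    T, N = rewards.shape
    returns = rewards.sum(axis=0)                  # per-agent return R_i
    collective = returns.sum()                     # collective return U = sum_i R_i

    # Equality = 1 - Gini over per-agent returns (assumes non-negative returns).
    diffs = np.abs(returns[:, None] - returns[None, :]).sum()
    equality = 1.0 - diffs / (2 * N * returns.sum() + 1e-12)

    # Sustainability: average time index of positive-reward events.
    t_idx, _ = np.nonzero(rewards > 0)
    sustainability = t_idx.mean() if t_idx.size else 0.0
    return collective, equality, sustainability

# Toy usage with random data (hypothetical values).
rng = np.random.default_rng(0)
print(cooperation_metrics(rng.random((100, 4))))
```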

Implications and Future Directions

This paper provides a comprehensive review of the cooperative aspects of MAL, and it also suggests future avenues, including improving generalization in agent behavior, employing foundation-model-based approaches, and developing more sophisticated benchmark tasks for deeper insight into cooperative dynamics.

Emerging themes in the paper such as interaction with LLM-based autonomous agents and zero-shot coordination with humans point towards significant expansions of present computational frameworks. The challenges and opportunities outlined in the paper invite further investigation into adaptive learning mechanisms, ultimately aspiring for seamless cooperation across heterogeneous multi-agent systems.

The intricate landscape described, complete with methodological and evaluative insights, serves as a foundational reference for further studies in cooperative multi-agent learning. The work underscores the potential for cross-disciplinary approaches that draw upon the essence of human-like cooperative decision-making mechanisms in complex dynamic environments.
