Aligning Individual and Collective Objectives in Multi-Agent Cooperation (2402.12416v3)
Abstract: Among the research topics in multi-agent learning, mixed-motive cooperation is one of the most prominent challenges, primarily due to the mismatch between individual and collective goals. Cutting-edge research focuses on incorporating domain knowledge into rewards and introducing additional mechanisms to incentivize cooperation. However, these approaches often suffer from shortcomings such as the effort required for manual design and the absence of theoretical grounding. To close this gap, we model the mixed-motive game as a differentiable game, which makes it easier to illuminate the learning dynamics towards cooperation. More specifically, we introduce a novel optimization method named \textbf{\textit{A}}ltruistic \textbf{\textit{G}}radient \textbf{\textit{A}}djustment (\textbf{\textit{AgA}}) that employs gradient adjustments to progressively align individual and collective objectives. Furthermore, we theoretically prove that AgA effectively attracts gradients to stable fixed points of the collective objective while considering individual interests, and we validate these claims with empirical evidence. We evaluate the effectiveness of AgA on small-scale mixed-motive benchmark environments, namely the two-player public goods game and the sequential social dilemma games Cleanup and Harvest, as well as on our self-developed large-scale environment built in the game StarCraft II.
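To make the idea of gradient adjustment in a differentiable game concrete, the sketch below sets up a two-player public-goods game in JAX and blends each agent's individual gradient with the gradient of the collective objective. The blending rule, the payoff parameters `B` and `C`, and the weight `lam` are illustrative assumptions of this sketch; the paper's actual AgA adjustment term is derived differently and carries the stability guarantees stated in the abstract.

```python
# Illustrative sketch only: a two-player differentiable public-goods game
# where each agent's update blends its individual gradient with the
# collective-objective gradient. The blending rule is a simplified
# stand-in, not the paper's exact AgA adjustment.
import jax
import jax.numpy as jnp

B, C = 1.6, 1.0  # public-good multiplier and per-unit contribution cost (assumed values)

def individual_loss(i, x):
    # Player i pays C per unit contributed and receives an equal share of
    # the multiplied pot; defecting is individually tempting when B/2 < C.
    pot = B * jnp.sum(x)
    return -(pot / x.shape[0] - C * x[i])

def collective_loss(x):
    # Social welfare: the sum of all individual losses.
    return sum(individual_loss(i, x) for i in range(x.shape[0]))

@jax.jit
def adjusted_step(x, lam=0.5, lr=0.05):
    # Simultaneous-gradient vector of the individual objectives ...
    xi = jnp.stack([jax.grad(individual_loss, argnums=1)(i, x)[i]
                    for i in range(x.shape[0])])
    # ... and the gradient of the shared collective objective.
    g = jax.grad(collective_loss)(x)
    # Hypothetical alignment rule: pull each individual gradient toward
    # the collective one with weight lam.
    return x - lr * ((1.0 - lam) * xi + lam * g)

x = jnp.array([0.1, 0.9])  # initial contribution levels in [0, 1]
for _ in range(200):
    x = jnp.clip(adjusted_step(x), 0.0, 1.0)
print(x)  # with lam = 0.5 both contributions climb to full contribution (1.0)
```

With `lam = 0` the update reduces to independent gradient descent on the individual losses and both contributions collapse to zero (mutual defection); with a sufficiently large `lam` the adjusted dynamics drift toward full contribution, mirroring the alignment effect that AgA's gradient adjustment is designed to achieve.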