Toward Finding Strong Pareto Optimal Policies in Multi-Agent Reinforcement Learning (2410.19372v1)

Published 25 Oct 2024 in cs.LG

Abstract: In this work, we study the problem of finding Pareto optimal policies in multi-agent reinforcement learning problems with cooperative reward structures. We show that any algorithm in which each agent optimizes only its own reward is subject to suboptimal convergence. Therefore, to achieve Pareto optimality, agents have to act altruistically by considering the rewards of others. This observation bridges the multi-objective optimization framework and multi-agent reinforcement learning. We first propose a framework for applying the Multiple Gradient Descent Algorithm (MGDA) to learning in multi-agent settings. We further show that standard MGDA is subject to weak Pareto convergence, a problem that is often overlooked in other learning settings but is prevalent in multi-agent reinforcement learning. To mitigate this issue, we propose MGDA++, an improvement of the existing algorithm that properly handles the weakly optimal convergence of MGDA. Theoretically, we prove that MGDA++ converges to strong Pareto optimal solutions in convex, smooth bi-objective problems. We further demonstrate the superiority of MGDA++ in cooperative settings on the Gridworld benchmark. The results highlight that our proposed method converges efficiently and outperforms other methods in terms of the optimality of the convergent policies. The source code is available at https://github.com/giangbang/Strong-Pareto-MARL.
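
For readers skimming only this page: a weakly Pareto optimal policy profile is one that no alternative can strictly improve for every agent simultaneously, while a strongly Pareto optimal one additionally cannot be improved for any agent without worsening another. The sketch below is a minimal NumPy illustration of the standard bi-objective MGDA step that the paper builds on (Désidéri's algorithm, not the authors' MGDA++ or their released code; the function name, tolerance, and gradient-ascent convention are our own assumptions):

import numpy as np

def mgda_direction(g1, g2, eps=1e-12):
    """Minimum-norm point in the convex hull of two return gradients.

    Solves min_{a in [0, 1]} ||a*g1 + (1-a)*g2||^2 in closed form. The
    result d satisfies <d, g_i> >= ||d||^2 for both i, so stepping along
    +d (gradient ascent on returns) improves both objectives to first
    order; d is zero exactly at Pareto-stationary points.
    """
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom < eps:  # gradients (nearly) coincide; either one works
        return g1
    a = np.clip(-float(diff @ g2) / denom, 0.0, 1.0)
    return a * g1 + (1.0 - a) * g2

The closed form also exposes the weak Pareto failure mode the abstract mentions: if one agent's gradient is near zero, the min-norm combination collapses onto it and the joint step stalls, even though the other agent's return could still be improved at no cost to anyone; handling that degeneracy is what MGDA++ is designed for.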

