Scalable Offline Reinforcement Learning for Mean Field Games (2410.17898v1)

Published 23 Oct 2024 in cs.LG and cs.MA

Abstract: Reinforcement learning algorithms for mean-field games offer a scalable framework for optimizing policies in large populations of interacting agents. Existing methods often depend on online interactions or access to system dynamics, limiting their practicality in real-world scenarios where such interactions are infeasible or difficult to model. In this paper, we present Offline Munchausen Mirror Descent (Off-MMD), a novel mean-field RL algorithm that approximates equilibrium policies in mean-field games using purely offline data. By leveraging iterative mirror descent and importance sampling techniques, Off-MMD estimates the mean-field distribution from static datasets without relying on simulation or environment dynamics. Additionally, we incorporate techniques from offline reinforcement learning to address common issues like Q-value overestimation, ensuring robust policy learning even with limited data coverage. Our algorithm scales to complex environments and demonstrates strong performance on benchmark tasks like crowd exploration or navigation, highlighting its applicability to real-world multi-agent systems where online experimentation is infeasible. We empirically demonstrate the robustness of Off-MMD to low-quality datasets and conduct experiments to investigate its sensitivity to hyperparameter choices.
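
The abstract names three mechanisms: a Munchausen-style mirror-descent policy update, importance-sampling estimation of the mean-field distribution from a static dataset, and an offline-RL correction (in the spirit of conservative Q-learning) against Q-value overestimation. The sketch below shows, in a small tabular NumPy setting, how such pieces could fit together. The tabular setup, function names, and all constants are illustrative assumptions for exposition, not the authors' actual implementation.

```python
# Minimal tabular sketch of the ingredients described in the abstract.
# Everything here (shapes, names, constants) is an illustrative assumption.
import numpy as np

def mirror_descent_policy(cum_q, temperature=1.0):
    """Mirror-descent update: softmax over accumulated Q-values."""
    logits = cum_q / temperature
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)

def munchausen_reward(r, log_pi_sa, alpha_m=0.9, tau=0.05):
    """Munchausen augmentation: add the scaled, clipped log-policy of the
    taken action to the reward (constants are illustrative)."""
    return r + alpha_m * tau * np.clip(log_pi_sa, -1.0, 0.0)

def estimate_mean_field(states, actions, policy, behavior, n_states):
    """Importance-sampled state distribution from a static dataset.

    Each logged state is reweighted by pi(a|s) / beta(a|s), so the estimate
    reflects the current policy rather than the data-collecting one."""
    w = policy[states, actions] / np.clip(behavior[states, actions], 1e-8, None)
    mu = np.bincount(states, weights=w, minlength=n_states)
    return mu / mu.sum()

def conservative_penalty(q, states, actions, alpha=1.0):
    """CQL-style regularizer added to the TD loss: pushes Q-values down on
    out-of-dataset actions and up on actions actually observed."""
    q_s = q[states]                                    # (batch, n_actions)
    m = q_s.max(axis=-1, keepdims=True)
    logsumexp = m.squeeze(-1) + np.log(np.exp(q_s - m).sum(axis=-1))
    return alpha * (logsumexp - q[states, actions]).mean()

# Tiny demo on random offline data.
rng = np.random.default_rng(0)
n_states, n_actions = 10, 4
cum_q = rng.normal(size=(n_states, n_actions))
pi = mirror_descent_policy(cum_q, temperature=0.5)
beta = np.full((n_states, n_actions), 1.0 / n_actions)  # uniform behavior policy
s = rng.integers(0, n_states, size=256)
a = rng.integers(0, n_actions, size=256)
mu = estimate_mean_field(s, a, pi, beta, n_states)
```

In an iterative scheme of this shape, each round would re-estimate the mean-field distribution under the current policy from the fixed dataset, evaluate Q-values against that distribution with the conservative penalty in the loss, and then apply the mirror-descent update; no environment interaction is needed at any step, which is the property the abstract emphasizes.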

