Safe Model-Based Multi-Agent Mean-Field Reinforcement Learning (2306.17052v2)

Published 29 Jun 2023 in cs.LG, cs.AI, cs.MA, and stat.ML

Abstract: Many applications, e.g., in shared mobility, require coordinating a large number of agents. Mean-field reinforcement learning addresses the resulting scalability challenge by optimizing the policy of a representative agent interacting with the infinite population of identical agents instead of considering individual pairwise interactions. In this paper, we address an important generalization where there exist global constraints on the distribution of agents (e.g., requiring capacity constraints or minimum coverage requirements to be met). We propose Safe-M$^3$-UCRL, the first model-based mean-field reinforcement learning algorithm that attains safe policies even in the case of unknown transitions. As a key ingredient, it uses epistemic uncertainty in the transition model within a log-barrier approach to ensure pessimistic constraint satisfaction with high probability. Beyond the synthetic swarm motion benchmark, we showcase Safe-M$^3$-UCRL on the vehicle repositioning problem faced by many shared mobility operators and evaluate its performance through simulations built on vehicle trajectory data from a service provider in Shenzhen. Our algorithm effectively meets the demand in critical areas while ensuring service accessibility in regions with low demand.
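To make the log-barrier mechanism concrete, the sketch below is a minimal, hypothetical Python illustration, not the paper's implementation: a global coverage constraint h(μ) ≥ 0 on the agent distribution is evaluated pessimistically by subtracting a β-scaled epistemic standard deviation (standing in for the learned transition model's uncertainty) from its margin, and the objective adds η·log(margin), which diverges as the pessimistic margin approaches zero. The specific constraint, β, and η are illustrative assumptions, not quantities taken from the paper.

    import numpy as np

    def coverage_constraint(mu):
        """Toy global constraint on the mean-field distribution mu:
        every region must retain at least 5% of the agents (h(mu) >= 0)."""
        return float(np.min(mu) - 0.05)

    def pessimistic_margin(mu, epistemic_std, beta=2.0):
        """Lower confidence bound on the constraint margin: a beta-scaled
        epistemic standard deviation shrinks the margin, so feasibility
        is judged pessimistically under model uncertainty."""
        return coverage_constraint(mu) - beta * epistemic_std

    def log_barrier_objective(reward, mu, epistemic_std, eta=0.1):
        """Reward plus a log-barrier on the pessimistic margin; the barrier
        diverges as the margin approaches zero, steering the policy away
        from distributions that might violate the constraint."""
        margin = pessimistic_margin(mu, epistemic_std)
        if margin <= 0.0:
            return -np.inf  # infeasible under the pessimistic model
        return reward + eta * np.log(margin)

    # Compare two candidate agent distributions over four regions.
    mu_safe = np.array([0.30, 0.25, 0.25, 0.20])   # ample coverage everywhere
    mu_risky = np.array([0.49, 0.30, 0.15, 0.06])  # region 4 barely covered

    for name, mu in [("safe", mu_safe), ("risky", mu_risky)]:
        print(name, log_barrier_objective(reward=1.0, mu=mu, epistemic_std=0.02))

Under this toy setup the risky distribution is rejected outright (objective -inf) because its pessimistic margin is negative, while the safe one pays only a small barrier penalty; in the full algorithm the pessimism would come from the epistemic uncertainty of the learned transition model rather than a fixed scalar.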
