Maximum Causal Entropy Inverse Reinforcement Learning for Mean-Field Games (2401.06566v1)

Published 12 Jan 2024 in eess.SY, cs.LG, cs.SY, and math.OC

Abstract: In this paper, we introduce the maximum causal entropy Inverse Reinforcement Learning (IRL) problem for discrete-time mean-field games (MFGs) under an infinite-horizon discounted-reward optimality criterion; the state space of a typical agent is finite. Our approach begins with a comprehensive review of the maximum entropy IRL problem for deterministic and stochastic Markov decision processes (MDPs) in both finite- and infinite-horizon settings. We then formulate the maximum causal entropy IRL problem for MFGs, which is a non-convex optimization problem with respect to policies. Leveraging the linear programming formulation of MDPs, we recast this IRL problem as a convex optimization problem and develop a gradient descent algorithm, with a guaranteed rate of convergence, for computing the optimal solution. Finally, by formulating the MFG problem as a generalized Nash equilibrium problem (GNEP), we present a new algorithm for computing the mean-field equilibrium (MFE) of the forward RL problem; this method is used to generate data for a numerical example. We note that this algorithm is also applicable to general MFE computations.
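
The abstract's central step is re-expressing the policy-space IRL problem over occupation measures so that it becomes convex. As a point of reference, the following is a minimal sketch of the single-agent (MDP) building block, maximum causal entropy IRL written in occupation-measure form; the notation (occupation measure \mu, transition kernel p, feature map \phi, initial distribution \nu_0, expert feature expectations \hat{\Phi}_{\mathrm{expert}}) is assumed for illustration and is not taken from the paper, which treats the mean-field extension of this problem.

    \max_{\mu \ge 0}\;\; -\sum_{x,a} \mu(x,a)\,\log\frac{\mu(x,a)}{\sum_{a'} \mu(x,a')}
    \text{s.t.}\;\; \sum_{a} \mu(x,a) \;=\; (1-\gamma)\,\nu_0(x) \;+\; \gamma \sum_{x',a'} p(x \mid x',a')\,\mu(x',a') \quad \forall x,
    \phantom{\text{s.t.}\;\;} \sum_{x,a} \mu(x,a)\,\phi(x,a) \;=\; \hat{\Phi}_{\mathrm{expert}}.

The objective equals the discounted causal entropy of the policy \pi(a \mid x) = \mu(x,a)/\sum_{a'}\mu(x,a') up to a positive scaling constant; each x-term is an (unnormalized) relative entropy and hence jointly convex in \mu, while the flow and feature-matching constraints are linear, so the reformulated problem is convex and amenable to projected gradient methods of the kind whose convergence rate the paper analyzes.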

