Analysis of Multiscale Reinforcement Q-Learning Algorithms for Mean Field Control Games (2405.17017v3)
Abstract: Mean Field Control Games (MFCG), introduced in [Angiuli et al., 2022a], model competitive games between a large number of large collaborative groups of agents, in the limit where both the number of groups and their sizes tend to infinity. In this paper, we prove the convergence of a three-timescale Reinforcement Q-Learning (RL) algorithm that solves MFCG in a model-free manner from the point of view of representative agents. Our analysis uses a Q-table for finite state and action spaces, updated at each discrete time step over an infinite horizon. In [Angiuli et al., 2023], we proved the convergence of two-timescale algorithms for mean field games (MFG) and mean field control (MFC) separately, highlighting the need to track multiple population distributions in the MFC case. Here, we incorporate this feature into the MFCG setting, together with three update rates that decrease to zero in the proper ratios. Our proof technique generalizes the two-timescale analysis of [Borkar, 1997] to three timescales. We give a simple example that satisfies the various hypotheses made in the convergence proof and illustrates the performance of the algorithm.
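The abstract describes the algorithm only at a high level. As an illustration, here is a minimal Python sketch of a three-timescale tabular Q-learning loop in this spirit; it is not the authors' implementation. Everything concrete in it is a hypothetical assumption: the 1-D grid of states with moves {-1, 0, +1}, the `reward` function, the polynomial rates in `rates`, and the ordering in which the local (own-group) distribution `nu` is updated fastest, the Q-table at the intermediate rate, and the global distribution `mu` slowest. It also tracks a single local distribution and so omits the multiple distributions the abstract mentions for the MFC component.

```python
import numpy as np


def rates(n, exps=(0.55, 0.75, 0.95)):
    """Hypothetical polynomial rates (rho_nu, rho_Q, rho_mu) at step n >= 1.

    Exponents in (1/2, 1] give sum(rho) = inf and sum(rho**2) < inf; a smaller
    exponent means a faster timescale, so for n >= 2: rho_nu > rho_Q > rho_mu,
    and each slower/faster ratio tends to zero.
    """
    return tuple(1.0 / n ** e for e in exps)


def reward(x, a, mu, nu, states):
    """Hypothetical running reward: penalise moving, being far from the
    local (group) mean, and crowding by the global population at state x."""
    return -0.1 * abs(a) - 0.05 * (states[x] - states @ nu) ** 2 - mu[x]


def three_timescale_q(n_states=5, n_steps=20_000, gamma=0.9, eps=0.1, seed=0):
    """Toy three-timescale Q-learning sketch (hypothetical model, not the paper's)."""
    rng = np.random.default_rng(seed)
    states = np.arange(n_states, dtype=float)
    actions = np.array([-1, 0, 1])          # move left, stay, move right
    Q = np.zeros((n_states, len(actions)))
    visits = np.zeros_like(Q)
    mu = np.full(n_states, 1.0 / n_states)  # global distribution (all groups)
    nu = np.full(n_states, 1.0 / n_states)  # local distribution (own group)
    x = rng.integers(n_states)
    for n in range(1, n_steps + 1):
        rho_nu, _, rho_mu = rates(n)
        # epsilon-greedy action from the current Q-table
        if rng.random() < eps:
            k = rng.integers(len(actions))
        else:
            k = int(np.argmax(Q[x]))
        # fast local and slow global updates toward the Dirac mass at x
        delta = np.zeros(n_states)
        delta[x] = 1.0
        nu += rho_nu * (delta - nu)
        mu += rho_mu * (delta - mu)
        # environment step: intended move with prob 0.9, else stay; clipped to the grid
        move = actions[k] if rng.random() < 0.9 else 0
        x_next = int(np.clip(x + move, 0, n_states - 1))
        r = reward(x, actions[k], mu, nu, states)
        # Q-update at the intermediate rate, indexed by the visit count of (x, k)
        visits[x, k] += 1
        _, rho_q, _ = rates(visits[x, k])
        Q[x, k] += rho_q * (r + gamma * Q[x_next].max() - Q[x, k])
        x = x_next
    return Q, mu, nu
```

For example, `Q, mu, nu = three_timescale_q()` followed by `Q.argmax(axis=1)` gives the greedy action index for each state. Indexing the Q-rate by the visit count of each (x, a) pair, rather than by the global step, follows the usual tabular Q-learning convention and is a design choice of this sketch.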
References:
- Reinforcement learning algorithm for mixed mean field control games. arXiv:2205.02330.
- Unified reinforcement Q-learning for mean field game and control problems. Mathematics of Control, Signals, and Systems, 34:217–271.
- Convergence of multiscale reinforcement Q-learning algorithms for mean field game and control problems. arXiv:2312.06659.
- Mean field games and mean field type control theory. SpringerBriefs in Mathematics. Springer, New York.
- An analog scheme for fixed point computation. I. Theory. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 44(4):351–355.
- Borkar, V. S. (1997). Stochastic approximation with two time scales. Systems & Control Letters, 29(5):291–294.
- Borkar, V. S. (1998). Asynchronous stochastic approximations. SIAM Journal on Control and Optimization, 36(3):840–851.
- The O.D.E. method for convergence of stochastic approximation and reinforcement learning. SIAM Journal on Control and Optimization, 38(2):447–469.
- Probabilistic Theory of Mean Field Games with Applications I-II. Springer.
- Linear-quadratic mean-field reinforcement learning: Convergence of policy gradient methods. Preprint.
- Model-free mean-field reinforcement learning: Mean-field MDP and mean-field Q-learning. The Annals of Applied Probability, 33(6B):5334–5381.
- Approximately solving mean field games via entropy-regularized deep reinforcement learning. In International Conference on Artificial Intelligence and Statistics, pages 1909–1917. PMLR.
- On the convergence of model free learning in mean field games. In Proceedings of the AAAI Conference on Artificial Intelligence.
- Actor-critic learning for mean-field control in continuous time. arXiv preprint arXiv:2303.06993.
- Mean-field controls with Q-learning for cooperative MARL: Convergence and complexity analysis. SIAM Journal on Mathematics of Data Science, 3(4):1168–1196.
- Learning mean-field games. In Advances in Neural Information Processing Systems, pages 4966–4976.
- Large population stochastic dynamic games: closed-loop McKean-Vlasov systems and the Nash certainty equivalence principle. Commun. Inf. Syst., 6(3):221–251.
- Actor-critic–type learning algorithms for Markov decision processes. SIAM Journal on Control and Optimization, 38(1):94–123.
- Mean field games. Jpn. J. Math., 2(1):229–260.
- Learning mean field games: A survey. arXiv preprint arXiv:2205.12944.
- Mean-field Markov decision processes with common noise and open-loop controls. arXiv preprint arXiv:1912.07883.
- Neveu, J. (1975). Discrete-parameter Martingales. North-Holland Mathematical Library. North-Holland.
- Sell, G. R. (1973). Differential equations without uniqueness and classical topological dynamics. Journal of Differential Equations, 14(1):42–56.
- Reinforcement learning in stationary mean-field games. In Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems.
- Reinforcement learning: An introduction. MIT Press.
- Tembine, H. (2017). Mean-field-type games. AIMS Math, 2(4):706–735.
- Reinforcement learning in continuous time and space: A stochastic control approach. Journal of Machine Learning Research, 21(198):1–34.
- Continuous-time mean–variance portfolio selection: A reinforcement learning framework. Mathematical Finance, 30(4):1273–1308.
- Watkins, C. J. C. H. (1989). Learning from delayed rewards. PhD thesis, King’s College, Cambridge.