Gradient Flows for Regularized Stochastic Control Problems (2006.05956v5)
Abstract: This paper studies stochastic control problems in which the action space consists of probability measures and the objective is penalised by relative entropy. We identify a suitable metric space on which we construct a gradient flow for the measure-valued control process, within the set of admissible controls, along which the cost functional is guaranteed to decrease. We show that any invariant measure of this gradient flow satisfies the Pontryagin optimality principle. If the problem is sufficiently convex, the gradient flow converges exponentially fast. Furthermore, the optimal measure-valued control admits a Bayesian interpretation, which means that prior knowledge can be incorporated when solving such stochastic control problems. This work is motivated by the desire to extend the theoretical underpinning for the convergence of stochastic gradient-type algorithms widely employed in the reinforcement learning community to solve control problems.
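To fix ideas, the following is a schematic rendering of the setup described in the abstract; all notation here ($\nu$, $\gamma$, $\tau$, $f$, $g$, $H$) is assumed for illustration and need not match the paper's own symbols. For a measure-valued control $\nu = (\nu_t)_{t\in[0,T]}$, a reference measure $\gamma$ on the action space, and a regularisation strength $\tau > 0$, an entropy-regularised cost takes the form
\[
J^{\tau}(\nu) \;=\; \mathbb{E}\Big[\int_0^T f(t, X_t, \nu_t)\,\mathrm{d}t + g(X_T)\Big] \;+\; \tau \int_0^T \operatorname{KL}\big(\nu_t \,\|\, \gamma\big)\,\mathrm{d}t,
\]
and the gradient flow evolves $\nu^s$ in an algorithmic time $s$ so that $s \mapsto J^{\tau}(\nu^s)$ is non-increasing. Consistently with the Pontryagin principle, an invariant measure of such a flow is of Gibbs form,
\[
\nu^{*}_t(\mathrm{d}a) \;\propto\; \exp\big(-H(t,a)/\tau\big)\,\gamma(\mathrm{d}a),
\]
where $H$ denotes a Hamiltonian evaluated along the controlled state and adjoint processes; reading $\gamma$ as a prior and $\nu^{*}_t$ as the corresponding posterior gives the Bayesian interpretation mentioned above.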
- R. Bellman. Dynamic programming. Science, 153(3731):34–37, 1966.
- J.-D. Benamou and Y. Brenier. A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem. Numerische Mathematik, 84(3):375–393, 2000.
- A. Bensoussan. Stochastic control of partially observable systems. Cambridge University Press, 2004.
- A. Bensoussan and J.-L. Lions. Applications of variational inequalities in stochastic control. Elsevier, 2011.
- D. P. Bertsekas. Dynamic programming and optimal control. Athena Scientific, Belmont, MA, 1995.
- D. P. Bertsekas and S. E. Shreve. Stochastic optimal control: the discrete-time case. Athena Scientific, 2004.
- R. Carmona and F. Delarue. Probabilistic Theory of Mean Field Games with Applications I–II. Springer, 2018.
- J.-F. Chassagneux, J. Chen, N. Frikha, and C. Zhou. A learning scheme by sparse grids and Picard approximations for semilinear parabolic PDEs. IMA Journal of Numerical Analysis, 43(5):3109–3168, 2023.
- J.-F. Chassagneux, L. Szpruch, and A. Tse. Weak quantitative propagation of chaos via differential calculus on the space of measures. The Annals of Applied Probability, 32(3):1929–1969, 2022.
- F. Delarue and A. Tse. Uniform in time weak propagation of chaos on the torus. arXiv preprint arXiv:2104.14973, 2021.
- K. Doya. Reinforcement learning in continuous time and space. Neural Computation, 12:219–245, 2000.
- E. Emmrich and D. Šiška. Nonlinear stochastic evolution equations of second order with damping. Stochastics and Partial Differential Equations: Analysis and Computations, 5(1), 2017.
- W. H. Fleming and H. M. Soner. Controlled Markov processes and viscosity solutions. Springer, 2006.
- M. Geist, B. Scherrer, and O. Pietquin. A theory of regularized Markov decision processes. In International Conference on Machine Learning, pages 2160–2169. PMLR, 2019.
- E. Gobet and M. Grangereau. Newton method for stochastic control problems. 2021.
- J. Harter and A. Richou. A stability approach for solving multidimensional quadratic BSDEs. Electronic Journal of Probability, 24(4):1–51, 2019.
- K. Hu, A. Kazeykina, and Z. Ren. Mean-field Langevin system, optimal control and deep neural networks. arXiv preprint arXiv:1909.07278, 2019.
- K. Hu, Z. Ren, D. Šiška, and Ł. Szpruch. Mean-field Langevin dynamics and energy landscape of neural networks. Annales de l'Institut Henri Poincaré (B) Probabilités et Statistiques, 57(4):2043–2065, 2021.
- Y.-J. Huang, Z. Wang, and Z. Zhou. Convergence of policy improvement for entropy-regularized stochastic control problems. arXiv preprint arXiv:2209.07059, 2022.
- K. Ito, C. Reisinger, and Y. Zhang. A neural network-based policy iteration algorithm with global $H^2$-superlinear convergence for stochastic games on domains. Foundations of Computational Mathematics, 21:331–374, 2021.
- J.-F. Jabir, D. Šiška, and Ł. Szpruch. Mean-field neural ODEs via relaxed optimal control. arXiv preprint arXiv:1912.05475, 2019.
- N. Kazamaki. Continuous Exponential Martingales and BMO. Springer-Verlag Berlin Heidelberg, 1994.
- B. Kerimkulov, D. Šiška, and Ł. Szpruch. A modified MSA for stochastic control problems. Applied Mathematics & Optimization, pages 1–20, 2021.
- B. Kerimkulov, D. Šiška, Ł. Szpruch, and Y. Zhang. Mirror descent for stochastic control problems with measure-valued controls. arXiv preprint arXiv:2401.01198, 2024.
- B. Kerimkulov, D. Šiška, and Ł. Szpruch. Exponential convergence and stability of Howard's policy improvement algorithm for controlled diffusions. SIAM Journal on Control and Optimization, 58(3):1314–1340, 2020.
- T. Komorowski and A. Walczuk. Central limit theorem for Markov processes with spectral gap in the Wasserstein metric. Stochastic Processes and their Applications, 122(5):2155–2184, 2012.
- N. V. Krylov. Controlled diffusion processes. Springer, 1980. Translated from the Russian by A. B. Aries.
- N. V. Krylov and B. L. Rozovskii. Stochastic evolution equations. Journal of Soviet Mathematics, 14:1233–1277, 1981.
- O. A. Ladyženskaja, V. A. Solonnikov, and N. N. Ural'ceva. Linear and quasi-linear equations of parabolic type. Translations of Mathematical Monographs. AMS, 1968.
- M. B. Majka. Coupling and exponential ergodicity for stochastic differential equations driven by Lévy processes. Stochastic Processes and their Applications, 127(12):4083–4125, 2017.
- F. Otto and C. Villani. Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality. Journal of Functional Analysis, 173:361–400, 2000.
- C. Reisinger and Y. Zhang. Regularity and stability of feedback relaxed controls. SIAM Journal on Control and Optimization, 59(5):3118–3151, 2021.
- R. S. Sutton and A. G. Barto. Reinforcement learning: An introduction. MIT Press, 2018.
- Ł. Szpruch and A. Tse. Antithetic multilevel particle system sampling method for McKean–Vlasov SDEs. arXiv preprint arXiv:1903.07063, 2019.
- W. Tang, Y. P. Zhang, and X. Y. Zhou. Exploratory HJB equations and their convergence. SIAM Journal on Control and Optimization, 60(6):3191–3216, 2022.
- C. Villani. Optimal transport: old and new. Springer, 2008.
- H. Wang, T. Zariphopoulou, and X. Y. Zhou. Exploration versus exploitation in reinforcement learning: a stochastic control approach. Available at SSRN 3316387, 2019.
- L. C. Young. Lectures on the calculus of variations and optimal control theory, volume 304. American Mathematical Soc., 2000.
- J. Zhang. Backward stochastic differential equations. Springer, 2017.
- B. D. Ziebart. Modeling purposeful adaptive behavior with the principle of maximum causal entropy. PhD thesis, Carnegie Mellon University, 2010.