Policy Iteration for Exploratory Hamilton--Jacobi--Bellman Equations (2406.00612v4)
Abstract: We study the policy iteration algorithm (PIA) for entropy-regularized stochastic control problems on an infinite time horizon with a large discount rate, focusing on two main scenarios. First, we analyze PIA with bounded coefficients where the controls applied to the diffusion term satisfy a smallness condition. We demonstrate the convergence of PIA based on a uniform $\mathcal{C}^{2,\alpha}$ estimate for the value sequence generated by PIA, and provide a quantitative convergence analysis for this scenario. Second, we investigate PIA with unbounded coefficients but no control over the diffusion term. In this scenario, we first establish the well-posedness of the exploratory Hamilton--Jacobi--Bellman equation with linear-growth coefficients and a polynomial-growth reward function. Using this well-posedness result, we obtain the convergence of PIA by establishing quantitative, locally uniform $\mathcal{C}^{1,\alpha}$ estimates for the generated value sequence.
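For orientation, the following is a minimal sketch of one step of entropy-regularized policy iteration in generic notation; the drift $b$, diffusion $\sigma$, reward $r$, action space $A$, discount rate $\rho$, and temperature $\lambda$ are placeholder symbols and need not match the paper's exact assumptions. Each iteration evaluates the current relaxed policy $\pi^n$ through a linear equation and then improves it via a Gibbs (softmax) update.

% Hedged sketch of one PIA step for an entropy-regularized control problem;
% the notation (b, sigma, r, rho, lambda, A) is generic, not the paper's exact setup.
\begin{align*}
  % Policy evaluation: linear equation for the value of the current relaxed policy pi^n
  \rho\, v^n(x) &= \int_A \Big[ b(x,a)\cdot \nabla v^n(x)
      + \tfrac12 \operatorname{tr}\!\big(\sigma\sigma^{\top}(x,a)\,\nabla^2 v^n(x)\big)
      + r(x,a) - \lambda \log \pi^n(a\mid x) \Big]\,\pi^n(a\mid x)\,\mathrm{d}a, \\[4pt]
  % Policy improvement: Gibbs (softmax) update induced by the entropy regularization
  \pi^{n+1}(a\mid x) &\propto \exp\!\Big( \tfrac1\lambda \Big[ b(x,a)\cdot \nabla v^n(x)
      + \tfrac12 \operatorname{tr}\!\big(\sigma\sigma^{\top}(x,a)\,\nabla^2 v^n(x)\big)
      + r(x,a) \Big] \Big).
\end{align*}

The paper's convergence results concern the value sequence $(v^n)$ generated by such an iteration, via uniform $\mathcal{C}^{2,\alpha}$ estimates in the bounded-coefficient case and locally uniform $\mathcal{C}^{1,\alpha}$ estimates in the unbounded-coefficient case.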