A homotopic approach to policy gradients for linear quadratic regulators with nonlinear controls
Abstract: We study the convergence of deterministic policy gradient algorithms in continuous state and action space for the prototypical Linear Quadratic Regulator (LQR) problem when the search space is not limited to the family of linear policies. We first provide a counterexample showing that extending the policy class to piecewise linear functions results in local minima of the policy gradient algorithm. To solve this problem, we develop a new approach that involves sequentially increasing a discount factor between iterations of the original policy gradient algorithm. We finally prove that this homotopic variant of policy gradient methods converges to the global optimum of the undiscounted Linear Quadratic Regulator problem for a large class of Lipschitz, non-linear policies.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.