Revisiting LQR Control from the Perspective of Receding-Horizon Policy Gradient (2302.13144v3)
Published 25 Feb 2023 in math.OC, cs.AI, cs.LG, cs.SY, and eess.SY
Abstract: In this paper, we revisit the discrete-time linear quadratic regulator (LQR) problem from the perspective of receding-horizon policy gradient (RHPG), a recently developed model-free learning framework for control applications. We provide a fine-grained sample-complexity analysis showing that RHPG learns a control policy that is both stabilizing and $\epsilon$-close to the optimal LQR solution, without requiring a known stabilizing control policy for initialization. Combined with the recent application of RHPG to learning the Kalman filter, these results demonstrate the general applicability of RHPG to linear control and estimation with streamlined analyses.
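To make the receding-horizon idea concrete, here is a minimal numerical sketch of how an RHPG-style backward pass can recover the LQR gain. This is illustrative only, not the paper's algorithm: the function name `rhpg_lqr`, the system matrices, the horizon `N`, and the step size `eta` are all assumptions, and the per-stage gradient is computed from a known model for clarity, whereas the model-free setting analyzed in the paper would estimate it from trajectory data (e.g., via zeroth-order methods).

```python
# A hedged sketch of a receding-horizon policy-gradient backward pass for
# discrete-time LQR. All numerical values and names below are hypothetical.
import numpy as np

def rhpg_lqr(A, B, Q, R, N=50, gd_steps=200, eta=1e-2):
    """Solve one quadratic subproblem per stage, backward in time.

    At each stage, with cost-to-go matrix P induced by the gains already
    learned for later stages, the stage cost
        J(K) = trace( Q + K.T @ R @ K + (A - B K).T @ P @ (A - B K) )
    is strongly convex in K, so plain gradient descent converges globally
    from the zero initialization -- no stabilizing initial policy is needed.
    """
    n, m = B.shape
    P = Q.copy()                          # terminal cost-to-go
    for _ in range(N):                    # backward pass over the horizon
        K = np.zeros((m, n))              # arbitrary (non-stabilizing) init
        for _ in range(gd_steps):
            # Exact gradient of the quadratic stage cost; in the model-free
            # setting this would be a zeroth-order estimate from rollouts.
            grad = 2 * (R + B.T @ P @ B) @ K - 2 * B.T @ P @ A
            K -= eta * grad
        Acl = A - B @ K                   # propagate cost-to-go with learned gain
        P = Q + K.T @ R @ K + Acl.T @ P @ Acl
    return K                              # approximates the infinite-horizon LQR gain

# Usage on a small open-loop-unstable example (hypothetical values).
A = np.array([[1.1, 0.5],
              [0.0, 0.9]])
B = np.array([[0.0],
              [1.0]])
Q, R = np.eye(2), np.eye(1)
K = rhpg_lqr(A, B, Q, R)
print("learned gain:", K)
print("closed-loop spectral radius:", max(abs(np.linalg.eigvals(A - B @ K))))
```

The structural point the sketch reflects is the one emphasized in the abstract: each one-stage subproblem is strongly convex in the gain, which is why gradient descent converges from a non-stabilizing (here, zero) initialization, in contrast to infinite-horizon policy-gradient methods that must start from a stabilizing policy.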