Data-Enabled Policy Optimization for Direct Adaptive Learning of the LQR (2401.14871v4)

Published 26 Jan 2024 in math.OC, cs.SY, and eess.SY

Abstract: Direct data-driven design methods for the linear quadratic regulator (LQR) mainly use offline or episodic data batches, and their online adaptation has been acknowledged as an open problem. In this paper, we propose a direct adaptive method to learn the LQR from online closed-loop data. First, we propose a new policy parameterization based on the sample covariance to formulate a direct data-driven LQR problem, which is shown to be equivalent to the certainty-equivalence LQR with optimal non-asymptotic guarantees. Second, we design a novel data-enabled policy optimization (DeePO) method to directly update the policy, where the gradient is explicitly computed using only a batch of persistently exciting (PE) data. Third, we establish its global convergence via a projected gradient dominance property. Importantly, we use DeePO to adaptively learn the LQR by performing only one step of projected gradient descent per sample of the closed-loop system, which leads to an explicit recursive update of the policy. Under PE inputs and for bounded noise, we show that the average regret of the LQR cost is upper-bounded by two terms: a sublinear term decreasing in time as $\mathcal{O}(1/\sqrt{T})$ and a bias scaling inversely with the signal-to-noise ratio (SNR); both are independent of the noise statistics. Finally, we perform simulations to validate the theoretical results and demonstrate the computational and sample efficiency of our method.
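
To make the method in the abstract concrete, below is a minimal Python sketch of a single DeePO-style projected gradient step under the sample-covariance parameterization, as we read it from the abstract: the data matrices $U_0$, $X_0$, $X_1$ are summarized by sample cross-covariances, the gain is recovered as $K = \bar{U} V$, the data-based closed loop is $\bar{X}_1 V$, and one projected gradient step is taken on the resulting LQR cost. The variable names (Ubar, Xbar0, Xbar1), the gradient expression (the standard LQR policy gradient pushed through the chain rule of the parameterization), and the pseudoinverse-based projection are our reconstruction for illustration, not the paper's verbatim algorithm.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def deepo_step(U0, X0, X1, V, Q, R, eta):
    """One projected gradient step on a data-driven LQR cost (sketch).

    U0 (m x t): inputs, X0 (n x t): states, X1 (n x t): shifted states,
    V ((m+n) x n): policy parameter satisfying Xbar0 @ V = I_n,
    Q (n x n), R (m x m): LQR weights, eta: step size.
    Assumes the data are persistently exciting (Xbar0 has full row rank)
    and the current closed loop Xbar1 @ V is Schur-stable.
    """
    t = U0.shape[1]
    D0 = np.vstack([U0, X0])          # stacked input/state data
    Ubar = U0 @ D0.T / t              # sample cross-covariances
    Xbar0 = X0 @ D0.T / t
    Xbar1 = X1 @ D0.T / t

    K = Ubar @ V                      # gain recovered from V
    A_cl = Xbar1 @ V                  # data-based closed-loop matrix

    # P solves P = Q + K'RK + A_cl' P A_cl (cost-to-go);
    # Sigma solves Sigma = I + A_cl Sigma A_cl' (state covariance).
    P = solve_discrete_lyapunov(A_cl.T, Q + K.T @ R @ K)
    Sigma = solve_discrete_lyapunov(A_cl, np.eye(A_cl.shape[0]))

    # Gradient of the LQR cost w.r.t. V, via the chain rule
    # through K = Ubar @ V and A_cl = Xbar1 @ V.
    grad = 2 * (Ubar.T @ R @ K + Xbar1.T @ P @ A_cl) @ Sigma

    # Project onto {D : Xbar0 @ D = 0} so Xbar0 @ V = I_n is preserved.
    Pi = np.eye(D0.shape[0]) - np.linalg.pinv(Xbar0) @ Xbar0
    return V - eta * Pi @ grad
```

In the adaptive setting described in the abstract, exactly one such step would be taken per closed-loop sample, with the covariances Ubar, Xbar0, Xbar1 updated recursively as each new input/state pair arrives; rank-one update formulas keep the per-sample cost low. Again, this is a sketch of the idea rather than the paper's algorithm.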

