
MR-ARL: Model Reference Adaptive Reinforcement Learning for Robustly Stable On-Policy Data-Driven LQR (2402.14483v1)

Published 22 Feb 2024 in eess.SY and cs.SY

Abstract: This article introduces a novel framework for data-driven linear quadratic regulator (LQR) design. First, we introduce a reinforcement learning paradigm for on-policy data-driven LQR, in which exploration and exploitation are performed simultaneously while guaranteeing robust stability of the whole closed-loop system, encompassing both the plant and the control/learning dynamics. We then propose Model Reference Adaptive Reinforcement Learning (MR-ARL), a control architecture integrating tools from reinforcement learning and model reference adaptive control. The approach is built on a variable reference model that embeds the currently identified value function; an adaptive stabilizer then ensures convergence of the applied policy to the optimal one, convergence of the plant to the optimal reference model, and overall robust closed-loop stability. The proposed framework provides theoretical robustness certificates against real-world perturbations such as measurement noise, plant nonlinearities, and slowly varying parameters. The effectiveness of the proposed architecture is validated via realistic numerical simulations.
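
For readers unfamiliar with the setting, the following is a minimal statement of the continuous-time LQR problem underlying the paper. It is standard background, with conventional notation (A, B, Q, R, P, K) assumed rather than taken from the article itself:

    \dot{x} = A x + B u, \qquad
    J(u) = \int_0^{\infty} \big( x^\top Q x + u^\top R u \big)\, dt,
    \qquad Q \succeq 0, \; R \succ 0.

The optimal policy is the linear state feedback u = -K*x, where the gain K* and the value-function matrix P* > 0 satisfy the algebraic Riccati equation:

    A^\top P^\star + P^\star A - P^\star B R^{-1} B^\top P^\star + Q = 0,
    \qquad K^\star = R^{-1} B^\top P^\star.

In the data-driven setting considered by the paper, (A, B) is unknown, so P* and K* must be learned from closed-loop data while the controller is running; this is what makes robust stability of the combined plant/learning dynamics the central concern.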

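As a concrete illustration of the policy-evaluation/policy-improvement loop that on-policy LQR schemes emulate from data, below is a short sketch of Kleinman's classical model-based policy iteration. This is standard background, not the paper's MR-ARL algorithm: it assumes (A, B) are known, and the toy system, function name, and iteration count are illustrative choices.

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov

    def kleinman_lqr(A, B, Q, R, K0, iters=20):
        """Kleinman policy iteration: alternate policy evaluation (a
        Lyapunov equation) and policy improvement, converging to the
        Riccati solution P* and optimal gain K* = R^{-1} B^T P*."""
        K = K0
        for _ in range(iters):
            Ak = A - B @ K  # closed loop under the current policy
            # Policy evaluation: solve Ak^T P + P Ak + Q + K^T R K = 0
            P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
            # Policy improvement
            K = np.linalg.solve(R, B.T @ P)
        return P, K

    # Toy example: the open-loop plant is stable, so K0 = 0 is a valid
    # stabilizing initial policy (Kleinman iteration requires one).
    A = np.array([[0.0, 1.0], [-1.0, -1.0]])
    B = np.array([[0.0], [1.0]])
    Q = np.eye(2)
    R = np.array([[1.0]])
    P, K = kleinman_lqr(A, B, Q, R, K0=np.zeros((1, 2)))
    print("P* =\n", P, "\nK* =", K)

Data-driven variants replace the model-based Lyapunov solve with least-squares estimates built from measured trajectories, which is exactly where exploration noise enters and where the stability guarantees pursued by this paper become nontrivial.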
