
Two-step reinforcement learning for model-free redesign of nonlinear optimal regulator (2103.03808v4)

Published 5 Mar 2021 in eess.SY, cs.LG, and cs.SY

Abstract: In many practical control applications, the performance level of a closed-loop system degrades over time due to the change of plant characteristics. Thus, there is a strong need for redesigning a controller without going through the system modeling process, which is often difficult for closed-loop systems. Reinforcement learning (RL) is one of the promising approaches that enable model-free redesign of optimal controllers for nonlinear dynamical systems based only on the measurement of the closed-loop system. However, the learning process of RL usually requires a considerable number of trial-and-error experiments using the poorly controlled system that may accumulate wear on the plant. To overcome this limitation, we propose a model-free two-step design approach that improves the transient learning performance of RL in an optimal regulator redesign problem for unknown nonlinear systems. Specifically, we first design a linear control law that attains some degree of control performance in a model-free manner, and then, train the nonlinear optimal control law with online RL by using the designed linear control law in parallel. We introduce an offline RL algorithm for the design of the linear control law and theoretically guarantee its convergence to the LQR controller under mild assumptions. Numerical simulations show that the proposed approach improves the transient learning performance and efficiency in hyperparameter tuning of RL.
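The abstract describes a two-step procedure: first, a linear control law is obtained in a model-free manner (with a theoretical guarantee of convergence to the LQR controller), and second, a nonlinear optimal control law is trained with online RL while the linear law runs in parallel. The sketch below is a minimal illustration of what such a pipeline could look like, not the paper's algorithm: it assumes a discrete-time, undiscounted LQR setting, sufficiently exploratory logged data, and an initial stabilizing (here, zero) gain; the function names `quad_features`, `fit_linear_gain`, and `parallel_control` are hypothetical.

```python
import numpy as np

# Step 1 (offline, model-free): estimate a linear, LQR-like gain from logged
# transitions (x, u, x_next) and the stage cost x'Qx + u'Ru, using least-squares
# policy iteration on a quadratic Q-function.  The plant model is never identified.

def quad_features(x, u):
    """Upper-triangular outer-product features of z = [x; u] for a quadratic Q-function."""
    z = np.concatenate([x, u])
    return np.outer(z, z)[np.triu_indices(z.size)]

def fit_linear_gain(xs, us, xns, Q, R, n_iter=20):
    """Offline policy iteration on logged data; returns a gain K so that u = K x."""
    n, m = xs.shape[1], us.shape[1]
    K = np.zeros((m, n))  # assumption: the zero gain is admissible as a starting policy
    cost = np.einsum('ij,jk,ik->i', xs, Q, xs) + np.einsum('ij,jk,ik->i', us, R, us)
    for _ in range(n_iter):
        # Policy evaluation: solve Q(x, u) - Q(x', K x') = cost(x, u) in least squares.
        Phi  = np.stack([quad_features(x, u) for x, u in zip(xs, us)])
        Phin = np.stack([quad_features(xn, K @ xn) for xn in xns])
        w = np.linalg.lstsq(Phi - Phin, cost, rcond=None)[0]
        # Unpack the weight vector into the symmetric Q-function matrix H.
        H = np.zeros((n + m, n + m))
        H[np.triu_indices(n + m)] = w
        H = (H + H.T) / 2.0
        Huu, Hux = H[n:, n:], H[n:, :n]
        # Policy improvement: u = argmin_u [x; u]' H [x; u]  =>  u = -Huu^{-1} Hux x.
        K = -np.linalg.solve(Huu, Hux)
    return K

# Step 2 (online): run the linear law in parallel with an online RL policy, so the
# RL component only has to learn a nonlinear correction on top of the linear baseline.
def parallel_control(x, K, rl_policy):
    """Total input = linear baseline + nonlinear RL correction (structure only)."""
    return K @ x + rl_policy(x)
```

In this sketch, `rl_policy` stands in for whatever online RL learner (e.g., an actor-critic) is trained in the second step; the point of the structure is that the RL component starts from a closed loop that already achieves some degree of control performance, which is what the paper credits for the improved transient learning behavior.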
