
Meta-Learning Linear Quadratic Regulators: A Policy Gradient MAML Approach for Model-free LQR

Published 25 Jan 2024 in math.OC and cs.LG | (2401.14534v2)

Abstract: We investigate the problem of learning linear quadratic regulators (LQR) in a multi-task, heterogeneous, and model-free setting. We characterize the stability and personalization guarantees of a policy gradient-based (PG) model-agnostic meta-learning (MAML) (Finn et al., 2017) approach for the LQR problem under different task-heterogeneity settings. We show that our MAML-LQR algorithm produces a stabilizing controller close to each task-specific optimal controller up to a task-heterogeneity bias in both model-based and model-free learning scenarios. Moreover, in the model-based setting, we show that such a controller is achieved with a linear convergence rate, which improves upon sub-linear rates from existing work. Our theoretical guarantees demonstrate that the learned controller can efficiently adapt to unseen LQR tasks.
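The abstract describes a policy-gradient MAML update over several LQR tasks. The following is a minimal sketch of that idea for scalar systems in the model-based case, where each task's cost and gradient are available in closed form. It is not the paper's implementation. The task parameters, step sizes, and every function name are illustrative assumptions, and the second derivative is approximated by finite differences for brevity. A model-free variant would replace the exact gradients with estimates built from cost evaluations.

```python
# Illustrative scalar sketch of a MAML-style update for LQR (not the paper's code).
# Assumed per-task dynamics: x_{t+1} = a*x_t + b*u_t with policy u_t = -k*x_t,
# and cost J(k) = sum_t (q*x_t^2 + r*u_t^2) with E[x_0^2] = 1.

# Hypothetical tasks (a, b, q, r); values chosen only for illustration.
TASKS = [(1.1, 1.0, 1.0, 1.0), (1.2, 0.9, 1.0, 1.0), (1.0, 1.1, 1.0, 1.0)]


def is_stabilizing(k, task):
    """True if the closed-loop pole a - b*k lies strictly inside the unit circle."""
    a, b, _, _ = task
    return abs(a - b * k) < 1.0


def cost(k, task):
    """Closed-form infinite-horizon cost; infinite if k is not stabilizing."""
    a, b, q, r = task
    c = a - b * k
    if abs(c) >= 1.0:
        return float("inf")
    return (q + r * k * k) / (1.0 - c * c)


def grad(k, task):
    """Exact derivative of cost(k); meaningful only for stabilizing k."""
    a, b, q, r = task
    c = a - b * k
    d = 1.0 - c * c
    return (2.0 * r * k * d - (q + r * k * k) * 2.0 * b * c) / (d * d)


def hess(k, task, h=1e-5):
    """Central finite-difference approximation of the second derivative."""
    return (grad(k + h, task) - grad(k - h, task)) / (2.0 * h)


def meta_cost(k, tasks, eta_in):
    """Average task cost after one inner gradient step from k."""
    return sum(cost(k - eta_in * grad(k, t), t) for t in tasks) / len(tasks)


def meta_grad(k, tasks, eta_in):
    """Derivative of meta_cost via the chain rule through the inner step."""
    total = 0.0
    for t in tasks:
        k_adapted = k - eta_in * grad(k, t)
        if not is_stabilizing(k_adapted, t):
            raise ValueError("inner step left the stabilizing set; reduce eta_in")
        total += (1.0 - eta_in * hess(k, t)) * grad(k_adapted, t)
    return total / len(tasks)


def maml_lqr(k0, tasks, eta_in=0.05, eta_out=0.05, iters=200):
    """Plain gradient descent on the meta-objective, starting from gain k0."""
    k = k0
    for _ in range(iters):
        k -= eta_out * meta_grad(k, tasks, eta_in)
    return k


K0 = 1.0      # assumed initial gain that stabilizes every task above
ETA_IN = 0.05  # assumed inner (adaptation) step size
k_meta = maml_lqr(K0, TASKS, eta_in=ETA_IN)
print("meta-learned gain:", k_meta)
for task in TASKS:
    k_task = k_meta - ETA_IN * grad(k_meta, task)  # one-step personalization
    print(task, "adapted gain:", k_task, "cost:", cost(k_task, task))
```

The sketch starts from a gain that stabilizes every task and uses small step sizes so that both the inner and outer iterates stay in the stabilizing set, which mirrors the role stability plays in the guarantees the abstract mentions.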

References (37)
  1. Sharp-MAML: Sharpness-aware model-agnostic meta learning. In International conference on machine learning, pages 10–32. PMLR, 2022.
  2. K. Balasubramanian and S. Ghadimi. Zeroth-order nonconvex stochastic optimization: Handling constraints, high dimensionality, and saddle points. Foundations of Computational Mathematics, pages 1–42, 2022.
  3. A survey of meta-reinforcement learning. arXiv preprint arXiv:2301.08028, 2023.
  4. LQR through the lens of first order methods: Discrete-time case. arXiv preprint arXiv:1907.08921, 2019.
  5. Multi-Task System Identification of Similar Linear Time-Invariant Dynamical Systems. arXiv preprint arXiv:2301.01430, 2023.
  6. RL$^2$: Fast reinforcement learning via slow reinforcement learning. arXiv preprint arXiv:1611.02779, 2016.
  7. On the convergence theory of gradient-based model-agnostic meta-learning algorithms. In International Conference on Artificial Intelligence and Statistics, pages 1082–1092. PMLR, 2020.
  8. On the convergence theory of debiased model-agnostic meta-reinforcement learning. Advances in Neural Information Processing Systems, 34:3096–3107, 2021.
  9. Global convergence of policy gradient methods for the linear quadratic regulator. In International conference on machine learning, pages 1467–1476. PMLR, 2018.
  10. Model-agnostic meta-learning for fast adaptation of deep networks. In International conference on machine learning, pages 1126–1135. PMLR, 2017.
  11. Online meta-learning. In International Conference on Machine Learning, pages 1920–1930. PMLR, 2019.
  12. Learning optimal controllers for linear systems with multiplicative noise via policy gradient. IEEE Transactions on Automatic Control, 66(11):5283–5298, 2020.
  13. Introduction to Matrix Methods. In Lecture notes, 2021.
  14. Toward a Theoretical Foundation of Policy Optimization for Learning Control Policies. Annual Review of Control, Robotics, and Autonomous Systems, 6:123–158, 2023.
  15. Theoretical convergence of multi-step model-agnostic meta-learning. The Journal of Machine Learning Research, 23(1):1317–1357, 2022.
  16. T. T. Johnson and S. Mitra. Safe flocking in spite of actuator faults using directional failure detectors. Journal of Nonlinear Systems and Applications, 2(1-2):73–95, 2011.
  17. A theoretical understanding of gradient bias in meta-reinforcement learning. Advances in Neural Information Processing Systems, 35:31059–31072, 2022.
  18. Derivative-free methods for policy optimization: Guarantees for linear quadratic systems. In The 22nd international conference on artificial intelligence and statistics, pages 2916–2925. PMLR, 2019.
  19. Global exponential convergence of gradient methods over the nonconvex landscape of the linear quadratic regulator. In 2019 IEEE 58th Conference on Decision and Control (CDC), pages 7474–7479. IEEE, 2019.
  20. On the linear convergence of random search for discrete-time LQR. IEEE Control Systems Letters, 5(3):989–994, 2020.
  21. I. Molybog and J. Lavaei. When does MAML objective have benign landscape? In 2021 IEEE Conference on Control Technology and Applications (CCTA), pages 220–227. IEEE, 2021.
  22. N. Musavi and G. E. Dullerud. Convergence of Gradient-based MAML in LQR. arXiv preprint arXiv:2309.06588, 2023.
  23. Y. Nesterov and V. Spokoiny. Random gradient-free minimization of convex functions. Foundations of Computational Mathematics, 17:527–566, 2017.
  24. Computing stabilizing feedback gains via a model-free policy gradient method. IEEE Control Systems Letters, 7:407–412, 2022.
  25. Stabilizing dynamical systems via policy gradient methods. Advances in neural information processing systems, 34:29274–29286, 2021.
  26. ProMP: Proximal meta-policy search. arXiv preprint arXiv:1810.06784, 2018.
  27. C. Stein. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Volume 2: Probability Theory, volume 6, pages 583–603. University of California Press, 1972.
  28. Zeroth-order feedback optimization for cooperative multi-agent systems. Automatica, 148:110741, 2023.
  29. Learning Personalized Models with Clustered System Identification. arXiv preprint arXiv:2304.01395, 2023a.
  30. Oracle Complexity Reduction for Model-free LQR: A Stochastic Variance-Reduced Policy Gradient Approach. arXiv preprint arXiv:2309.10679, 2023b.
  31. J. A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of computational mathematics, 12:389–434, 2012.
  32. FedSysID: A federated approach to sample-efficient system identification. In Learning for Dynamics and Control Conference, pages 1308–1320. PMLR, 2023a.
  33. Model-free Learning with Heterogeneous Dynamical Systems: A Federated LQR Approach. arXiv preprint arXiv:2308.11743, 2023b.
  34. Learning to reinforcement learn. arXiv preprint arXiv:1611.05763, 2016.
  35. Fleet Policy Learning via Weight Merging and An Application to Robotic Tool-Use. arXiv preprint arXiv:2310.01362, 2023c.
  36. Multi-task imitation learning for linear dynamical systems. In Learning for Dynamics and Control Conference, pages 586–599. PMLR, 2023a.
  37. Meta-Learning Operators to Optimality from Multi-Task Non-IID Data. arXiv preprint arXiv:2308.04428, 2023b.