Meta-Learning Linear Quadratic Regulators: A Policy Gradient MAML Approach for Model-free LQR
Abstract: We investigate the problem of learning linear quadratic regulators (LQR) in a multi-task, heterogeneous, and model-free setting. We characterize the stability and personalization guarantees of a policy gradient-based (PG) model-agnostic meta-learning (MAML) (Finn et al., 2017) approach for the LQR problem under different task-heterogeneity settings. We show that our MAML-LQR algorithm produces a stabilizing controller close to each task-specific optimal controller up to a task-heterogeneity bias in both model-based and model-free learning scenarios. Moreover, in the model-based setting, we show that such a controller is achieved with a linear convergence rate, which improves upon sub-linear rates from existing work. Our theoretical guarantees demonstrate that the learned controller can efficiently adapt to unseen LQR tasks.
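For concreteness, the objective behind a policy gradient MAML approach to LQR can be sketched as below. This is a minimal sketch under assumed notation, following the standard MAML formulation of Finn et al. (2017). The symbols M (number of tasks), A^{(i)}, B^{(i)}, Q^{(i)}, R^{(i)} (task-specific system and cost matrices), the sign convention u_t = -K x_t, and the step sizes \eta_l and \eta are assumptions for illustration and may not match the paper's own definitions.

```latex
% Task-specific LQR cost for a static feedback gain K (assumed notation)
J^{(i)}(K) = \mathbb{E}_{x_0}\left[ \sum_{t=0}^{\infty}
    x_t^\top \left( Q^{(i)} + K^\top R^{(i)} K \right) x_t \right],
\qquad x_{t+1} = \left( A^{(i)} - B^{(i)} K \right) x_t .

% MAML objective: average cost after one inner policy gradient step per task
J_{\mathrm{MAML}}(K) = \frac{1}{M} \sum_{i=1}^{M}
    J^{(i)}\!\left( K - \eta_l \nabla J^{(i)}(K) \right).

% Outer policy gradient update on the meta-controller
K \leftarrow K - \eta \, \nabla J_{\mathrm{MAML}}(K).
```

In the model-free setting, the gradients above are not available in closed form and would be replaced by estimates built from cost evaluations, for example zeroth-order estimates.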
- Sharp-MAML: Sharpness-aware model-agnostic meta learning. In International conference on machine learning, pages 10–32. PMLR, 2022.
- K. Balasubramanian and S. Ghadimi. Zeroth-order nonconvex stochastic optimization: Handling constraints, high dimensionality, and saddle points. Foundations of Computational Mathematics, pages 1–42, 2022.
- A survey of meta-reinforcement learning. arXiv preprint arXiv:2301.08028, 2023.
- LQR through the lens of first order methods: Discrete-time case. arXiv preprint arXiv:1907.08921, 2019.
- Multi-Task System Identification of Similar Linear Time-Invariant Dynamical Systems. arXiv preprint arXiv:2301.01430, 2023.
- RL$^2$: Fast reinforcement learning via slow reinforcement learning. arXiv preprint arXiv:1611.02779, 2016.
- On the convergence theory of gradient-based model-agnostic meta-learning algorithms. In International Conference on Artificial Intelligence and Statistics, pages 1082–1092. PMLR, 2020.
- On the convergence theory of debiased model-agnostic meta-reinforcement learning. Advances in Neural Information Processing Systems, 34:3096–3107, 2021.
- Global convergence of policy gradient methods for the linear quadratic regulator. In International conference on machine learning, pages 1467–1476. PMLR, 2018.
- Model-agnostic meta-learning for fast adaptation of deep networks. In International conference on machine learning, pages 1126–1135. PMLR, 2017.
- Online meta-learning. In International Conference on Machine Learning, pages 1920–1930. PMLR, 2019.
- Learning optimal controllers for linear systems with multiplicative noise via policy gradient. IEEE Transactions on Automatic Control, 66(11):5283–5298, 2020.
- Introduction to Matrix Methods. In Lecture notes, 2021.
- Toward a Theoretical Foundation of Policy Optimization for Learning Control Policies. Annual Review of Control, Robotics, and Autonomous Systems, 6:123–158, 2023.
- Theoretical convergence of multi-step model-agnostic meta-learning. The Journal of Machine Learning Research, 23(1):1317–1357, 2022.
- T. T. Johnson and S. Mitra. Safe flocking in spite of actuator faults using directional failure detectors. Journal of Nonlinear Systems and Applications, 2(1-2):73–95, 2011.
- A theoretical understanding of gradient bias in meta-reinforcement learning. Advances in Neural Information Processing Systems, 35:31059–31072, 2022.
- Derivative-free methods for policy optimization: Guarantees for linear quadratic systems. In The 22nd international conference on artificial intelligence and statistics, pages 2916–2925. PMLR, 2019.
- Global exponential convergence of gradient methods over the nonconvex landscape of the linear quadratic regulator. In 2019 IEEE 58th Conference on Decision and Control (CDC), pages 7474–7479. IEEE, 2019.
- On the linear convergence of random search for discrete-time LQR. IEEE Control Systems Letters, 5(3):989–994, 2020.
- I. Molybog and J. Lavaei. When does MAML objective have benign landscape? In 2021 IEEE Conference on Control Technology and Applications (CCTA), pages 220–227. IEEE, 2021.
- N. Musavi and G. E. Dullerud. Convergence of Gradient-based MAML in LQR. arXiv preprint arXiv:2309.06588, 2023.
- Y. Nesterov and V. Spokoiny. Random gradient-free minimization of convex functions. Foundations of Computational Mathematics, 17:527–566, 2017.
- Computing stabilizing feedback gains via a model-free policy gradient method. IEEE Control Systems Letters, 7:407–412, 2022.
- Stabilizing dynamical systems via policy gradient methods. Advances in neural information processing systems, 34:29274–29286, 2021.
- ProMP: Proximal meta-policy search. arXiv preprint arXiv:1810.06784, 2018.
- C. Stein. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Volume 2: Probability Theory, volume 6, pages 583–603. University of California Press, 1972.
- Zeroth-order feedback optimization for cooperative multi-agent systems. Automatica, 148:110741, 2023.
- Learning Personalized Models with Clustered System Identification. arXiv preprint arXiv:2304.01395, 2023a.
- Oracle Complexity Reduction for Model-free LQR: A Stochastic Variance-Reduced Policy Gradient Approach. arXiv preprint arXiv:2309.10679, 2023b.
- J. A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of computational mathematics, 12:389–434, 2012.
- FedSysID: A federated approach to sample-efficient system identification. In Learning for Dynamics and Control Conference, pages 1308–1320. PMLR, 2023a.
- Model-free Learning with Heterogeneous Dynamical Systems: A Federated LQR Approach. arXiv preprint arXiv:2308.11743, 2023b.
- Learning to reinforcement learn. arXiv preprint arXiv:1611.05763, 2016.
- Fleet Policy Learning via Weight Merging and An Application to Robotic Tool-Use. arXiv preprint arXiv:2310.01362, 2023c.
- Multi-task imitation learning for linear dynamical systems. In Learning for Dynamics and Control Conference, pages 586–599. PMLR, 2023a.
- Meta-Learning Operators to Optimality from Multi-Task Non-IID Data. arXiv preprint arXiv:2308.04428, 2023b.