Regret Analysis of Policy Optimization over Submanifolds for Linearly Constrained Online LQG (2403.08553v1)
Abstract: Recent advances in online optimization and control have provided novel tools for studying online linear quadratic regulator (LQR) problems in which the cost matrices vary adversarially over time. However, the controller parameterizations of existing works may not satisfy practical requirements such as sparsity induced by physical interconnections. In this work, we study online linear quadratic Gaussian (LQG) problems with a given linear constraint imposed on the controller. Inspired by the recent work of [1], which proposed for linearly constrained policy optimization of an offline LQR a second-order method equipped with a Riemannian metric that emerges naturally in optimal control problems, we propose online optimistic Newton on manifold (OONM), which produces an online controller based on predictions of the first- and second-order information of the cost function sequence. To quantify the performance of the proposed algorithm, we use the notion of regret, defined as the sub-optimality of its cumulative cost relative to that of a (locally) minimizing controller sequence, and provide a regret bound in terms of the path-length of the minimizer sequence. Simulation results are also provided to verify the properties of OONM.
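As a reading aid, the sketch below illustrates the kind of per-round update the abstract describes, under loudly simplifying assumptions: the linear constraint is taken to be a sparsity pattern on the gain matrix, the Riemannian metric of [1] is replaced by the plain Euclidean metric restricted to the constraint subspace, and the gradient/Hessian "predictions" together with the quadratic toy cost are hypothetical stand-ins for the paper's LQG quantities. It is a minimal sketch of an optimistic Newton step on a linearly constrained set, not the paper's algorithm.

```python
# Minimal sketch of an optimistic Newton update on a linearly constrained
# controller set. Assumptions (not from the paper): the constraint is a
# sparsity pattern on K, the metric is Euclidean restricted to the free
# entries, and grad_pred/hess_pred are predictions of the *upcoming*
# round's derivatives (the "optimistic" ingredient).
import numpy as np

def project(K, mask):
    """Project a gain matrix onto the linear constraint set {K : K = K * mask}."""
    return K * mask

def oonm_step(K, grad_pred, hess_pred, mask, reg=1e-6):
    """One optimistic Newton step restricted to the free entries of K.

    grad_pred: predicted Euclidean gradient at K (same shape as K)
    hess_pred: predicted Hessian acting on the vectorized entries of K
    """
    free = mask.flatten().astype(bool)        # unconstrained entries of K
    g = grad_pred.flatten()[free]             # gradient on the subspace
    H = hess_pred[np.ix_(free, free)]         # Hessian restricted to the subspace
    d = np.linalg.solve(H + reg * np.eye(H.shape[0]), -g)  # Newton direction
    step = np.zeros(K.size)
    step[free] = d
    return project(K + step.reshape(K.shape), mask)

# Toy round with surrogate cost f_t(K) = 0.5 * ||K - K_t||_F^2, whose
# gradient is K - K_t and whose Hessian is the identity.
mask = np.array([[1.0, 0.0], [1.0, 1.0]])     # hypothetical sparsity pattern
K = project(np.ones((2, 2)), mask)
K_target = np.array([[0.5, 0.0], [-0.3, 0.2]])
grad = K - K_target
hess = np.eye(K.size)
K = oonm_step(K, grad, hess, mask)
print(K)  # free entries move toward K_target; masked entries stay zero
```

Because the constraint set here is a linear subspace, the retraction back onto the submanifold reduces to an entrywise projection; for the general linear constraints and the control-theoretic metric treated in [1], both the restriction of the Hessian and the retraction are more involved.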
- S. Talebi and M. Mesbahi, “Policy optimization over submanifolds for linearly constrained feedback synthesis,” IEEE Transactions on Automatic Control, 2023.
- B. D. O. Anderson, J. B. Moore, and B. P. Molinari, “Linear optimal control,” IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-2, no. 4, p. 559, 1972.
- A. Cohen, A. Hassidim, T. Koren, N. Lazic, Y. Mansour, and K. Talwar, “Online linear quadratic control,” in International Conference on Machine Learning, 2018, pp. 1029–1038.
- N. Agarwal, B. Bullins, E. Hazan, S. M. Kakade, and K. Singh, “Online control with adversarial disturbances,” in International Conference on Machine Learning, 2019, pp. 154–165.
- N. Agarwal, E. Hazan, and K. Singh, “Logarithmic regret for online control,” in Advances in Neural Information Processing Systems, 2019, pp. 10175–10184.
- M. Simchowitz, K. Singh, and E. Hazan, “Improper learning for non-stochastic control,” arXiv preprint arXiv:2001.09254, 2020.
- T.-J. Chang and S. Shahrampour, “Distributed online linear quadratic control for linear time-invariant systems,” in American Control Conference (ACC), 2021, pp. 923–928.
- M. Fazel, R. Ge, S. Kakade, and M. Mesbahi, “Global convergence of policy gradient methods for the linear quadratic regulator,” in International Conference on Machine Learning, 2018, pp. 1467–1476.
- J. Bu, A. Mesbahi, M. Fazel, and M. Mesbahi, “LQR through the lens of first order methods: Discrete-time case,” arXiv preprint arXiv:1907.08921, 2019.
- J. Bu, A. Mesbahi, and M. Mesbahi, “Policy gradient-based algorithms for continuous-time linear quadratic control,” arXiv preprint arXiv:2006.09178, 2020.
- H. Mohammadi, A. Zare, M. Soltanolkotabi, and M. R. Jovanović, “Convergence and sample complexity of gradient methods for the model-free linear–quadratic regulator problem,” IEEE Transactions on Automatic Control, vol. 67, no. 5, pp. 2435–2450, 2021.
- M. Zinkevich, “Online convex programming and generalized infinitesimal gradient ascent,” in Proceedings of the 20th International Conference on Machine Learning (ICML-03), 2003, pp. 928–936.
- E. Hazan, A. Agarwal, and S. Kale, “Logarithmic regret algorithms for online convex optimization,” Machine Learning, vol. 69, no. 2-3, pp. 169–192, 2007.
- L. Zhang, S. Lu, and Z.-H. Zhou, “Adaptive online learning in dynamic environments,” Advances in Neural Information Processing Systems, vol. 31, 2018.
- A. Mokhtari, S. Shahrampour, A. Jadbabaie, and A. Ribeiro, “Online optimization in dynamic environments: Improved regret rates for strongly convex problems,” in 2016 IEEE 55th Conference on Decision and Control (CDC). IEEE, 2016, pp. 7195–7201.
- O. Besbes, Y. Gur, and A. Zeevi, “Non-stationary stochastic optimization,” Operations Research, vol. 63, no. 5, pp. 1227–1244, 2015.
- C.-K. Chiang, T. Yang, C.-J. Lee, M. Mahdavi, C.-J. Lu, R. Jin, and S. Zhu, “Online optimization with gradual variations,” in Conference on Learning Theory. JMLR Workshop and Conference Proceedings, 2012, pp. 6.1–6.20.
- S. Rakhlin and K. Sridharan, “Optimization, learning, and games with predictable sequences,” Advances in Neural Information Processing Systems, vol. 26, 2013.
- A. Jadbabaie, A. Rakhlin, S. Shahrampour, and K. Sridharan, “Online optimization: Competing with dynamic comparators,” in Artificial Intelligence and Statistics. PMLR, 2015, pp. 398–406.
- T.-J. Chang and S. Shahrampour, “On online optimization: Dynamic regret analysis of strongly convex and smooth problems,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 8, 2021, pp. 6966–6973.
- E. Hazan, S. Kakade, and K. Singh, “The nonstochastic control problem,” in Algorithmic Learning Theory, 2020, pp. 408–421.
- P. Zhao, Y.-X. Wang, and Z.-H. Zhou, “Non-stationary online learning with memory and non-stochastic control,” in International Conference on Artificial Intelligence and Statistics. PMLR, 2022, pp. 2101–2133.
- D. Baby and Y.-X. Wang, “Optimal dynamic regret in LQR control,” Advances in Neural Information Processing Systems, vol. 35, pp. 24879–24892, 2022.
- Y. Luo, V. Gupta, and M. Kolar, “Dynamic regret minimization for control of non-stationary linear dynamical systems,” Proceedings of the ACM on Measurement and Analysis of Computing Systems, vol. 6, no. 1, pp. 1–72, 2022.
- Y. Li, S. Das, and N. Li, “Online optimal control with affine constraints,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 10, 2021, pp. 8527–8537.
- T. Li, Y. Chen, B. Sun, A. Wierman, and S. H. Low, “Information aggregation for constrained online control,” Proceedings of the ACM on Measurement and Analysis of Computing Systems, vol. 5, no. 2, pp. 1–35, 2021.