Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control? (2212.14511v2)
Abstract: We study the task of learning state representations from potentially high-dimensional observations, with the goal of controlling an unknown partially observable system. We pursue a direct latent model learning approach, in which a dynamic model in some latent state space is learned by predicting quantities directly related to planning (e.g., costs) without reconstructing the observations. In particular, we focus on an intuitive cost-driven state representation learning method for solving Linear Quadratic Gaussian (LQG) control, one of the most fundamental partially observable control problems. As our main results, we establish finite-sample guarantees for finding a near-optimal state representation function and a near-optimal controller using the directly learned latent model. To the best of our knowledge, despite various empirical successes, it was unclear prior to this work whether such a cost-driven latent model learner enjoys finite-sample guarantees. Our work underscores the value of predicting multi-step costs, an idea that is key to our theory and is also known to be empirically valuable for learning state representations.
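To make the cost-driven idea concrete, the sketch below illustrates one way such a pipeline could look on a toy partially observable linear system: a state representation is fit purely from observed costs (no observation reconstruction), and latent dynamics are then regressed in that learned space. This is an assumed, simplified illustration, not the paper's algorithm: it uses a linear encoder `z = M o`, treats the control cost `u'Ru` as known, and regresses single-step rather than multi-step costs; the names `M`, `A_hat`, and `B_hat` are hypothetical.

```python
# Illustrative sketch (not the paper's algorithm) of cost-driven latent model
# learning on a toy partially observable linear system.
import numpy as np

rng = np.random.default_rng(0)
d_obs, d_lat, d_act, T, N = 10, 2, 1, 20, 500

# Unknown ground-truth system: latent state x, observation o = C x + noise,
# quadratic cost c = x'Q x + u'R u. The learner never sees x, A, B, C, or Q.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = rng.standard_normal((d_obs, d_lat))
Q, R = np.eye(d_lat), 0.1 * np.eye(d_act)

def rollout():
    """Collect one trajectory of (observation, action, cost) under random inputs."""
    x = np.zeros(d_lat)
    obs, acts, costs = [], [], []
    for _ in range(T):
        u = rng.standard_normal(d_act)
        o = C @ x + 0.01 * rng.standard_normal(d_obs)
        obs.append(o); acts.append(u); costs.append(x @ Q @ x + u @ R @ u)
        x = A @ x + B @ u + 0.01 * rng.standard_normal(d_lat)
    return np.array(obs), np.array(acts), np.array(costs)

data = [rollout() for _ in range(N)]

# Step 1 (cost-driven representation): regress the state-cost part of the
# observed cost on quadratic features of the observation, c_t - u_t'R u_t
# ~= o_t' W o_t, then take a rank-d_lat factor of W as the encoder z = M o.
O = np.concatenate([o for o, _, _ in data])
c_state = (np.concatenate([c for _, _, c in data])
           - np.concatenate([(u * (u @ R)).sum(-1) for _, u, _ in data]))
feats = np.einsum('ti,tj->tij', O, O).reshape(len(O), -1)
w, *_ = np.linalg.lstsq(feats, c_state, rcond=None)
W = (w.reshape(d_obs, d_obs) + w.reshape(d_obs, d_obs).T) / 2
eigval, eigvec = np.linalg.eigh(W)
top = np.argsort(eigval)[::-1][:d_lat]
M = (eigvec[:, top] * np.sqrt(np.maximum(eigval[top], 0.0))).T  # encoder z = M o

# Step 2 (latent dynamics): fit z_{t+1} ~= A_hat z_t + B_hat u_t by least
# squares in the learned latent space, again without reconstructing observations.
Z = np.stack([(M @ o.T).T for o, _, _ in data])    # shape (N, T, d_lat)
U = np.stack([u for _, u, _ in data])              # shape (N, T, d_act)
X_in = np.concatenate([Z[:, :-1], U[:, :-1]], axis=-1).reshape(-1, d_lat + d_act)
X_out = Z[:, 1:].reshape(-1, d_lat)
AB, *_ = np.linalg.lstsq(X_in, X_out, rcond=None)
A_hat, B_hat = AB[:d_lat].T, AB[d_lat:].T
print("encoder M:", M.shape, "latent dynamics A_hat:", A_hat.shape, "B_hat:", B_hat.shape)
```

Note that a factorization of this kind can identify the representation only up to an orthogonal transformation, and the single-step cost regression above is a simplification: the paper's theory emphasizes predicting multi-step cumulative costs.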