Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control? (2212.14511v2)

Published 30 Dec 2022 in cs.LG, cs.SY, eess.SY, math.OC, and stat.ML

Abstract: We study the task of learning state representations from potentially high-dimensional observations, with the goal of controlling an unknown partially observable system. We pursue a direct latent model learning approach, where a dynamic model in some latent state space is learned by predicting quantities directly related to planning (e.g., costs) without reconstructing the observations. In particular, we focus on an intuitive cost-driven state representation learning method for solving Linear Quadratic Gaussian (LQG) control, one of the most fundamental partially observable control problems. As our main results, we establish finite-sample guarantees of finding a near-optimal state representation function and a near-optimal controller using the directly learned latent model. To the best of our knowledge, despite various empirical successes, prior to this work it was unclear if such a cost-driven latent model learner enjoys finite-sample guarantees. Our work underscores the value of predicting multi-step costs, an idea that is key to our theory, and notably also an idea that is known to be empirically valuable for learning state representations.
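For context, the standard LQG setting the abstract refers to has linear dynamics, linear observations, and quadratic costs. The sketch below pairs that setup with an illustrative multi-step cost-prediction objective in the spirit of the abstract; the representation map $\phi$, the horizon $H$, and the exact loss form are assumptions for exposition, not the paper's precise formulation.

\[
x_{t+1} = A x_t + B u_t + w_t, \qquad y_t = C x_t + v_t, \qquad c_t = x_t^\top Q x_t + u_t^\top R u_t,
\]
with Gaussian process and observation noise $w_t$, $v_t$. A direct latent model learner fits a state representation $\hat{x}_t = \phi(y_{0:t}, u_{0:t-1})$ together with latent dynamics $(\hat{A}, \hat{B})$ and cost matrices $(\hat{Q}, \hat{R})$ by predicting costs over several future steps,
\[
\min_{\phi,\, \hat{A}, \hat{B}, \hat{Q}, \hat{R}} \;\; \sum_{k=1}^{H} \mathbb{E}\!\left[ \big( \hat{c}_{t+k} - c_{t+k} \big)^2 \right],
\]
where $\hat{c}_{t+k}$ is obtained by rolling the learned latent model forward from $\hat{x}_t$, rather than by reconstructing the observations $y_t$; a controller is then computed from the learned latent model.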
