Efficient Dynamics Modeling in Interactive Environments with Koopman Theory (2306.11941v4)
Abstract: Accurate modeling of dynamics in interactive environments is critical for successful long-range prediction. Such a capability could advance reinforcement learning (RL) and planning algorithms, but achieving it is challenging: inaccuracies in model estimates compound, yielding increasing errors over long horizons. We approach this problem through the lens of Koopman theory, whereby the nonlinear dynamics of the environment are linearized in a high-dimensional latent space. This lets us efficiently parallelize the sequential problem of long-range prediction using convolution, while accounting for the agent's action at every time step. Our approach also enables stability analysis and better control over gradients through time. Taken together, these advantages yield significant improvements over existing approaches, in both the efficiency and the accuracy of modeling dynamics over extended horizons. We also show that this model can be easily incorporated into model-based planning and model-free RL, and we report promising experimental results.
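The mechanism behind the parallelization claim is easy to sketch: if an encoder phi maps the state s_t to a latent z_t in which the dynamics are linear, an action-conditioned rollout takes the form z_{t+1} = A z_t + B u_t, and the entire horizon becomes a causal convolution of the action sequence with the kernel K_j = A^j B. Below is a minimal NumPy sketch of this equivalence, not the paper's implementation; the dimensions and the random A, B, and z_0 are illustrative placeholders, since the paper learns the encoder and the latent transition end-to-end.

```python
import numpy as np

# Placeholder sizes and parameters (assumptions, not the paper's learned model).
d_z, d_u, T = 16, 4, 32
rng = np.random.default_rng(0)

A = rng.normal(size=(d_z, d_z))
A *= 0.95 / np.max(np.abs(np.linalg.eigvals(A)))  # enforce spectral radius < 1 (stable dynamics)
B = rng.normal(size=(d_z, d_u)) / np.sqrt(d_z)    # action-input matrix
z0 = rng.normal(size=d_z)                         # latent state z_0 = phi(s_0) for some encoder phi
U = rng.normal(size=(T, d_u))                     # action sequence u_0, ..., u_{T-1}

# Sequential rollout: z_{t+1} = A z_t + B u_t.
z_seq = np.empty((T, d_z))
z = z0
for t in range(T):
    z = A @ z + B @ U[t]
    z_seq[t] = z

# Closed form of the same rollout:
#   z_{t+1} = A^{t+1} z_0 + sum_{k=0}^{t} A^{t-k} B u_k.
# The sum is a causal convolution of the actions with the kernel K_j = A^j B,
# which is what allows the whole horizon to be computed in parallel instead of
# step by step. The O(T^2) loop below merely verifies the equivalence.
powers = [np.linalg.matrix_power(A, j) for j in range(T + 1)]
K = [powers[j] @ B for j in range(T)]             # convolution kernel K_j = A^j B
z_conv = np.stack([
    powers[t + 1] @ z0 + sum(K[t - k] @ U[k] for k in range(t + 1))
    for t in range(T)
])
assert np.allclose(z_seq, z_conv)
```

Because A is an explicit matrix, its spectrum directly exposes the stability of the learned dynamics and bounds how gradients grow or shrink through time, which is what makes the stability analysis mentioned in the abstract possible. In practice the convolution can be evaluated in O(T log T) with an FFT rather than the O(T^2) verification loop above.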
Authors: Arnab Kumar Mondal, Siba Smarak Panigrahi, Sai Rajeswar, Kaleem Siddiqi, Siamak Ravanbakhsh