Learning Latent Dynamic Robust Representations for World Models (2405.06263v2)
Abstract: Visual Model-Based Reinforcement Learning (MBRL) promises to encapsulate an agent's knowledge of the environment's underlying dynamics, enabling a learned world model to serve as a planner. However, top MBRL agents such as Dreamer often struggle with visual pixel-based inputs in the presence of exogenous or irrelevant noise in the observation space, because they fail to capture task-specific features while filtering out irrelevant spatio-temporal details. To tackle this problem, we combine a spatio-temporal masking strategy and a bisimulation principle with latent reconstruction to capture endogenous, task-specific aspects of the environment for world models, effectively eliminating non-essential information. Joint training of representations, dynamics, and policy often leads to instabilities. To further address this issue, we develop a Hybrid Recurrent State-Space Model (HRSSM) structure, enhancing state representation robustness for effective policy learning. Our empirical evaluation demonstrates significant performance improvements over existing methods in a range of visually complex control tasks such as ManiSkill \cite{gu2023maniskill2} with exogenous distractors from the Matterport environment. Our code is available at https://github.com/bit1029public/HRSSM.
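The spatio-temporal masking strategy named in the abstract can be illustrated with a minimal sketch: random space-time cells of a pixel observation sequence are zeroed out before encoding, so the representation must be predictable from partial views. The function name, patch size, and mask ratio below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def spatiotemporal_mask(video, patch=(2, 8, 8), mask_ratio=0.5, rng=None):
    # video: (T, H, W, C) array; patch: (time, height, width) cell size.
    rng = np.random.default_rng(rng)
    T, H, W, C = video.shape
    pt, ph, pw = patch
    grid = (T // pt, H // ph, W // pw)
    n = int(np.prod(grid))
    # Keep a random (1 - mask_ratio) fraction of space-time cells.
    keep = rng.permutation(n) >= int(n * mask_ratio)
    cell_mask = keep.reshape(grid)
    # Broadcast the cell-level mask back up to pixel resolution.
    pixel_mask = np.repeat(np.repeat(np.repeat(cell_mask, pt, axis=0),
                                     ph, axis=1), pw, axis=2)
    return video * pixel_mask[..., None], pixel_mask

obs = np.ones((4, 16, 16, 3))
masked, mask = spatiotemporal_mask(obs, mask_ratio=0.5, rng=0)
# exactly half of the space-time cells are zeroed
```

In the paper's setting the masked sequence would be fed to the world-model encoder, with a latent-reconstruction target computed from the unmasked sequence; this sketch only covers the masking step itself.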
- FLAMBE: Structural complexity and representation learning of low-rank MDPs. Advances in Neural Information Processing Systems, 33:20095–20107, 2020.
- Deep reinforcement learning at the edge of the statistical precipice. In Ranzato, M., Beygelzimer, A., Dauphin, Y. N., Liang, P., and Vaughan, J. W. (eds.), Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pp. 29304–29320, 2021.
- Combating the compounding-error problem with a multi-step model. arXiv preprint arXiv:1905.13320, 2019.
- Exploration by random network distillation. arXiv preprint arXiv:1810.12894, 2018.
- Unsupervised learning of visual features by contrasting cluster assignments. Advances in Neural Information Processing Systems, 33:9912–9924, 2020.
- Castro, P. S. Scalable methods for computing state similarity in deterministic Markov decision processes. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pp. 10069–10076, 2020.
- MICo: Improved representations via sampling-based state similarity for Markov decision processes. Advances in Neural Information Processing Systems, 34:30113–30126, 2021.
- Matterport3D: Learning from RGB-D data in indoor environments. arXiv preprint arXiv:1709.06158, 2017.
- Learning compact models for planning with exogenous processes. In Conference on Robot Learning, pp. 813–822. PMLR, 2020.
- Cuturi, M. Sinkhorn distances: Lightspeed computation of optimal transport. Advances in Neural Information Processing Systems, 26, 2013.
- DreamerPro: Reconstruction-free model-based reinforcement learning with prototypical representations. In International Conference on Machine Learning, pp. 4956–4975. PMLR, 2022.
- Provably filtering exogenous distractors using multistep inverse dynamics. In International Conference on Learning Representations, 2021.
- Sample-efficient reinforcement learning in the presence of exogenous information. In Conference on Learning Theory, pp. 5062–5127. PMLR, 2022.
- Masked autoencoders as spatiotemporal learners. CoRR, abs/2205.09113, 2022. doi: 10.48550/arXiv.2205.09113. URL https://doi.org/10.48550/arXiv.2205.09113.
- Bisimulation metrics for continuous Markov decision processes. SIAM Journal on Computing, 40(6):1662–1714, 2011.
- Methods for computing state similarity in Markov decision processes. arXiv preprint arXiv:1206.6836, 2012a.
- Metrics for finite Markov decision processes. arXiv preprint arXiv:1207.4114, 2012b.
- Learning task informed abstractions. In International Conference on Machine Learning, pp. 3480–3491. PMLR, 2021.
- DeepMDP: Learning continuous latent space models for representation learning. In International Conference on Machine Learning, pp. 2170–2179. PMLR, 2019.
- Equivalence notions and model minimization in Markov decision processes. Artificial Intelligence, 147(1-2):163–223, 2003.
- ManiSkill2: A unified benchmark for generalizable manipulation skills. arXiv preprint arXiv:2302.04659, 2023.
- Recurrent world models facilitate policy evolution. Advances in Neural Information Processing Systems, 31, 2018.
- Dream to control: Learning behaviors by latent imagination. arXiv preprint arXiv:1912.01603, 2019a.
- Learning latent dynamics for planning from pixels. In International Conference on Machine Learning, pp. 2555–2565. PMLR, 2019b.
- Mastering Atari with discrete world models. arXiv preprint arXiv:2010.02193, 2020.
- Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104, 2023.
- Hamrick, J. B. Analogues of mental simulation and imagination in deep learning. Current Opinion in Behavioral Sciences, 29:8–16, 2019.
- Generalization in reinforcement learning by soft data augmentation. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 13611–13617. IEEE, 2021.
- Stabilizing deep q-learning with convnets and vision transformers under data augmentation. Advances in Neural Information Processing Systems, 34:3680–3693, 2021.
- MoDem: Accelerating visual model-based reinforcement learning with demonstrations. arXiv preprint arXiv:2212.05698, 2022a.
- Temporal difference learning for model predictive control. arXiv preprint arXiv:2203.04955, 2022b.
- TD-MPC2: Scalable, robust world models for continuous control. arXiv preprint arXiv:2310.16828, 2023.
- Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 16000–16009, 2022.
- Deep reinforcement learning that matters. In McIlraith, S. A. and Weinberger, K. Q. (eds.), Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pp. 3207–3214. AAAI Press, 2018. doi: 10.1609/AAAI.V32I1.11694. URL https://doi.org/10.1609/aaai.v32i1.11694.
- Agent-controller representations: Principled offline RL with rich exogenous information. arXiv preprint arXiv:2211.00164, 2022.
- Principled offline RL in the presence of rich exogenous information. 2023.
- When to trust your model: Model-based policy optimization. Advances in neural information processing systems, 32, 2019.
- γ-Models: Generative temporal difference learning for infinite-horizon prediction. Advances in Neural Information Processing Systems, 33:1724–1735, 2020.
- Uncertainty-driven imagination for continuous deep reinforcement learning. In Conference on Robot Learning, pp. 195–206. PMLR, 2017.
- The kinetics human action video dataset. arXiv preprint arXiv:1705.06950, 2017.
- Towards robust bisimulation metric learning. Advances in Neural Information Processing Systems, 34:4764–4777, 2021.
- Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Improved variational inference with inverse autoregressive flow. Advances in Neural Information Processing Systems, 29, 2016.
- Image augmentation is all you need: Regularizing deep reinforcement learning from pixels. arXiv preprint arXiv:2004.13649, 2020.
- Guaranteed discovery of control-endogenous latent states with multi-step inverse models. Transactions on Machine Learning Research, 2022.
- Bisimulation through probabilistic testing. In Conference Record of the Sixteenth Annual ACM Symposium on Principles of Programming Languages, Austin, Texas, USA, January 11-13, 1989, pp. 344–352. ACM Press, 1989.
- CURL: Contrastive unsupervised representations for reinforcement learning. In International Conference on Machine Learning, pp. 5639–5650. PMLR, 2020.
- Masked autoencoding for scalable and generalizable decision making. Advances in Neural Information Processing Systems, 35:12608–12618, 2022.
- NM512. DreamerV3 PyTorch implementation. https://github.com/NM512/dreamerv3-torch, 2023.
- Dreaming: Model-based reinforcement learning by latent imagination without reconstruction. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 4209–4215. IEEE, 2021.
- ReCoRe: Regularized contrastive representation learning of world model. arXiv preprint arXiv:2312.09056, 2023.
- Offline reinforcement learning from images with latent space models. In Learning for Dynamics and Control, pp. 1154–1168. PMLR, 2021.
- Habitat-Matterport 3D dataset (HM3D): 1000 large-scale 3D environments for embodied AI. arXiv preprint arXiv:2109.08238, 2021.
- Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pp. 618–626, 2017.
- Masked world models for visual control. In Conference on Robot Learning, pp. 1332–1344. PMLR, 2023a.
- Multi-view masked world models for visual robotic manipulation. arXiv preprint arXiv:2302.02408, 2023b.
- The distracting control suite - A challenging benchmark for reinforcement learning from pixels. CoRR, abs/2101.02722, 2021. URL https://arxiv.org/abs/2101.02722.
- Sutton, R. S. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Machine learning proceedings 1990, pp. 216–224. Elsevier, 1990.
- DeepMind control suite. arXiv preprint arXiv:1801.00690, 2018.
- MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033. IEEE, 2012.
- Ignorance is bliss: Robust control via information gating. arXiv preprint arXiv:2303.06121, 2023.
- VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. Advances in Neural Information Processing Systems, 35:10078–10093, 2022.
- Denoised MDPs: Learning world models better than the world itself. arXiv preprint arXiv:2206.15477, 2022a.
- Causal dynamics learning for task-independent state abstraction. arXiv preprint arXiv:2206.13452, 2022b.
- Masked feature prediction for self-supervised visual pre-training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14668–14678, 2022.
- Model predictive path integral control: From theory to parallel computation. Journal of Guidance, Control, and Dynamics, 40(2):344–357, 2017.
- Learning to combat compounding-error in model-based reinforcement learning. arXiv preprint arXiv:1912.11206, 2019.
- Mastering visual continuous control: Improved data-augmented reinforcement learning. arXiv preprint arXiv:2107.09645, 2021a.
- Improving sample efficiency in model-free reinforcement learning from images. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pp. 10674–10681, 2021b.
- Mask-based latent reconstruction for reinforcement learning. Advances in Neural Information Processing Systems, 35:25117–25131, 2022.
- SimSR: Simple distance-based state representations for deep reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp. 8997–9005, 2022a.
- Behavior prior representation learning for offline reinforcement learning. arXiv preprint arXiv:2211.00863, 2022b.
- Natural environment benchmarks for reinforcement learning. arXiv preprint arXiv:1811.06032, 2018.
- Invariant causal prediction for block MDPs. In International Conference on Machine Learning, pp. 11214–11224. PMLR, 2020a.
- Learning invariant representations for reinforcement learning without reconstruction. arXiv preprint arXiv:2006.10742, 2020b.
- Simplified temporal consistency reinforcement learning. arXiv preprint arXiv:2306.09466, 2023.
- RePo: Resilient model-based reinforcement learning by regularizing posterior predictability. CoRR, abs/2309.00082, 2023. doi: 10.48550/ARXIV.2309.00082. URL https://doi.org/10.48550/arXiv.2309.00082.
Authors: Ruixiang Sun, Hongyu Zang, Xin Li, Riashat Islam