Simplified Temporal Consistency Reinforcement Learning (2306.09466v1)
Abstract: Reinforcement learning is able to solve complex sequential decision-making tasks but is currently limited by sample efficiency and required computation. To improve sample efficiency, recent work focuses on model-based RL, which interleaves model learning with planning. Recent methods further utilize policy learning, value estimation, and self-supervised learning as auxiliary objectives. In this paper we show that, surprisingly, a simple representation learning approach relying only on a latent dynamics model trained by latent temporal consistency is sufficient for high-performance RL. This holds both when using pure planning with a dynamics model conditioned on the representation and when utilizing the representation as policy and value function features in model-free RL. In experiments, our approach learns an accurate dynamics model that solves challenging high-dimensional locomotion tasks with online planners while being 4.1 times faster to train than ensemble-based methods. With model-free RL without planning, especially on high-dimensional tasks such as the DeepMind Control Suite Humanoid and Dog tasks, our approach outperforms model-free methods by a large margin and matches model-based methods' sample efficiency while training 2.4 times faster.
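The core idea named in the abstract, a latent dynamics model trained with a latent temporal-consistency objective, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the network sizes, module names (encoder, dynamics, target_encoder), and the cosine-similarity loss against a frozen target encoder are assumptions for the example.

```python
# Minimal sketch of latent temporal-consistency training (illustrative, not the paper's code).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim, latent_dim = 24, 6, 64  # assumed sizes for illustration

# Online encoder and latent dynamics model.
encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ELU(), nn.Linear(256, latent_dim))
dynamics = nn.Sequential(nn.Linear(latent_dim + act_dim, 256), nn.ELU(), nn.Linear(256, latent_dim))

# Target encoder: a frozen copy that provides the consistency targets.
target_encoder = copy.deepcopy(encoder)
for p in target_encoder.parameters():
    p.requires_grad_(False)

def temporal_consistency_loss(obs, action, next_obs):
    """Predict the next latent state and match the target encoder's embedding of the next observation."""
    z = encoder(obs)
    z_pred = dynamics(torch.cat([z, action], dim=-1))
    with torch.no_grad():
        z_target = target_encoder(next_obs)  # no gradient through the target branch
    # Cosine-similarity consistency loss (one common choice; an MSE in latent space is another).
    return -F.cosine_similarity(z_pred, z_target, dim=-1).mean()

# Usage with a dummy batch of transitions.
obs = torch.randn(32, obs_dim)
act = torch.randn(32, act_dim)
next_obs = torch.randn(32, obs_dim)
loss = temporal_consistency_loss(obs, act, next_obs)
loss.backward()
```

In practice the target encoder would typically be updated as an exponential moving average of the online encoder, and the latent rollout would span multiple future steps; those details are omitted from this sketch.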
Authors: Yi Zhao, Wenshuai Zhao, Rinu Boney, Juho Kannala, Joni Pajarinen