Simplified Temporal Consistency Reinforcement Learning (2306.09466v1)

Published 15 Jun 2023 in cs.LG and cs.RO

Abstract: Reinforcement learning is able to solve complex sequential decision-making tasks but is currently limited by sample efficiency and required computation. To improve sample efficiency, recent work focuses on model-based RL, which interleaves model learning with planning. Recent methods further utilize policy learning, value estimation, and self-supervised learning as auxiliary objectives. In this paper, we show that, surprisingly, a simple representation learning approach relying only on a latent dynamics model trained by latent temporal consistency is sufficient for high-performance RL. This applies when using pure planning with a dynamics model conditioned on the representation, but also when utilizing the representation as policy and value function features in model-free RL. In experiments, our approach learns an accurate dynamics model to solve challenging high-dimensional locomotion tasks with online planners while being 4.1 times faster to train compared to ensemble-based methods. With model-free RL without planning, especially on high-dimensional tasks such as the DeepMind Control Suite Humanoid and Dog tasks, our approach outperforms model-free methods by a large margin and matches model-based methods' sample efficiency while training 2.4 times faster.
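To make the latent temporal-consistency idea concrete, below is a minimal PyTorch sketch of the kind of objective the abstract describes: an online encoder maps observations to latents, a latent dynamics model is unrolled over actions, and each predicted latent is matched to the momentum-encoded latent of the true next observation (stop-gradient target). All module names, network sizes, the cosine-similarity loss, and hyperparameters here are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of a latent temporal-consistency objective (PyTorch).
# Architecture, dimensions, and hyperparameters are illustrative assumptions.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Maps raw observations to latent states."""
    def __init__(self, obs_dim, latent_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ELU(),
            nn.Linear(256, latent_dim),
        )
    def forward(self, obs):
        return self.net(obs)

class LatentDynamics(nn.Module):
    """Predicts the next latent state from (latent, action)."""
    def __init__(self, latent_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + act_dim, 256), nn.ELU(),
            nn.Linear(256, latent_dim),
        )
    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=-1))

def temporal_consistency_loss(encoder, target_encoder, dynamics, obs_seq, act_seq):
    """Unroll the dynamics model in latent space and match each predicted
    latent to the momentum-encoded latent of the true next observation.

    obs_seq: (T+1, B, obs_dim), act_seq: (T, B, act_dim)
    """
    z = encoder(obs_seq[0])
    loss = 0.0
    for t in range(act_seq.shape[0]):
        z = dynamics(z, act_seq[t])               # predicted next latent
        with torch.no_grad():                      # stop-gradient target
            z_target = target_encoder(obs_seq[t + 1])
        # negative cosine similarity between prediction and target
        loss = loss - F.cosine_similarity(z, z_target, dim=-1).mean()
    return loss / act_seq.shape[0]

@torch.no_grad()
def update_target(encoder, target_encoder, tau=0.01):
    # exponential moving average of the online encoder's weights
    for p, p_t in zip(encoder.parameters(), target_encoder.parameters()):
        p_t.mul_(1 - tau).add_(tau * p)

# Usage with random data standing in for replay-buffer sequences.
obs_dim, act_dim, latent_dim, T, B = 24, 6, 50, 5, 32
encoder = Encoder(obs_dim, latent_dim)
target_encoder = copy.deepcopy(encoder).requires_grad_(False)
dynamics = LatentDynamics(latent_dim, act_dim)
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(dynamics.parameters()), lr=3e-4)

obs_seq = torch.randn(T + 1, B, obs_dim)
act_seq = torch.randn(T, B, act_dim)
loss = temporal_consistency_loss(encoder, target_encoder, dynamics, obs_seq, act_seq)
opt.zero_grad(); loss.backward(); opt.step()
update_target(encoder, target_encoder)
```

The stop-gradient / momentum-target pairing is what allows a purely predictive (non-contrastive) consistency loss to avoid representation collapse; the resulting latents can then feed an online planner or serve as features for a model-free policy and value function, as the abstract states.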

Authors (5)
  1. Yi Zhao (222 papers)
  2. Wenshuai Zhao (14 papers)
  3. Rinu Boney (12 papers)
  4. Juho Kannala (108 papers)
  5. Joni Pajarinen (68 papers)
Citations (9)