MuDreamer: Learning Predictive World Models without Reconstruction (2405.15083v1)

Published 23 May 2024 in cs.AI and cs.CV

Abstract: The DreamerV3 agent recently demonstrated state-of-the-art performance across diverse domains, learning powerful world models in latent space using a pixel reconstruction loss. However, while the reconstruction loss is essential to Dreamer's performance, it also forces the model to capture task-irrelevant information. Consequently, Dreamer sometimes fails to perceive elements crucial to solving the task when visual distractions are present in the observation, significantly limiting its potential. In this paper, we present MuDreamer, a robust reinforcement learning agent that builds upon the DreamerV3 algorithm by learning a predictive world model without reconstructing input signals. Rather than relying on pixel reconstruction, hidden representations are learned by predicting the environment value function and previously selected actions. As with predictive self-supervised methods for images, we find that batch normalization is crucial to prevent learning collapse. We also study the effect of KL balancing between the model's posterior and prior losses on convergence speed and learning stability. We evaluate MuDreamer on the commonly used DeepMind Visual Control Suite and demonstrate stronger robustness to visual distractions than DreamerV3 and other reconstruction-free approaches when the environment background is replaced with task-irrelevant real-world videos. Our method also achieves comparable performance on the Atari100k benchmark while benefiting from faster training.
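
The abstract names two concrete mechanisms: reconstruction-free representation learning via value and previous-action prediction, and KL balancing between the posterior and prior losses. The sketch below is a minimal illustration of both, not the authors' implementation: the module names, the simple linear heads, the balancing weight `alpha`, and the exact placement of batch normalization are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.distributions as D

def kl_balance_loss(post_logits, prior_logits, alpha=0.8):
    """KL balancing in the DreamerV3 style: two KL terms, each with
    gradients stopped on one side. `alpha` (assumed value) weights the
    dynamics term, which trains the prior toward a frozen posterior;
    (1 - alpha) weights the representation term, which regularizes the
    posterior toward a frozen prior."""
    post = D.Categorical(logits=post_logits)
    prior = D.Categorical(logits=prior_logits)
    dyn = D.kl_divergence(D.Categorical(logits=post_logits.detach()), prior)
    rep = D.kl_divergence(post, D.Categorical(logits=prior_logits.detach()))
    return (alpha * dyn + (1 - alpha) * rep).mean()

class PredictiveHeads(nn.Module):
    """Value and previous-action heads standing in for the pixel decoder.
    BatchNorm over the latent features plays the collapse-prevention role
    the abstract describes; its placement here is an assumption."""
    def __init__(self, latent_dim: int, num_actions: int):
        super().__init__()
        self.norm = nn.BatchNorm1d(latent_dim)
        self.value_head = nn.Linear(latent_dim, 1)
        self.action_head = nn.Linear(latent_dim, num_actions)

    def forward(self, latent, value_target, prev_action):
        h = self.norm(latent)
        # Regress the environment value function from the latent state.
        value_loss = F.mse_loss(self.value_head(h).squeeze(-1), value_target)
        # Predict the action that led to the current latent state.
        action_loss = F.cross_entropy(self.action_head(h), prev_action)
        return value_loss + action_loss
```

In this reading, the representation is shaped entirely by task-grounded targets (returns and actions) rather than pixels, so a background video that affects neither signal exerts no pull on the latent space.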

Authors (2)
  1. Maxime Burchi (7 papers)
  2. Radu Timofte (299 papers)
Citations (2)
