Hieros: Hierarchical Imagination on Structured State Space Sequence World Models (2310.05167v3)

Published 8 Oct 2023 in cs.AI

Abstract: One of the biggest challenges for modern deep reinforcement learning (DRL) algorithms is sample efficiency. Many approaches learn a world model in order to train an agent entirely in imagination, eliminating the need for direct environment interaction during training. However, these methods often lack imagination accuracy, exploration capability, or runtime efficiency. We propose Hieros, a hierarchical policy that learns time-abstracted world representations and imagines trajectories at multiple time scales in latent space. Hieros uses an S5 layer-based world model, which, due to the special properties of S5 layers, predicts next world states in parallel during training and iteratively during environment interaction and imagination. This allows for more efficient training than RNN-based world models and more efficient imagination than Transformer-based world models. We show that our approach outperforms the state of the art in mean and median human-normalized score on the Atari 100k benchmark, and that the proposed world model predicts complex dynamics very accurately. We also show that Hieros displays superior exploration capabilities compared to existing approaches.
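
The abstract's central runtime claim rests on a property of linear state-space layers: because the recurrence is linear, a whole training sequence can be processed in parallel, while imagination can still step one latent state at a time. Below is a minimal, hypothetical NumPy sketch of that dual mode for a diagonal linear SSM; it is not the authors' implementation, and a real S5 layer additionally learns discretized, HiPPO-initialized dynamics and uses an associative parallel scan instead of the explicit closed-form unroll shown here.

```python
# Minimal sketch (assumed names/shapes, not the Hieros code) of the dual-mode
# computation behind S5-style layers: a per-step recurrence for imagination
# and a whole-sequence formulation for parallel training.
import numpy as np

class DiagonalSSM:
    def __init__(self, state_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Stable diagonal recurrence: |a_i| < 1 keeps the latent state bounded.
        self.a = 0.95 * np.exp(2j * np.pi * rng.random(state_dim))
        self.b = rng.standard_normal(state_dim) + 0j
        self.c = rng.standard_normal(state_dim) + 0j

    def step(self, x, u):
        """One recurrent update, used token-by-token during imagination."""
        x = self.a * x + self.b * u        # x_t = A x_{t-1} + B u_t
        y = (self.c * x).sum().real        # y_t = Re(C x_t)
        return x, y

    def forward_sequence(self, us):
        """Whole-sequence mode used during training.

        Because the recurrence is linear and diagonal,
        x_t = sum_{k<=t} a^(t-k) * b * u_k, so every timestep can be
        computed without stepping through its predecessors (in practice
        via an associative scan); here the powers are materialized
        explicitly for clarity.
        """
        T = len(us)
        t = np.arange(T)
        exps = t[:, None] - t[None, :]                 # t - k for all pairs
        mask = (exps >= 0)[..., None]                  # only past inputs count
        powers = np.where(
            mask, self.a[None, None, :] ** np.maximum(exps, 0)[..., None], 0
        )                                              # [T, T, state_dim]
        xs = np.einsum('tkd,k,d->td', powers, np.asarray(us, dtype=complex), self.b)
        return (xs * self.c).sum(-1).real

ssm = DiagonalSSM(state_dim=8)
us = np.sin(np.linspace(0, 3, 16))
# The two modes agree: sequential stepping vs. closed-form unroll.
x, seq = np.zeros(8, dtype=complex), []
for u in us:
    x, y = ssm.step(x, u)
    seq.append(y)
assert np.allclose(seq, ssm.forward_sequence(us))
```

The final assertion checks that the sequential and whole-sequence modes produce identical outputs, which is exactly the property that lets such a world model train on full trajectories in parallel yet roll out imagined futures one step at a time.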

Authors (3)
  1. Paul Mattes
  2. Rainer Schlosser
  3. Ralf Herbrich