Do Agents Dream of Electric Sheep?: Improving Generalization in Reinforcement Learning through Generative Learning (2403.07979v1)

Published 12 Mar 2024 in cs.LG and cs.AI

Abstract: The Overfitted Brain hypothesis suggests that dreams occur to promote generalization in the human brain. Here, we ask whether the same holds for reinforcement learning agents. Given limited experience in a real environment, we use imagination-based reinforcement learning to train a policy on dream-like episodes, in which non-imaginative predicted trajectories are modified through generative augmentations. Experiments on four ProcGen environments show that, compared to classic imagination and to offline training on collected experience, our method reaches a higher level of generalization in sparsely rewarded environments.
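
The abstract describes the method only at a high level: imagined rollouts from a world model are perturbed by generative augmentations before the policy is trained on them. The sketch below illustrates that idea under stated assumptions; `world_model`, `policy`, and the Gaussian-noise perturbation are hypothetical stand-ins, since the abstract does not specify the paper's actual model interfaces or augmentations.

```python
import torch

def dream_rollout(world_model, policy, z0, horizon=15, aug_strength=0.1):
    """Imagine a trajectory from latent state z0, perturbing each predicted
    latent with a generative augmentation (here plain Gaussian noise, a
    stand-in for the paper's unspecified augmentations)."""
    z, trajectory = z0, []
    for _ in range(horizon):
        dist = policy(z)                         # hypothetical: action distribution given a latent
        action = dist.sample()
        z_next, reward = world_model(z, action)  # hypothetical: predicted next latent and reward
        # "Dream-like" modification: push the rollout off the purely
        # predicted trajectory so the policy trains on augmented episodes.
        z_next = z_next + aug_strength * torch.randn_like(z_next)
        trajectory.append((z, action, reward))
        z = z_next
    return trajectory
```

In a Dreamer-style training loop, real experience would fit the world model while the policy is optimized on such augmented imagined episodes; the augmentation strength then trades rollout fidelity against the diversity that the Overfitted Brain hypothesis credits to dreams.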

References (63)
  1. Contrastive behavioral similarity embeddings for generalization in reinforcement learning, 2021. arXiv:2101.05265 [cs.LG].
  2. Single-neuron activity and eye movements during human REM sleep and awake vision. Nature Communications, 6(1):7884, 2015.
  3. Robust locally-linear controllable embedding. In Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS’18), 2018.
  4. Learning and querying fast generative models for reinforcement learning, 2018. arXiv:1802.03006 [cs.LG].
  5. TransDreamer: Reinforcement learning with transformer world models, 2022. arXiv:2202.09481 [cs.LG].
  6. On the properties of neural machine translation: Encoder-decoder approaches. In Proceedings of the 8th Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-8), 2014.
  7. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. In Advances in Neural Information Processing Systems (NIPS’18), 2018.
  8. Quantifying generalization in reinforcement learning. In Proceedings of the 36th International Conference on Machine Learning (ICML’19), 2019.
  9. Leveraging procedural generation to benchmark reinforcement learning. In Proceedings of the 37th International Conference on Machine Learning (ICML’20), 2020.
  10. Generalization and regularization in dqn, 2018. arXiv:1810.00123 [cs.LG].
  11. Improving PILCO with Bayesian neural network dynamics models. In ICML’16 Workshop in Data-Efficient Machine Learning, 2016.
  12. Why generalization in RL is difficult: Epistemic POMDPs and implicit partial observability. In Advances in Neural Information Processing Systems (NeurIPS’21), 2021.
  13. Recurrent world models facilitate policy evolution. In Advances in Neural Information Processing Systems (NIPS’18), 2018.
  14. Learning latent dynamics for planning from pixels. In Proceedings of the 36th International Conference on Machine Learning (ICML’19), 2019.
  15. Dream to Control: Learning Behaviors by Latent Imagination. In Proceedings of the 8th International Conference on Learning Representations (ICLR’20), 2020.
  16. Mastering Atari with Discrete World Models. In Proceedings of the 9th International Conference on Learning Representations (ICLR’21), 2021.
  17. Mastering Diverse Domains through World Models, 2023. arXiv:2301.04104 [cs.AI].
  18. Model-based planning with discrete and continuous actions, 2017. arXiv:1705.07177 [cs.AI].
  19. Erik Hoel. Enter the supersensorium: The neuroscientific case for art in the age of netflix. The Baffler, 45, 2019. https://thebaffler.com/salvos/enter-the-supersensorium-hoel.
  20. Erik Hoel. The overfitted brain: Dreams evolved to assist generalization. Patterns, 2(5):100244, 2021.
  21. Deep variational reinforcement learning for POMDPs. In Proceedings of the 35th International Conference on Machine Learning (ICML’18), 2018.
  22. Generalization in reinforcement learning with selective noise injection and information bottleneck. In Advances in Neural Information Processing Systems (NeurIPS’19), 2019.
  23. Reinforcement learning with unsupervised auxiliary tasks. In Proceedings of the 5th International Conference on Learning Representations (ICLR’17), 2017.
  24. Prioritized level replay. In Proceedings of the 38th International Conference on Machine Learning (ICML’21), 2021.
  25. Model based reinforcement learning for atari. In Proceedings of the 8th International Conference on Learning Representations (ICLR’20), 2020.
  26. Improved variational inference with inverse autoregressive flow. In Advances in Neural Information Processing Systems (NIPS’16), 2016.
  27. A survey of zero-shot generalisation in deep reinforcement learning. Journal of Artificial Intelligence Research, 76:64, 2023.
  28. Reinforcement learning with augmented data. In Advances in Neural Information Processing Systems (NeurIPS’20), 2020.
  29. Stochastic latent actor-critic: Deep reinforcement learning with a latent variable model. In Advances in Neural Information Processing Systems (NeurIPS’20), 2020a.
  30. Network randomization: A simple technique for generalization in deep reinforcement learning. In Proceedings of the 8th International Conference on Learning Representations (ICLR’20), 2020b.
  31. Continuous control with deep reinforcement learning. In Proceedings of the 4th International Conference on Learning Representations (ICLR’16), 2016.
  32. Learning dynamics and generalization in deep reinforcement learning. In Proceedings of the 39th International Conference on Machine Learning (ICML’22), 2022.
  33. Transformers are sample-efficient world models. In Proceedings of the 11th International Conference on Learning Representations (ICLR’23), 2023.
  34. Mixed precision training. In Proceedings of the 6th International Conference on Learning Representations (ICLR’18), 2018.
  35. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
  36. Inceptionism: Going deeper into neural networks, 2015. Google Research Blog.
  37. Model-based reinforcement learning via imagination with derived memory. In Advances in Neural Information Processing Systems (NeurIPS’21), 2021.
  38. Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA’18), 2018.
  39. Training language models to follow instructions with human feedback. In Advances in Neural Information Processing Systems (NeurIPS’22), 2022.
  40. Generalized proximal policy optimization with sample reuse. In Advances in Neural Information Processing Systems (NeurIPS’21), 2021.
  41. Imagination-augmented agents for deep reinforcement learning. In Advances in Neural Information Processing Systems (NIPS’17), 2017.
  42. Decoupling value and policy for generalization in reinforcement learning. In Proceedings of the 38th International Conference on Machine Learning (ICML’21), 2021.
  43. Automatic data augmentation for generalization in reinforcement learning. In Advances in Neural Information Processing Systems (NeurIPS’21), 2021.
  44. Antti Revonsuo. The reinterpretation of dreams: An evolutionary hypothesis of the function of dreaming. Behavioral and Brain Sciences, 23(6):877–901, 2000.
  45. Mastering Atari, Go, chess and shogi by planning with a learned model. Nature, 588:604–609, 2020.
  46. High-dimensional continuous control using generalized advantage estimation. In Proceedings of the 4th International Conference on Learning Representations (ICLR’16), 2016.
  47. Proximal policy optimization algorithms, 2017. arXiv:1707.06347 [cs.LG].
  48. Planning to explore via self-supervised world models. In Proceedings of the 37th International Conference on Machine Learning (ICML’20), 2020.
  49. Deep inside convolutional networks: Visualising image classification models and saliency maps. In Proceedings of the ICLR’14 Workshop, 2014.
  50. Decoupling representation learning from reinforcement learning. In Proceedings of the 38th International Conference on Machine Learning (ICML’21), 2021.
  51. Reinforcement Learning: An Introduction. The MIT Press, 2017.
  52. Richard S. Sutton. Dyna, an integrated architecture for learning, planning, and reacting. In Working Notes of the 1991 AAAI Spring Symposium, 1991.
  53. Domain randomization for transferring deep neural networks from simulation to the real world. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS’17), 2017.
  54. Human learning in atari. In Proceedings of the 2017 AAAI Spring Symposium Series, Science of Intelligence: Computational Principles of Natural and Artificial Intelligence, 2017.
  55. Attention is all you need. In Advances in Neural Information Processing Systems (NIPS’17), 2017.
  56. Improving generalization in reinforcement learning with mixture regularization. In Advances in Neural Information Processing Systems (NeurIPS’20), 2020.
  57. Embed to control: A locally linear latent dynamics model for control from raw images. In Advances in Neural Information Processing Systems (NIPS’15), 2015.
  58. J Beau W Webber. A bi-symmetric log transformation for wide-range data. Measurement Science and Technology, 24(2):027001, 2012.
  59. Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229–256, 1992.
  60. Image augmentation is all you need: Regularizing deep reinforcement learning from pixels. In Proceedings of the 9th International Conference on Learning Representations (ICLR’21), 2021.
  61. Rotation, translation, and cropping for zero-shot generalization, 2020. arXiv:2001.09908 [cs.LG].
  62. SOLAR: deep structured representations for model-based reinforcement learning. In Proceedings of the 36th International Conference on Machine Learning (ICML’19), 2019.
  63. Bridging imagination and reality for model-based deep reinforcement learning. In Advances in Neural Information Processing Systems (NeurIPS’20), 2020.