Dr. Strategy: Model-Based Generalist Agents with Strategic Dreaming (2402.18866v2)

Published 29 Feb 2024 in cs.LG

Abstract: Model-based reinforcement learning (MBRL) has been a primary approach both to improving sample efficiency and to building generalist agents. However, little effort has gone into enhancing the strategy of dreaming itself, which raises the question of whether and how an agent can "dream better" in a more structured and strategic way. In this paper, inspired by observations from cognitive science suggesting that humans use a spatial divide-and-conquer strategy in planning, we propose a new MBRL agent, called Dr. Strategy, equipped with a novel dreaming strategy. The proposed agent realizes a divide-and-conquer-like strategy in dreaming: it learns a set of latent landmarks and uses them to train a landmark-conditioned highway policy. With the highway policy, the agent first learns in the dream to move to a landmark, and from there it tackles exploration and goal achievement in a more focused way. In experiments, we show that the proposed model outperforms prior pixel-based MBRL methods on various visually complex and partially observable navigation tasks.
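To make the dreaming strategy described in the abstract concrete, the following is a minimal, illustrative sketch, not the authors' implementation. It assumes a toy 2-D latent space, and the names `world_model_step`, `learn_landmarks`, `highway_policy`, and `explorer_policy` are hypothetical stand-ins for the paper's learned components: landmarks are distilled from replayed latent states (here with plain k-means), and each imagined rollout first rides the landmark-conditioned highway policy to a sampled landmark, then explores locally from there.

```python
# Illustrative sketch only (assumed toy dynamics, hypothetical names),
# not the Dr. Strategy codebase.
import numpy as np

rng = np.random.default_rng(0)

def world_model_step(state, action):
    """Toy latent dynamics: drift in the chosen direction plus small noise."""
    return state + 0.1 * action + 0.01 * rng.standard_normal(2)

def learn_landmarks(latent_states, k=4, iters=20):
    """Distill replayed latent states into k landmarks via plain k-means."""
    centers = latent_states[rng.choice(len(latent_states), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(latent_states[:, None] - centers[None], axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = latent_states[assign == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def highway_policy(state, landmark):
    """Stand-in for the landmark-conditioned policy: head toward the landmark."""
    delta = landmark - state
    return delta / (np.linalg.norm(delta) + 1e-8)

def explorer_policy(state):
    """Stand-in for the focused exploration policy: random unit direction."""
    a = rng.standard_normal(2)
    return a / (np.linalg.norm(a) + 1e-8)

# 1) Cluster imagined/replayed latent states into landmarks.
replay = rng.standard_normal((500, 2))
landmarks = learn_landmarks(replay, k=4)

# 2) Strategic dreaming: highway phase to a sampled landmark,
#    then a focused exploration phase, all inside the world model.
state = np.zeros(2)
target = landmarks[rng.integers(len(landmarks))]
for _ in range(50):                      # highway phase
    state = world_model_step(state, highway_policy(state, target))
    if np.linalg.norm(state - target) < 0.05:
        break
for _ in range(20):                      # focused exploration phase
    state = world_model_step(state, explorer_policy(state))
print("final imagined state:", state)
```

The design point the sketch tries to capture is the divide-and-conquer split: reaching distant regions is delegated to a reusable landmark-conditioned policy, so exploration and goal-achievement learning only has to cover the local neighborhood of each landmark rather than the whole space.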
