Goal-conditioned Offline Planning from Curious Exploration (2311.16996v1)

Published 28 Nov 2023 in cs.LG and cs.AI

Abstract: Curiosity has established itself as a powerful exploration strategy in deep reinforcement learning. Notably, leveraging expected future novelty as intrinsic motivation has been shown to efficiently generate exploratory trajectories, as well as a robust dynamics model. We consider the challenge of extracting goal-conditioned behavior from the products of such unsupervised exploration techniques, without any additional environment interaction. We find that conventional goal-conditioned reinforcement learning approaches for extracting a value function and policy fall short in this difficult offline setting. By analyzing the geometry of optimal goal-conditioned value functions, we relate this issue to a specific class of estimation artifacts in learned values. In order to mitigate their occurrence, we propose to combine model-based planning over learned value landscapes with a graph-based value aggregation scheme. We show how this combination can correct both local and global artifacts, obtaining significant improvements in zero-shot goal-reaching performance across diverse simulated environments.
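
The abstract's key proposal, combining planning over learned value landscapes with a graph-based value aggregation scheme, can be pictured with a small sketch. The snippet below is an illustrative reconstruction, not the authors' implementation: it assumes a learned goal-conditioned value function `value_fn(s, g)` in (0, 1] (interpretable as a discounted goal-reaching estimate), a set of exploration states drawn from the curious-exploration data, and `networkx` shortest paths to compose trusted short-range value estimates into corrected long-range ones. The function name and the `k_neighbors` parameter are hypothetical.

```python
import numpy as np
import networkx as nx


def aggregate_values(states, value_fn, k_neighbors=10):
    """Illustrative graph-based value aggregation over exploration states.

    `value_fn(s, g)` is assumed to be a learned goal-conditioned value in
    (0, 1], roughly interpretable as gamma**d(s, g) for a reaching distance d.
    Each state keeps edges only to the k goals it values most, i.e. the
    short-range estimates that tend to be reliable; long-range values are
    then re-derived by composing these edges along shortest paths in -log
    space, which suppresses spurious long-range estimation artifacts.
    """
    n = len(states)
    direct = np.array([[value_fn(s, g) for g in states] for s in states])

    graph = nx.DiGraph()
    graph.add_nodes_from(range(n))
    for i in range(n):
        # Keep the k most confident (highest-value) outgoing edges per state.
        for j in np.argsort(-direct[i])[:k_neighbors]:
            j = int(j)
            if i != j:
                cost = -np.log(np.clip(direct[i, j], 1e-6, 1.0))
                graph.add_edge(i, j, weight=cost)

    # Shortest-path costs compose the trusted local estimates; converting
    # back with exp(-cost) gives an aggregated value for every state pair
    # (unreachable pairs keep the pessimistic default of 0).
    v_agg = np.zeros((n, n))
    for i, dists in nx.all_pairs_dijkstra_path_length(graph, weight="weight"):
        for j, c in dists.items():
            v_agg[i, j] = np.exp(-c)
    return v_agg
```

Restricting edges to each state's most confident neighbors is the design choice that matters here: long-range values are replaced by compositions of short-range estimates rather than propagated directly, which is one way to realize the artifact correction the abstract describes.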
