Goal-conditioned Offline Planning from Curious Exploration (2311.16996v1)
Abstract: Curiosity has established itself as a powerful exploration strategy in deep reinforcement learning. Notably, leveraging expected future novelty as intrinsic motivation has been shown to efficiently generate exploratory trajectories, as well as a robust dynamics model. We consider the challenge of extracting goal-conditioned behavior from the products of such unsupervised exploration techniques, without any additional environment interaction. We find that conventional goal-conditioned reinforcement learning approaches for extracting a value function and policy fall short in this difficult offline setting. By analyzing the geometry of optimal goal-conditioned value functions, we relate this issue to a specific class of estimation artifacts in learned values. In order to mitigate their occurrence, we propose to combine model-based planning over learned value landscapes with a graph-based value aggregation scheme. We show how this combination can correct both local and global artifacts, obtaining significant improvements in zero-shot goal-reaching performance across diverse simulated environments.
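To make the graph-based value aggregation idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a sparse reward of -1 per step, so the learned goal-conditioned value V(s, g) = -(1 - gamma^d)/(1 - gamma) encodes an implied step count d; the `max_edge_steps` threshold and the use of Floyd-Warshall for aggregation are illustrative choices, and `pairwise_v` stands in for the learned critic evaluated on all pairs of buffer states.

```python
import numpy as np

GAMMA = 0.95

def value_to_steps(v, gamma=GAMMA):
    """Invert V = -(1 - gamma**d) / (1 - gamma) (sparse -1-per-step
    reward, an assumption of this sketch) to recover step count d."""
    v = np.clip(v, -1.0 / (1.0 - gamma) + 1e-6, -1e-6)
    return np.log(1.0 + v * (1.0 - gamma)) / np.log(gamma)

def aggregated_value(pairwise_v, max_edge_steps=10.0, gamma=GAMMA):
    """Graph-based aggregation: trust only short-horizon value
    estimates as graph edges, then chain them with all-pairs shortest
    paths so long-range values come from sequences of locally
    reliable estimates rather than a single (artifact-prone) one."""
    d = value_to_steps(np.asarray(pairwise_v, dtype=float), gamma)
    d[d > max_edge_steps] = np.inf      # drop unreliable long-range edges
    np.fill_diagonal(d, 0.0)
    for k in range(d.shape[0]):         # Floyd-Warshall relaxation
        d = np.minimum(d, d[:, k:k + 1] + d[k:k + 1, :])
    # map aggregated step counts back to value space
    return -(1.0 - gamma ** d) / (1.0 - gamma)
```

In this reading, the aggregated values would then serve as terminal costs for model-based planning (e.g., a CEM-style planner rolling out the learned dynamics model), so that local planning corrects short-horizon value artifacts while the graph corrects global ones.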