Foundation Policies with Hilbert Representations (2402.15567v2)
Abstract: Unsupervised and self-supervised objectives, such as next-token prediction, have enabled the pre-training of generalist models from large amounts of unlabeled data. In reinforcement learning (RL), however, finding a truly general and scalable unsupervised pre-training objective for generalist policies from offline data remains a major open question. While a number of methods have been proposed to enable generic self-supervised RL, based on principles such as goal-conditioned RL, behavioral cloning, and unsupervised skill learning, such methods remain limited: the behaviors they discover lack diversity, they require high-quality demonstration data, or they offer no clear mechanism for adapting to downstream tasks. In this work, we propose a novel unsupervised framework to pre-train generalist policies that capture diverse, optimal, long-horizon behaviors from unlabeled offline data, such that they can be adapted to arbitrary new tasks in a zero-shot manner. Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment, and then to span this learned latent space with directional movements, which enables various zero-shot policy "prompting" schemes for downstream tasks. In experiments on simulated robotic locomotion and manipulation benchmarks, we show that our unsupervised policies solve goal-conditioned and general RL tasks in a zero-shot fashion, often even outperforming prior methods designed specifically for each setting. Our code and videos are available at https://seohong.me/projects/hilp/.
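To make the "directional movement" idea concrete, here is a minimal Python sketch, not the paper's implementation. The paper learns an encoder from offline data so that Euclidean distances in latent space approximate temporal distances in the environment (the Hilbert representation); in the sketch below, `phi` is a placeholder random projection standing in for that learned encoder, and the latent dimension, dummy states, and all function names are hypothetical. The sketch illustrates the two ingredients the abstract describes: an intrinsic reward that encourages the latent embedding to move along a sampled direction z, and a zero-shot goal "prompt" that points z from the current state's embedding toward a goal's embedding.

```python
import numpy as np

# Hedged sketch of the latent-space "directional movement" idea.
# In the paper, phi is learned so that ||phi(s) - phi(g)|| approximates
# the temporal distance between s and g; here it is a stand-in projection.

d = 8                                  # latent dimension (hypothetical)
rng = np.random.default_rng(0)
W = rng.normal(size=(d, 4))            # stand-in for learned encoder weights

def phi(s: np.ndarray) -> np.ndarray:
    """Placeholder for the learned Hilbert representation."""
    return W @ s

def sample_skill() -> np.ndarray:
    """Latent direction z drawn uniformly from the unit sphere S^{d-1}."""
    z = rng.normal(size=d)
    return z / np.linalg.norm(z)

def directional_reward(s, s_next, z) -> float:
    """Intrinsic reward for a latent-conditioned policy pi(a | s, z):
    the latent-space displacement projected onto direction z."""
    return float((phi(s_next) - phi(s)) @ z)

def goal_prompt(s, g) -> np.ndarray:
    """Zero-shot goal 'prompting': point z from the current state's
    embedding toward the goal's embedding."""
    diff = phi(g) - phi(s)
    return diff / np.linalg.norm(diff)

# Toy usage with 4-dimensional dummy states.
s, s_next, g = rng.normal(size=(3, 4))
z = sample_skill()
print("intrinsic reward:", directional_reward(s, s_next, z))
print("goal-directed prompt:", goal_prompt(s, g))
```

Because distances in the latent space track temporal distances, steadily moving the embedding along the goal-directed z corresponds to making temporal progress toward the goal, which is what allows the pre-trained policy to be "prompted" without any fine-tuning.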
Authors: Seohong Park, Tobias Kreiman, Sergey Levine