Foundation Policies with Hilbert Representations (2402.15567v2)

Published 23 Feb 2024 in cs.LG, cs.AI, and cs.RO

Abstract: Unsupervised and self-supervised objectives, such as next token prediction, have enabled pre-training generalist models from large amounts of unlabeled data. In reinforcement learning (RL), however, finding a truly general and scalable unsupervised pre-training objective for generalist policies from offline data remains a major open question. While a number of methods have been proposed to enable generic self-supervised RL, based on principles such as goal-conditioned RL, behavioral cloning, and unsupervised skill learning, such methods remain limited in terms of either the diversity of the discovered behaviors, the need for high-quality demonstration data, or the lack of a clear adaptation mechanism for downstream tasks. In this work, we propose a novel unsupervised framework to pre-train generalist policies that capture diverse, optimal, long-horizon behaviors from unlabeled offline data such that they can be quickly adapted to arbitrary new tasks in a zero-shot manner. Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment, and then to span this learned latent space with directional movements, which enables various zero-shot policy "prompting" schemes for downstream tasks. Through our experiments on simulated robotic locomotion and manipulation benchmarks, we show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion, often even outperforming prior methods designed specifically for each setting. Our code and videos are available at https://seohong.me/projects/hilp/.
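To make the abstract's key insight concrete, here is a minimal illustrative sketch (not the authors' implementation; all function names and the embedding dimensionality are hypothetical) of how a temporal-structure-preserving representation phi can parameterize a goal-conditioned value as a latent-space distance, and how unit directions in that latent space act as zero-shot "prompts" for a latent-conditioned policy:

```python
import numpy as np

def hilbert_value(phi_s: np.ndarray, phi_g: np.ndarray) -> float:
    # Goal-conditioned value parameterized as a negative latent distance,
    # V(s, g) = -||phi(s) - phi(g)||, so that distance in the learned space
    # tracks the temporal distance (steps-to-goal) in the environment.
    return -float(np.linalg.norm(phi_s - phi_g))

def directional_reward(phi_s: np.ndarray, phi_next: np.ndarray, z: np.ndarray) -> float:
    # Intrinsic reward for a latent-conditioned policy pi(a | s, z):
    # it is rewarded for moving the embedding in direction z.
    return float((phi_next - phi_s) @ z)

def goal_prompt(phi_s: np.ndarray, phi_g: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # Zero-shot goal-reaching "prompt": the unit direction from the current
    # state's embedding toward the goal's embedding.
    d = phi_g - phi_s
    return d / (np.linalg.norm(d) + eps)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in embeddings; in practice these would come from a trained phi network.
    phi_s, phi_next, phi_g = rng.normal(size=(3, 32))
    z = goal_prompt(phi_s, phi_g)
    print("V(s, g)    =", hilbert_value(phi_s, phi_g))
    print("r(s, s', z) =", directional_reward(phi_s, phi_next, z))
```

This sketch only shows the inference-time quantities implied by the abstract; how phi and the latent-conditioned policy are actually trained from offline data is described in the paper itself.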

Authors (3)
  1. Seohong Park (18 papers)
  2. Tobias Kreiman (6 papers)
  3. Sergey Levine (531 papers)
Citations (13)

