Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings (2402.17135v1)
Abstract: Can we pre-train a generalist agent from a large corpus of unlabeled offline trajectories such that it can be immediately adapted to any new downstream task in a zero-shot manner? In this work, we present a functional reward encoding (FRE) as a general, scalable solution to this zero-shot RL problem. Our main idea is to learn functional representations of arbitrary tasks by encoding their state-reward samples using a transformer-based variational auto-encoder. This functional encoding not only enables the pre-training of an agent from a wide diversity of general unsupervised reward functions, but also provides a way to solve any new downstream task in a zero-shot manner, given a small number of reward-annotated samples. We empirically show that FRE agents trained on diverse random unsupervised reward functions can generalize to solve novel tasks in a range of simulated robotic benchmarks, often outperforming previous zero-shot RL and offline RL methods. Code for this project is provided at: https://github.com/kvfrans/fre
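The abstract outlines the core mechanism: a transformer encoder consumes a small set of (state, reward) samples drawn from a task, compresses them into a latent task vector, and a decoder reconstructs rewards at query states, with the pair trained as a variational auto-encoder. The snippet below is a minimal sketch of that recipe in PyTorch; the module names, network sizes, and loss weighting are illustrative assumptions rather than the authors' exact implementation (see the linked repository for that).

```python
# Minimal sketch of a functional reward encoding (FRE), assuming the recipe in the
# abstract: a transformer maps a set of (state, reward) samples to a latent task
# vector z, and a decoder predicts rewards at query states from z. All names and
# hyperparameters here are hypothetical, for illustration only.
import torch
import torch.nn as nn

class FREEncoder(nn.Module):
    def __init__(self, state_dim, latent_dim=32, d_model=64, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(state_dim + 1, d_model)           # embed (s, r) pairs
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.to_mu = nn.Linear(d_model, latent_dim)
        self.to_logvar = nn.Linear(d_model, latent_dim)

    def forward(self, states, rewards):
        # states: (B, N, state_dim), rewards: (B, N)
        x = self.embed(torch.cat([states, rewards.unsqueeze(-1)], dim=-1))
        h = self.transformer(x).mean(dim=1)                       # pool over the sample set
        return self.to_mu(h), self.to_logvar(h)

class FREDecoder(nn.Module):
    def __init__(self, state_dim, latent_dim=32, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, z, query_states):
        # z: (B, latent_dim), query_states: (B, M, state_dim)
        z_rep = z.unsqueeze(1).expand(-1, query_states.shape[1], -1)
        return self.net(torch.cat([query_states, z_rep], dim=-1)).squeeze(-1)

def fre_loss(encoder, decoder, states, rewards, query_states, query_rewards, beta=0.1):
    mu, logvar = encoder(states, rewards)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()          # reparameterization trick
    pred = decoder(z, query_states)
    recon = ((pred - query_rewards) ** 2).mean()                   # reward reconstruction
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()     # KL to standard normal prior
    return recon + beta * kl
```

At adaptation time, the same encoder can be applied to a handful of reward-annotated samples from a new task to produce a latent z, and a pre-trained, z-conditioned policy is then executed zero-shot without any further training.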
Authors: Kevin Frans, Seohong Park, Pieter Abbeel, Sergey Levine