Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning (2305.15260v4)
Abstract: Training offline RL agents from visual inputs poses two significant challenges: overfitting in representation learning and overestimation bias in the expected future rewards. Recent work has attempted to alleviate the overestimation bias by encouraging conservative behaviors. This paper, in contrast, builds more flexible constraints for value estimation without impeding the exploration of potentially advantageous actions. The key idea is to leverage off-the-shelf RL simulators, which can be easily interacted with in an online manner, as "test beds" for offline policies. To enable effective online-to-offline knowledge transfer, we introduce CoWorld, a model-based RL approach that mitigates cross-domain discrepancies in the state and reward spaces. Experimental results demonstrate the effectiveness of CoWorld, which outperforms existing RL approaches by large margins.
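To make the transfer idea in the abstract concrete, below is a minimal, runnable Python sketch of one plausible instantiation: a world model trained on offline (target-domain) data, a second world model and critic trained online in a simulator (source domain), a latent-state alignment term bridging the two, and a value constraint that caps the offline value estimate with the online critic's estimate rather than penalizing all out-of-distribution behavior. All class names, the alignment rule, and the min-style constraint are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
# Illustrative sketch of online-to-offline knowledge transfer for value
# estimation. Everything here (models, alignment, constraint) is a toy
# assumption, not the CoWorld codebase.
import numpy as np

rng = np.random.default_rng(0)

class ToyWorldModel:
    """Stand-in for a latent world model: encodes observations to latent states."""
    def __init__(self, obs_dim: int, latent_dim: int):
        self.W = rng.normal(scale=0.1, size=(obs_dim, latent_dim))

    def encode(self, obs: np.ndarray) -> np.ndarray:
        return np.tanh(obs @ self.W)

class ToyCritic:
    """Stand-in for a value function defined over latent states."""
    def __init__(self, latent_dim: int):
        self.w = rng.normal(scale=0.1, size=latent_dim)

    def value(self, z: np.ndarray) -> np.ndarray:
        return z @ self.w

def constrained_target(offline_v: np.ndarray, online_v: np.ndarray) -> np.ndarray:
    """One plausible 'flexible constraint' (assumed): cap the offline value
    estimate with the value assigned by the simulator-trained online critic,
    instead of uniformly discouraging out-of-distribution actions."""
    return np.minimum(offline_v, online_v)

# Toy data: a batch of offline visual observations, flattened to vectors.
obs = rng.normal(size=(32, 64))
target_model, source_model = ToyWorldModel(64, 8), ToyWorldModel(64, 8)
target_critic, source_critic = ToyCritic(8), ToyCritic(8)

# Cross-domain state alignment (assumed form): an auxiliary loss nudging the
# offline model's latents toward the online model's latents, so that the
# online critic is meaningful when evaluated on offline states.
z_target = target_model.encode(obs)
z_source = source_model.encode(obs)
alignment_loss = np.mean((z_target - z_source) ** 2)

# Value estimation with the online simulator acting as a "test bed".
v_offline = target_critic.value(z_target)
v_online = source_critic.value(z_source)
v_constrained = constrained_target(v_offline, v_online)

print(f"alignment loss: {alignment_loss:.4f}")
print(f"mean offline value {v_offline.mean():+.3f} -> constrained {v_constrained.mean():+.3f}")
```

In a real model-based pipeline, the alignment loss would be minimized jointly with the world-model objectives, and the constrained value would serve as the target for actor-critic updates in imagination; the sketch only shows where each quantity enters the computation.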