Offline-to-online Reinforcement Learning for Image-based Grasping with Scarce Demonstrations (2410.14957v2)
Abstract: Offline-to-online reinforcement learning (O2O RL) aims to obtain a continually improving policy as it interacts with the environment, while ensuring that the initial policy behaviour is satisficing. This satisficing behaviour is necessary in robotic manipulation, where random exploration can be costly due to time and catastrophic failures. O2O RL is especially compelling when only a scarce amount of (potentially suboptimal) demonstrations is available, a scenario in which behavioural cloning (BC) is known to suffer from distribution shift. Previous works have outlined the challenges of applying O2O RL algorithms in image-based environments. In this work, we propose a novel O2O RL algorithm that can learn a real-life image-based robotic vacuum grasping task from a small number of demonstrations, where BC fails the majority of the time. The proposed algorithm replaces the target network in off-policy actor-critic algorithms with a regularization technique inspired by the neural tangent kernel. We demonstrate that the proposed algorithm reaches above a 90% success rate in under two hours of interaction time, with only 50 human demonstrations, while BC and commonly used RL algorithms fail to achieve comparable performance.
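To make the core idea concrete, below is a minimal, hypothetical sketch of what "replacing the target network with an NTK-inspired regularizer" could look like in a critic update. It bootstraps the TD target from the online critic (no target network) and adds a feature-alignment penalty in the spirit of NTK-based analyses such as DR3; the `Critic` class, the batch layout, the exact penalty, and `reg_coef` are all assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch: target-network-free critic update with an
# NTK-inspired regularizer. Not the paper's exact method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Critic(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.head = nn.Linear(hidden, 1)

    def features(self, obs, act):
        # Penultimate-layer features phi(s, a).
        return self.trunk(torch.cat([obs, act], dim=-1))

    def forward(self, obs, act):
        return self.head(self.features(obs, act))

def critic_loss(critic, actor, batch, gamma=0.99, reg_coef=0.1):
    # batch: (obs, act, rew, next_obs, done), each a (B, .) float tensor;
    # rew and done are shaped (B, 1) to match the critic's output.
    obs, act, rew, next_obs, done = batch

    with torch.no_grad():
        next_act = actor(next_obs)
        # Bootstrap from the *online* critic: no target network.
        target = rew + gamma * (1.0 - done) * critic(next_obs, next_act)
    td_loss = F.mse_loss(critic(obs, act), target)

    # NTK-inspired regularizer: penalize feature alignment between
    # consecutive state-action pairs, so that updating Q(s, a) does not
    # drag Q(s', a') along with it -- the coupling that a target network
    # normally suppresses.
    phi = critic.features(obs, act)
    phi_next = critic.features(next_obs, next_act)
    reg = (phi * phi_next).sum(dim=-1).mean()

    return td_loss + reg_coef * reg
```

Under the NTK view, the change in Q(s', a') caused by a gradient step on Q(s, a) scales with the inner product of their feature (gradient) representations, so shrinking that inner product plays a stabilizing role similar to a slowly updated target network.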