RESPRECT: Speeding-up Multi-fingered Grasping with Residual Reinforcement Learning (2401.14858v1)
Abstract: Deep Reinforcement Learning (DRL) has proven effective at learning control policies for robotic grippers, but it has been much less practical for grasping with dexterous hands, especially on real robotic platforms, because of the high dimensionality of the problem. In this work, we focus on the multi-fingered grasping task with the anthropomorphic hand of the iCub humanoid. We propose the RESidual learning with PREtrained CriTics (RESPRECT) method that, starting from a policy pre-trained on a large set of objects, learns a residual policy to grasp a novel object in a fraction of the timesteps required to train a policy from scratch ($\sim 5\times$ faster), without requiring any task demonstration. To our knowledge, this is the first Residual Reinforcement Learning (RRL) approach that learns a residual policy on top of another policy pre-trained with DRL. We also reuse some components of the pre-trained policy during residual learning, which further speeds up training. We benchmark our results in the iCub simulated environment, and we show that RESPRECT can be effectively used to learn a multi-fingered grasping policy on the real iCub robot. The code to reproduce the experiments is released together with the paper under an open-source license.
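The idea of learning a residual policy on top of a pre-trained one can be illustrated with a minimal sketch. It assumes the additive composition common in residual policy learning (final action = pre-trained action + residual correction, clipped to the actuator range); the abstract does not state the exact composition used by RESPRECT, and the names `compose_action`, `pretrained_policy`, and `residual_policy`, as well as the linear maps standing in for neural networks, are hypothetical.

```python
import numpy as np

def compose_action(base_action, residual_action, low=-1.0, high=1.0):
    """Add the residual correction to the base action and clip to the action range."""
    total = np.asarray(base_action, dtype=float) + np.asarray(residual_action, dtype=float)
    return np.clip(total, low, high)

def pretrained_policy(obs):
    # Hypothetical frozen policy: a fixed linear map standing in for a
    # network pre-trained with DRL on a large set of objects.
    weights = np.array([[0.5, 0.0], [0.0, -0.5]])
    return weights @ obs

def residual_policy(obs, params):
    # Hypothetical trainable correction. With all-zero parameters it returns
    # zero, so the combined behavior starts from the pre-trained policy.
    return params @ obs

obs = np.array([1.0, 2.0])
params = np.zeros((2, 2))  # residual parameters before any training on the novel object
action = compose_action(pretrained_policy(obs), residual_policy(obs, params))
```

In this sketch only `params` would be updated while training on the novel object. Starting the residual at zero means the agent initially behaves like the pre-trained policy, which is one plausible reason residual learning can need fewer timesteps than training from scratch.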