Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning (2410.21845v2)
Abstract: Reinforcement learning (RL) holds great promise for enabling autonomous acquisition of complex robotic manipulation skills, but realizing this potential in real-world settings has been challenging. We present a human-in-the-loop vision-based RL system that demonstrates impressive performance on a diverse set of dexterous manipulation tasks, including dynamic manipulation, precision assembly, and dual-arm coordination. Our approach integrates demonstrations and human corrections, efficient RL algorithms, and other system-level design choices to learn policies that achieve near-perfect success rates and fast cycle times within just 1 to 2.5 hours of training. We show that our method significantly outperforms imitation learning baselines and prior RL approaches, with an average 2x improvement in success rate and 1.8x faster execution. Through extensive experiments and analysis, we provide insights into the effectiveness of our approach, demonstrating how it learns robust, adaptive policies for both reactive and predictive control strategies. Our results suggest that RL can indeed learn a wide range of complex vision-based manipulation policies directly in the real world within practical training times. We hope this work will inspire a new generation of learned robotic manipulation techniques, benefiting both industrial applications and research advancements. Videos and code are available at our project website https://hil-serl.github.io/.
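The abstract describes combining demonstrations, human corrections, and an efficient off-policy RL learner. As a rough illustration of how such a human-in-the-loop loop can be wired together, the sketch below lets an operator override the policy's action at any step and stores those corrections in a separate buffer that the learner samples alongside online data. This is a minimal sketch under stated assumptions: the identifiers (ReplayBuffer, hil_rollout, the env/policy/human interfaces, the 50/50 sampling split) are illustrative and not the released implementation.

```python
# Illustrative human-in-the-loop RL data-collection loop (not the paper's code):
# the policy acts, a human may take over at any step, and intervention
# transitions are kept in a demonstration-style buffer that off-policy updates
# sample from together with autonomously collected data.
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", "obs action reward next_obs done")

class ReplayBuffer:
    """FIFO replay buffer with uniform sampling."""
    def __init__(self, capacity=100_000):
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, n):
        # Return up to n uniformly sampled transitions (fewer if buffer is small).
        return random.sample(list(self.storage), min(n, len(self.storage)))

def hil_rollout(env, policy, human, demo_buffer, online_buffer,
                update_fn, num_steps=1000, batch_size=256, utd_ratio=2):
    """Collect data with optional human takeover and run off-policy updates.

    Assumed interfaces (placeholders, not a specific library):
      env:       reset() -> obs, step(action) -> (next_obs, reward, done)
      policy:    act(obs) -> action
      human:     intervene(obs) -> action, or None when not intervening
      update_fn: performs one gradient step on a list of Transitions
    """
    obs = env.reset()
    for _ in range(num_steps):
        human_action = human.intervene(obs)
        action = human_action if human_action is not None else policy.act(obs)
        next_obs, reward, done = env.step(action)
        transition = Transition(obs, action, reward, next_obs, done)

        # Human corrections are treated like demonstrations; autonomous
        # transitions go to the regular online buffer.
        (demo_buffer if human_action is not None else online_buffer).add(transition)

        # Off-policy updates: draw half of each batch from each buffer so
        # demonstrations and corrections keep influencing the learner.
        for _ in range(utd_ratio):
            batch = (demo_buffer.sample(batch_size // 2)
                     + online_buffer.sample(batch_size // 2))
            if batch:
                update_fn(batch)

        obs = env.reset() if done else next_obs
```

Sampling from both buffers at every update is one common way to keep demonstrations and corrections in play as the online buffer grows; the actual system may weight or schedule this differently.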
Authors: Jianlan Luo, Charles Xu, Jeffrey Wu, Sergey Levine