RoboCLIP: One Demonstration is Enough to Learn Robot Policies (2310.07899v1)

Published 11 Oct 2023 in cs.AI and cs.RO

Abstract: Reward specification is a notoriously difficult problem in reinforcement learning, requiring extensive expert supervision to design robust reward functions. Imitation learning (IL) methods attempt to circumvent these problems by utilizing expert demonstrations but typically require a large number of in-domain expert demonstrations. Inspired by advances in the field of Video-and-Language Models (VLMs), we present RoboCLIP, an online imitation learning method that uses a single demonstration (overcoming the large data requirement) in the form of a video demonstration or a textual description of the task to generate rewards without manual reward function design. Additionally, RoboCLIP can also utilize out-of-domain demonstrations, like videos of humans solving the task, for reward generation, circumventing the need to have the same demonstration and deployment domains. RoboCLIP utilizes pretrained VLMs without any finetuning for reward generation. Reinforcement learning agents trained with RoboCLIP rewards demonstrate 2-3 times higher zero-shot performance than competing imitation learning methods on downstream robot manipulation tasks, doing so using only one video/text demonstration.
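The reward mechanism the abstract describes, scoring the agent's rollout by its similarity to a single video or text demonstration in a pretrained VLM's embedding space, can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: the `PretrainedVLM` interface, its `encode_video`/`encode_text` methods, and the choice of cosine similarity as the score are assumptions made for the example.

```python
import numpy as np


class PretrainedVLM:
    """Stand-in interface for a frozen video-and-language model.

    RoboCLIP uses a pretrained VLM without any finetuning; the concrete model
    and method names here are placeholders to be backed by a real encoder.
    """

    def encode_video(self, frames: np.ndarray) -> np.ndarray:
        """Embed a (T, H, W, C) video clip into the shared latent space."""
        raise NotImplementedError

    def encode_text(self, description: str) -> np.ndarray:
        """Embed a textual task description into the same latent space."""
        raise NotImplementedError


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity between two embedding vectors (assumed scoring rule)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


def episode_reward(vlm: PretrainedVLM,
                   episode_frames: np.ndarray,
                   demo_video: np.ndarray | None = None,
                   task_text: str | None = None) -> float:
    """Reward for one rollout: similarity between the agent's episode video
    and the single demonstration, which may be an out-of-domain video (e.g. a
    human solving the task) or just a textual description of the task."""
    if demo_video is None and task_text is None:
        raise ValueError("Provide either a demonstration video or a task description.")
    z_episode = vlm.encode_video(episode_frames)
    z_demo = vlm.encode_video(demo_video) if demo_video is not None else vlm.encode_text(task_text)
    return cosine_similarity(z_episode, z_demo)
```

An online RL agent would then maximize this similarity score as its reward, which is how the method sidesteps both manual reward design and large in-domain demonstration datasets.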
