AR2-D2: Training a Robot Without a Robot (2306.13818v1)
Abstract: Diligently gathered human demonstrations serve as the unsung heroes empowering the progression of robot learning. Today, demonstrations are collected by training people to use specialized controllers, which (tele-)operate robots to manipulate a small number of objects. By contrast, we introduce AR2-D2: a system for collecting demonstrations that (1) does not require people with specialized training, (2) does not require any real robot during data collection, and therefore (3) enables manipulation of diverse objects with a real robot. AR2-D2 is a framework in the form of an iOS app that people can use to record a video of themselves manipulating any object while simultaneously capturing the data modalities essential for training a real robot. We show that data collected via our system enables the training of behavior-cloning agents to manipulate real objects. Our experiments further show that training with our AR data is as effective as training with real-world robot demonstrations. Moreover, our user study indicates that users find AR2-D2 intuitive to use and, unlike four other frequently employed methods for collecting robot demonstrations, require no training.
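For readers unfamiliar with the learning recipe the abstract refers to, behavior cloning is supervised regression from demonstration observations to demonstrated actions. The sketch below is a minimal, hypothetical PyTorch illustration; the feature dimensions, network shape, and 7-DoF action parameterization are assumptions for illustration, not AR2-D2's actual architecture.

```python
# Minimal behavior-cloning sketch (illustrative only; the paper's actual
# perception model, action space, and training details are not shown here).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical demonstration data: per-frame visual features paired with
# 7-DoF end-effector actions (position + quaternion). Shapes are assumed.
obs = torch.randn(1024, 512)      # 1024 demo frames, 512-d features each
actions = torch.randn(1024, 7)    # one target action per frame
loader = DataLoader(TensorDataset(obs, actions), batch_size=64, shuffle=True)

# A small MLP policy; state-of-the-art manipulation systems typically use
# transformer-based perception instead of a plain MLP.
policy = nn.Sequential(
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 7),
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(10):
    for o, a in loader:
        loss = nn.functional.mse_loss(policy(o), a)  # imitate demo actions
        opt.zero_grad()
        loss.backward()
        opt.step()
```

The point of AR2-D2 is that the supervised (observation, action) pairs come from AR recordings of people manipulating objects rather than from teleoperated robot rollouts; the behavior-cloning recipe itself is unchanged.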
- Jiafei Duan
- Yi Ru Wang
- Mohit Shridhar
- Dieter Fox
- Ranjay Krishna