Efficient Data Collection for Robotic Manipulation via Compositional Generalization (2403.05110v2)
Abstract: Data collection has become an increasingly important problem in robotic manipulation, yet there is still little understanding of how to collect data effectively to facilitate broad generalization. Recent works on large-scale robotic data collection typically vary many environmental factors (e.g., object types, table textures) during data collection to cover a diverse range of scenarios. However, they do not explicitly account for the possible compositional abilities of policies trained on the data. If robot policies can compose environmental factors from their data to succeed in unseen factor combinations, we can exploit this to avoid collecting data for situations that composition would address. To investigate this possibility, we conduct thorough empirical studies both in simulation and on a real robot that compare data collection strategies and assess whether visual imitation learning policies can compose environmental factors. We find that policies do exhibit composition, although leveraging prior robotic datasets is critical for this on a real robot. We use these insights to propose in-domain data collection strategies that exploit composition, which can induce better generalization than naive approaches for the same data collection effort. We further demonstrate that a real robot policy trained on data from such a strategy achieves a 77.5% success rate when transferred to entirely new environments encompassing unseen combinations of environmental factors, whereas policies trained on data collected without accounting for environmental variation fail to transfer, with a success rate of only 2.5%. We provide videos at http://iliad.stanford.edu/robot-data-comp/.
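The savings that composition enables can be illustrated with a short sketch. The example below is a minimal, hypothetical illustration (the factor names, factor values, and the "vary one factor at a time against a base configuration" scheme are assumptions for exposition, not necessarily the paper's exact strategy): if a policy can compose factors it has only seen varied in isolation, the number of scenarios to collect grows linearly in the number of factor values rather than with the full Cartesian product of combinations.

```python
# Minimal sketch (hypothetical factors/values) of why compositional
# generalization reduces data collection effort.
from itertools import product

factors = {
    "object": ["carrot", "cup", "sponge", "bottle"],
    "table_texture": ["wood", "marble", "cloth", "metal"],
    "camera_pose": ["front", "side", "overhead"],
}

# Naive strategy: collect demonstrations for every factor combination.
full_product = list(product(*factors.values()))

# Compositional strategy: fix a "base" value for each factor, then vary
# one factor at a time. A policy that composes factors seen in isolation
# could then generalize to the unseen combinations in `full_product`.
base = {name: values[0] for name, values in factors.items()}
compositional_cover = []
for name, values in factors.items():
    for value in values:
        combo = dict(base)
        combo[name] = value
        combo_tuple = tuple(combo[k] for k in factors)
        if combo_tuple not in compositional_cover:
            compositional_cover.append(combo_tuple)

print(f"full product: {len(full_product)} scenarios")              # 4 * 4 * 3 = 48
print(f"compositional cover: {len(compositional_cover)} scenarios")  # 1 base + 3 + 3 + 2 = 9
```

With more factors or more values per factor, the gap widens quickly, which is why exploiting composition (when policies exhibit it) can yield better generalization for the same data collection effort.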
Authors: Jensen Gao, Annie Xie, Ted Xiao, Chelsea Finn, Dorsa Sadigh