Mirage: Cross-Embodiment Zero-Shot Policy Transfer with Cross-Painting (2402.19249v3)
Abstract: The ability to reuse collected data and transfer trained policies between robots could alleviate the burden of additional data collection and training. While existing approaches such as pretraining plus finetuning and co-training show promise, they do not generalize to robots unseen in training. Focusing on common robot arms with similar workspaces and 2-jaw grippers, we investigate the feasibility of zero-shot transfer. Through simulation studies on 8 manipulation tasks, we find that state-based Cartesian control policies can successfully zero-shot transfer to a target robot after accounting for forward dynamics. To address visual disparities between robots for vision-based policies, we introduce Mirage, which uses "cross-painting" (masking out the unseen target robot and inpainting the seen source robot) in real time during execution, so that it appears to the policy as if the trained source robot were performing the task. Mirage applies to both first-person and third-person camera views, and to policies that take either both states and images or only images as inputs. Despite its simplicity, our extensive simulation and physical experiments provide strong evidence that Mirage can successfully zero-shot transfer between different robot arms and grippers with only minimal performance degradation on a variety of manipulation tasks such as picking, stacking, and assembly, significantly outperforming a generalist policy. Project website: https://robot-mirage.github.io/
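The cross-painting step can be illustrated as a per-frame image composite. The sketch below is illustrative only and is not the authors' implementation. It assumes that a boolean mask of the target robot, a rendering of the source robot, and a boolean mask of that rendering are already available for each camera frame (for example from a simulator posed to match the target robot). The names `naive_inpaint` and `cross_paint` are hypothetical. The hole left by the target robot is filled with a crude neighbour-averaging diffusion, which stands in for a real inpainting method such as Telea's fast-marching method (available in OpenCV as `cv2.inpaint` with the `cv2.INPAINT_TELEA` flag).

```python
import numpy as np


def naive_inpaint(image, mask, iterations=50):
    """Fill pixels where `mask` is True by repeated 4-neighbour averaging.

    A crude diffusion stand-in for a proper inpainting method; `image` is
    (H, W, C) and `mask` is a boolean (H, W) array marking the hole.
    """
    out = image.astype(np.float64).copy()
    # Initialise the hole with the mean colour of the known pixels.
    out[mask] = out[~mask].mean(axis=0)
    for _ in range(iterations):
        # Edge padding lets border pixels reuse their own value as a neighbour.
        padded = np.pad(out, ((1, 1), (1, 1), (0, 0)), mode="edge")
        neighbours = (padded[:-2, 1:-1] + padded[2:, 1:-1]
                      + padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        # Only hole pixels are updated; known pixels stay untouched.
        out[mask] = neighbours[mask]
    return out


def cross_paint(frame, target_mask, source_render, source_mask):
    """Hypothetical cross-painting composite for a single camera frame.

    1. Mask out the (unseen) target robot and inpaint the background behind it.
    2. Paste the rendered (seen) source robot on top of the inpainted frame.
    """
    background = naive_inpaint(frame, target_mask)
    composite = np.where(source_mask[..., None], source_render, background)
    return np.rint(composite).astype(frame.dtype)
```

In use, a frame would be passed through `cross_paint` before it reaches the policy, so the policy only ever sees the source robot. A learned or fast-marching inpainter would replace `naive_inpaint` in practice, since simple diffusion blurs textured backgrounds.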