EquivAct: SIM(3)-Equivariant Visuomotor Policies beyond Rigid Object Manipulation (2310.16050v2)
Abstract: If a robot masters folding a kitchen towel, we would expect it to also master folding a large beach towel. However, existing policy learning methods that rely on data augmentation still do not guarantee such generalization. Our insight is to add equivariance to both the visual object representation and the policy architecture. We propose EquivAct, which utilizes SIM(3)-equivariant network structures that guarantee generalization across all possible object translations, 3D rotations, and scales by construction. EquivAct is trained in two phases. We first pre-train a SIM(3)-equivariant visual representation on simulated scene point clouds. Then, we learn a SIM(3)-equivariant visuomotor policy from a small number of source-task demonstrations. We show that the learned policy transfers directly to objects that substantially differ from the demonstrations in scale, position, and orientation. We evaluate our method on three manipulation tasks involving deformable and articulated objects, going beyond the typical rigid-object manipulation tasks considered in prior work. We conduct experiments both in simulation and in the real world. For the real-robot experiments, our method uses 20 human demonstrations of a tabletop task and transfers zero-shot to a mobile manipulation task in a much larger setup. Experiments confirm that our contrastive pre-training procedure and equivariant architecture offer significant improvements over prior work. Project website: https://equivact.github.io
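To make the "equivariance by construction" claim concrete, the sketch below is a minimal NumPy illustration, not the paper's actual architecture. It assumes vector-neuron-style features in the spirit of Deng et al.'s Vector Neurons: centering the point cloud removes translation, and a linear layer that mixes feature channels but never touches the 3D coordinates commutes with any rotation R and scale s. The helper names `vn_linear` and `sim3_features` are hypothetical.

```python
import numpy as np

def vn_linear(V, W):
    """Vector-neuron-style linear layer (hypothetical helper): W mixes
    feature channels and never the 3D coordinate axis, so it commutes
    with any rotation or scaling applied to the input vectors.
    V: (N, C_in, 3) per-point vector features, W: (C_out, C_in)."""
    return np.einsum("oc,ncd->nod", W, V)

def sim3_features(points, W):
    """Center the cloud (removes translation), then apply the VN layer.
    The resulting vector features satisfy f(s*R@x + t) = s*R@f(x)."""
    centered = points - points.mean(axis=0, keepdims=True)
    return vn_linear(centered[:, None, :], W)   # (N, 1, 3) initial features

# Numerical check on a random cloud and a random similarity transform.
rng = np.random.default_rng(0)
pts = rng.normal(size=(128, 3))
W = rng.normal(size=(8, 1))

Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))    # orthogonal factor of a random matrix
Q *= np.linalg.det(Q)                           # force det(Q) = +1, i.e. a rotation
s, t = 2.5, rng.normal(size=(1, 3))

out = sim3_features(pts, W)
out_xform = sim3_features(s * pts @ Q.T + t, W)
assert np.allclose(out_xform, s * out @ Q.T, atol=1e-6)  # SIM(3) equivariance holds
```

The centered vector features are translation-invariant and rotation/scale-equivariant, which is the property the abstract refers to: the assertion passes for every translation, rotation, and scale without any data augmentation, e.g. without ever training on a beach-towel-sized version of the kitchen towel.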
Authors: Jingyun Yang, Congyue Deng, Jimmy Wu, Rika Antonova, Leonidas Guibas, Jeannette Bohg