RiEMann: Near Real-Time SE(3)-Equivariant Robot Manipulation without Point Cloud Segmentation (2403.19460v2)
Abstract: We present RiEMann, an end-to-end near Real-time SE(3)-Equivariant Robot Manipulation imitation learning framework that takes scene point clouds as input. Unlike previous methods that rely on descriptor field matching, RiEMann directly predicts the target poses of objects for manipulation without any object segmentation. RiEMann learns a manipulation task from scratch with 5 to 10 demonstrations, generalizes to unseen SE(3) transformations and unseen instances of target objects, resists visual interference from distracting objects, and tracks pose changes of the target object in near real time. The scalable action space of RiEMann allows custom equivariant actions to be added, such as the direction in which to turn a faucet, which makes articulated object manipulation possible. In simulation and real-world 6-DOF robot manipulation experiments, we test RiEMann on 5 categories of manipulation tasks with a total of 25 variants and show that RiEMann outperforms baselines in both task success rate and SE(3) geodesic distance error on predicted poses (reduced by 68.6%), and achieves a network inference speed of 5.4 frames per second (FPS). Code and video results are available at https://riemann-web.github.io/.
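To make the two technical notions in the abstract concrete, the following is a minimal NumPy sketch of what SE(3)-equivariance of a pose predictor means and of how a pose error can be measured. It is not the RiEMann network or the paper's evaluation code: `toy_pose_predictor`, `equivariance_gap`, and the other names are hypothetical, and reporting rotation error as a geodesic angle alongside a Euclidean translation error is one common convention assumed here, since the abstract does not state the exact metric.

```python
import numpy as np


def random_rotation(rng):
    """Sample a matrix in SO(3) from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # remove the sign ambiguity of the QR factors
    if np.linalg.det(q) < 0:     # reflect one axis so that det(q) = +1
        q[:, 0] = -q[:, 0]
    return q


def rotation_geodesic(R_a, R_b):
    """Geodesic angle in radians between two rotation matrices.

    Uses ||R_a - R_b||_F = 2*sqrt(2)*sin(theta/2), which agrees with
    arccos((trace(R_a^T R_b) - 1) / 2) but is more accurate for small angles.
    """
    chord = np.linalg.norm(R_a - R_b) / (2.0 * np.sqrt(2.0))
    return float(2.0 * np.arcsin(np.clip(chord, 0.0, 1.0)))


def toy_pose_predictor(points):
    """Hypothetical stand-in for a pose network (NOT the RiEMann model).

    Translation is the centroid. Rotation comes from Gram-Schmidt on two
    vectors built with rotation-invariant weights, so the output is
    SE(3)-equivariant by construction.
    """
    centroid = points.mean(axis=0)
    d = points - centroid
    norms = np.linalg.norm(d, axis=1, keepdims=True)
    v1 = (d * norms**2).sum(axis=0)   # rotates with the cloud
    v2 = (d * norms**4).sum(axis=0)   # second vector that rotates with the cloud
    e1 = v1 / np.linalg.norm(v1)
    e2 = v2 - (v2 @ e1) * e1          # Gram-Schmidt step
    e2 = e2 / np.linalg.norm(e2)
    e3 = np.cross(e1, e2)             # completes a right-handed frame
    return np.stack([e1, e2, e3], axis=1), centroid


def equivariance_gap(seed=0, n_points=64):
    """Compare f(g . P) with g . f(P) for a random cloud P and random g in SE(3)."""
    rng = np.random.default_rng(seed)
    points = rng.normal(size=(n_points, 3))
    R, T = random_rotation(rng), rng.normal(size=3)

    R_pred, t_pred = toy_pose_predictor(points)
    R_moved, t_moved = toy_pose_predictor(points @ R.T + T)

    rot_err = rotation_geodesic(R @ R_pred, R_moved)
    trans_err = float(np.linalg.norm(R @ t_pred + T - t_moved))
    return rot_err, trans_err


print(equivariance_gap())
```

For an equivariant predictor both returned errors should sit at floating-point noise for any rigid transform of the scene. This is the property that lets a model trained on a handful of demonstrations generalize to unseen SE(3) transformations without data augmentation.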