
RISE: 3D Perception Makes Real-World Robot Imitation Simple and Effective (2404.12281v3)

Published 18 Apr 2024 in cs.RO

Abstract: Precise robot manipulations require rich spatial information in imitation learning. Image-based policies model object positions from fixed cameras, which are sensitive to camera view changes. Policies utilizing 3D point clouds usually predict keyframes rather than continuous actions, posing difficulty in dynamic and contact-rich scenarios. To utilize 3D perception efficiently, we present RISE, an end-to-end baseline for real-world imitation learning, which predicts continuous actions directly from single-view point clouds. It compresses the point cloud to tokens with a sparse 3D encoder. After adding sparse positional encoding, the tokens are featurized using a transformer. Finally, the features are decoded into robot actions by a diffusion head. Trained with 50 demonstrations for each real-world task, RISE surpasses currently representative 2D and 3D policies by a large margin, showcasing significant advantages in both accuracy and efficiency. Experiments also demonstrate that RISE is more general and robust to environmental change compared with previous baselines. Project website: rise-policy.github.io.
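The pipeline the abstract describes (single-view point cloud → sparse voxel tokens → positional encoding → transformer → diffusion action head) can be illustrated at its front end with a minimal, framework-free sketch. The voxel grouping below is only a rough stand-in for the paper's sparse 3D encoder (which uses sparse convolutions), and every function name here is hypothetical, not taken from the RISE codebase.

```python
import numpy as np

def voxelize(points, voxel_size=0.05):
    """Group a point cloud of shape (N, 3 + C) into sparse voxel tokens.

    Returns (coords, feats): the unique occupied voxel coordinates and the
    mean feature of the points falling into each voxel -- a simplified
    analogue of compressing the cloud into tokens with a sparse encoder.
    """
    coords = np.floor(points[:, :3] / voxel_size).astype(np.int64)
    uniq, inverse = np.unique(coords, axis=0, return_inverse=True)
    feats = np.zeros((len(uniq), points.shape[1]))
    counts = np.zeros(len(uniq))
    np.add.at(feats, inverse, points)   # sum point features per voxel
    np.add.at(counts, inverse, 1)
    return uniq, feats / counts[:, None]

def sparse_positional_encoding(coords, dim=8):
    """Toy sinusoidal encoding of integer voxel coordinates (M, 3) -> (M, 3*dim)."""
    freqs = 1.0 / (100.0 ** (np.arange(dim // 2) / (dim // 2)))
    angles = coords[:, :, None] * freqs                      # (M, 3, dim/2)
    pe = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return pe.reshape(len(coords), -1)

# Fake single-view cloud: xyz + rgb, as if from one depth camera.
rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 1.0, size=(1000, 6))
coords, feats = voxelize(pts, voxel_size=0.2)
tokens = np.concatenate([feats, sparse_positional_encoding(coords)], axis=1)
# `tokens` would then be featurized by a transformer and decoded into
# continuous actions by a diffusion head in the full policy.
print(coords.shape, feats.shape, tokens.shape)
```

In the actual method the tokenization is learned end-to-end rather than a fixed mean-pool, but the sketch captures why the representation is sparse: only occupied voxels produce tokens, so the transformer's input length scales with scene content, not with grid volume.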

Authors (4)
  1. Chenxi Wang (66 papers)
  2. Hongjie Fang (17 papers)
  3. Hao-Shu Fang (38 papers)
  4. Cewu Lu (203 papers)
Citations (9)

