3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations (2403.03954v7)
Abstract: Imitation learning provides an efficient way to teach robots dexterous skills; however, learning complex skills robustly and generalizably usually requires large amounts of human demonstrations. To tackle this challenging problem, we present 3D Diffusion Policy (DP3), a novel visual imitation learning approach that incorporates the power of 3D visual representations into diffusion policies, a class of conditional action generative models. The core design of DP3 is the utilization of a compact 3D visual representation, extracted from sparse point clouds with an efficient point encoder. In our experiments involving 72 simulation tasks, DP3 successfully handles most tasks with just 10 demonstrations and surpasses baselines with a 24.2% relative improvement. In 4 real robot tasks, DP3 demonstrates precise control with a high success rate of 85%, given only 40 demonstrations of each task, and shows excellent generalization abilities in diverse aspects, including space, viewpoint, appearance, and instance. Interestingly, in real robot experiments, DP3 rarely violates safety requirements, in contrast to baseline methods, which frequently do so and necessitate human intervention. Our extensive evaluation highlights the critical importance of 3D representations in real-world robot learning. Videos, code, and data are available at https://3d-diffusion-policy.github.io.
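The idea of a compact representation extracted from a sparse point cloud can be illustrated with a minimal sketch. The abstract does not specify the encoder architecture, so everything below is an assumption made for illustration: a PointNet-style shared per-point MLP followed by max pooling, random subsampling to obtain the sparse cloud, randomly initialized weights, and hypothetical layer sizes and function names. In a policy such as DP3, a feature vector of this kind would serve as the conditioning input to the action-generating diffusion model.

```python
import numpy as np


def subsample_points(points, num_points, rng):
    """Randomly keep `num_points` points so the cloud is sparse and fixed-size.

    Random subsampling is an assumption here; other sampling schemes exist.
    """
    idx = rng.choice(points.shape[0], size=num_points, replace=False)
    return points[idx]


def init_encoder(rng, dims=(3, 64, 128, 64)):
    """Create random weights for a small per-point MLP (hypothetical sizes)."""
    return [
        (rng.normal(0.0, 0.1, size=(d_in, d_out)), np.zeros(d_out))
        for d_in, d_out in zip(dims[:-1], dims[1:])
    ]


def encode_point_cloud(points, params):
    """Map an (N, 3) point cloud to a single compact feature vector.

    The same MLP is applied to every point, and the per-point features are
    max-pooled over the point axis, so the result ignores point ordering.
    """
    h = points
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers only
    return h.max(axis=0)


rng = np.random.default_rng(0)
raw_cloud = rng.uniform(-1.0, 1.0, size=(4096, 3))  # stand-in for a camera point cloud
sparse_cloud = subsample_points(raw_cloud, 512, rng)
params = init_encoder(rng)
feature = encode_point_cloud(sparse_cloud, params)
print(feature.shape)  # (64,)
```

Max pooling is what makes the output a fixed-length vector regardless of how many points are fed in, which is one plausible reading of "compact" in the abstract; the actual DP3 encoder may differ in depth, normalization, and sampling strategy.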
Authors: Yanjie Ze, Gu Zhang, Kangning Zhang, Chenyuan Hu, Muhan Wang, Huazhe Xu