Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation (2403.03890v1)
Abstract: This paper introduces Hierarchical Diffusion Policy (HDP), a hierarchical agent for multi-task robotic manipulation. HDP factorises a manipulation policy into a hierarchical structure: a high-level task-planning agent, which predicts a distant next-best end-effector pose (NBP), and a low-level goal-conditioned diffusion policy, which generates optimal motion trajectories. The factorised policy representation allows HDP to tackle long-horizon task planning while also generating fine-grained low-level actions. To generate context-aware motion trajectories that satisfy robot kinematics constraints, we present a novel kinematics-aware goal-conditioned control agent, Robot Kinematics Diffuser (RK-Diffuser). Specifically, RK-Diffuser learns to generate both end-effector pose and joint position trajectories, and distills the accurate but kinematics-unaware end-effector pose diffuser into the kinematics-aware but less accurate joint position diffuser via differentiable kinematics. Empirically, we show that HDP achieves a significantly higher success rate than state-of-the-art methods in both simulation and the real world.
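The distillation idea in the abstract, using differentiable kinematics to pull a joint position trajectory towards a more accurate end-effector trajectory, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: a hypothetical planar two-link arm, an analytic Jacobian, plain gradient descent, and the names `forward_kinematics`, `jacobian` and `refine_joint_trajectory`. It is not the RK-Diffuser training procedure, which the abstract describes only at a high level; the sketch only shows how an end-effector error can be propagated back to joint positions through the kinematics.

```python
import numpy as np

# Hypothetical planar 2-link arm used only for illustration (not the paper's robot).
LINK_LENGTHS = (1.0, 1.0)


def forward_kinematics(q):
    """Map joint angles of shape (T, 2) to end-effector positions of shape (T, 2)."""
    l1, l2 = LINK_LENGTHS
    x = l1 * np.cos(q[:, 0]) + l2 * np.cos(q[:, 0] + q[:, 1])
    y = l1 * np.sin(q[:, 0]) + l2 * np.sin(q[:, 0] + q[:, 1])
    return np.stack([x, y], axis=1)


def jacobian(q):
    """Analytic Jacobian d(position)/d(joints) per timestep, shape (T, 2, 2)."""
    l1, l2 = LINK_LENGTHS
    s1, c1 = np.sin(q[:, 0]), np.cos(q[:, 0])
    s12, c12 = np.sin(q[:, 0] + q[:, 1]), np.cos(q[:, 0] + q[:, 1])
    J = np.empty((q.shape[0], 2, 2))
    J[:, 0, 0] = -l1 * s1 - l2 * s12
    J[:, 0, 1] = -l2 * s12
    J[:, 1, 0] = l1 * c1 + l2 * c12
    J[:, 1, 1] = l2 * c12
    return J


def refine_joint_trajectory(q_init, ee_target, step_size=0.1, iterations=2000):
    """Gradient descent on 0.5 * ||FK(q) - ee_target||^2, independently per timestep."""
    q = q_init.copy()
    for _ in range(iterations):
        error = forward_kinematics(q) - ee_target            # (T, 2)
        grad = np.einsum("tij,ti->tj", jacobian(q), error)   # J^T error per timestep
        q -= step_size * grad
    return q


# Stand-ins for the two predicted trajectories: an accurate end-effector path
# and an inaccurate (here: uniformly offset) joint position path.
q_true = np.stack([np.linspace(0.2, 1.0, 8), np.linspace(0.8, 1.4, 8)], axis=1)
ee_target = forward_kinematics(q_true)
q_init = q_true + 0.2

q_refined = refine_joint_trajectory(q_init, ee_target)
max_error = np.max(np.linalg.norm(forward_kinematics(q_refined) - ee_target, axis=1))
print("max end-effector error after refinement:", max_error)
```

Because the refined output is still a joint position trajectory, it remains executable on the arm, while its end-effector path is matched to the target trajectory. This mirrors the trade-off named in the abstract between the accurate but kinematics-unaware pose prediction and the kinematics-aware but less accurate joint prediction.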