
Don't Start from Scratch: Behavioral Refinement via Interpolant-based Policy Diffusion (2402.16075v4)

Published 25 Feb 2024 in cs.LG, cs.AI, and cs.RO

Abstract: Imitation learning empowers artificial agents to mimic behavior by learning from demonstrations. Recently, diffusion models, which have the ability to model high-dimensional and multimodal distributions, have shown impressive performance on imitation learning tasks. These models learn to shape a policy by diffusing actions (or states) from standard Gaussian noise. However, the target policy to be learned is often significantly different from Gaussian, and this mismatch can result in poor performance when using a small number of diffusion steps (to improve inference speed) and under limited data. The key idea in this work is that initiating from a more informative source than a Gaussian enables diffusion methods to mitigate the above limitations. We contribute theoretical results, a new method, and empirical findings that show the benefits of using an informative source policy. Our method, which we call BRIDGER, leverages the stochastic interpolants framework to bridge arbitrary policies, thus enabling a flexible approach to imitation learning. It generalizes prior work in that standard Gaussians can still be applied, but other source policies can be used if available. In experiments on challenging simulation benchmarks and on real robots, BRIDGER outperforms state-of-the-art diffusion policies. We provide further analysis of design considerations when applying BRIDGER. Code for BRIDGER is available at https://github.com/clear-nus/bridger.
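The core idea the abstract describes can be illustrated with a minimal sketch of a stochastic interpolant that bridges a source action to a target action. This is an illustrative simplification, not BRIDGER's actual construction: the function names, the linear schedule, and the `gamma_scale` coefficient are assumptions chosen so that the noise term vanishes at both endpoints, recovering the source policy at t = 0 and the demonstrated action at t = 1.

```python
import numpy as np

def stochastic_interpolant(a0, a1, t, z, gamma_scale=0.5):
    """Interpolate between a source action a0 and a target action a1.

    Uses an illustrative schedule
        x_t = (1 - t) * a0 + t * a1 + gamma(t) * z,
    with gamma(t) = gamma_scale * sqrt(t * (1 - t)), so the latent noise z
    contributes nothing at the endpoints: x_0 = a0 and x_1 = a1.
    """
    gamma = gamma_scale * np.sqrt(t * (1.0 - t))
    return (1.0 - t) * a0 + t * a1 + gamma * z

rng = np.random.default_rng(0)
a0 = rng.normal(size=3)   # draw from an informative source policy (not N(0, I))
a1 = rng.normal(size=3)   # demonstrated (target) action
z = rng.normal(size=3)    # latent noise along the bridge

# The interpolant recovers the source exactly at t = 0 and the target at t = 1.
assert np.allclose(stochastic_interpolant(a0, a1, 0.0, z), a0)
assert np.allclose(stochastic_interpolant(a1, a1, 1.0, z), a1)
mid = stochastic_interpolant(a0, a1, 0.5, z)
```

In a full method, a network would be trained to follow such interpolant paths, so that at inference the policy can start from an informative source sample rather than pure Gaussian noise and reach the target distribution in fewer steps.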
