PlayFusion: Skill Acquisition via Diffusion from Language-Annotated Play (2312.04549v1)
Abstract: Learning from unstructured and uncurated data has become the dominant paradigm for generative approaches in language and vision. Such unstructured and unguided behavior data, commonly known as play, is also easier to collect in robotics but much more difficult to learn from due to its inherently multimodal, noisy, and suboptimal nature. In this paper, we study the problem of learning goal-directed skill policies from unstructured play data that is labeled with language in hindsight. Specifically, we leverage advances in diffusion models to learn a multi-task diffusion model to extract robotic skills from play data. Using a conditional denoising diffusion process in the space of states and actions, we can gracefully handle the complexity and multimodality of play data and generate diverse and interesting robot behaviors. To make diffusion models more useful for skill learning, we encourage robotic agents to acquire a vocabulary of skills by introducing discrete bottlenecks into the conditional behavior generation process. In our experiments, we demonstrate the effectiveness of our approach across a wide variety of environments in both simulation and the real world. Results, visualizations, and videos are available at https://play-fusion.github.io
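To make the recipe in the abstract concrete, below is a minimal PyTorch sketch of the two ingredients it names: a language- and state-conditioned denoising diffusion model over short action chunks (a standard DDPM noise-prediction objective, Ho et al., 2020) and a discrete bottleneck on the conditioning implemented as a VQ-VAE-style straight-through codebook (van den Oord et al., 2017). All module names, dimensions, the noise schedule, and the conditioning encoder are illustrative assumptions for exposition, not the authors' released architecture or training configuration.

```python
# Minimal sketch (assumed, not the authors' implementation): language-conditioned
# diffusion over flattened action chunks with a vector-quantized conditioning bottleneck.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VQBottleneck(nn.Module):
    """Straight-through vector quantization of a continuous conditioning embedding."""
    def __init__(self, num_codes=64, dim=128, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.beta = beta

    def forward(self, z):
        # Snap each embedding to its nearest codebook entry.
        dists = torch.cdist(z, self.codebook.weight)            # (B, num_codes)
        idx = dists.argmin(dim=-1)                               # (B,)
        z_q = self.codebook(idx)                                 # (B, dim)
        # Codebook loss + commitment loss (VQ-VAE style).
        vq_loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())
        # Straight-through estimator so gradients reach the encoder.
        z_q = z + (z_q - z).detach()
        return z_q, vq_loss


class NoisePredictor(nn.Module):
    """Predicts the noise added to a flattened action chunk, given conditioning and timestep."""
    def __init__(self, action_dim=7, horizon=8, cond_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(action_dim * horizon + cond_dim + 1, hidden), nn.Mish(),
            nn.Linear(hidden, hidden), nn.Mish(),
            nn.Linear(hidden, action_dim * horizon),
        )

    def forward(self, noisy_actions, cond, t):
        t = t.float().unsqueeze(-1) / 100.0                      # crude timestep encoding
        return self.net(torch.cat([noisy_actions, cond, t], dim=-1))


def ddpm_training_step(model, vq, cond_encoder, states, lang_emb, actions, T=100):
    """One denoising-diffusion training step on a batch of language-labeled play segments.

    `actions` is a (B, action_dim * horizon) flattened action chunk; `states` and
    `lang_emb` are per-segment state features and frozen language embeddings.
    """
    B = actions.shape[0]
    # Linear beta schedule, recomputed here for brevity.
    betas = torch.linspace(1e-4, 2e-2, T, device=actions.device)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)
    t = torch.randint(0, T, (B,), device=actions.device)
    noise = torch.randn_like(actions)
    # Forward process: corrupt the ground-truth action chunk at timestep t.
    a_bar = alpha_bar[t].unsqueeze(-1)
    noisy_actions = a_bar.sqrt() * actions + (1 - a_bar).sqrt() * noise
    # Discretized conditioning: encode (state, language goal), then quantize.
    cond, vq_loss = vq(cond_encoder(torch.cat([states, lang_emb], dim=-1)))
    pred_noise = model(noisy_actions, cond, t)
    return F.mse_loss(pred_noise, noise) + vq_loss
```

In this sketch, `cond_encoder` would be, for example, an `nn.Linear` mapping concatenated state features and a frozen sentence embedding of the hindsight language annotation to `cond_dim`; at inference time one would run the usual reverse diffusion loop, denoising a random action chunk while conditioning on the quantized (state, language-goal) code.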
Authors: Lili Chen, Shikhar Bahl, Deepak Pathak