
Language-guided Skill Learning with Temporal Variational Inference (2402.16354v2)

Published 26 Feb 2024 in cs.LG, cs.AI, and cs.CL

Abstract: We present an algorithm for skill discovery from expert demonstrations. The algorithm first utilizes LLMs to propose an initial segmentation of the trajectories. Following that, a hierarchical variational inference framework incorporates the LLM-generated segmentation information to discover reusable skills by merging trajectory segments. To further control the trade-off between compression and reusability, we introduce a novel auxiliary objective based on the Minimum Description Length principle that helps guide this skill discovery process. Our results demonstrate that agents equipped with our method are able to discover skills that help accelerate learning and outperform baseline skill learning approaches on new long-horizon tasks in BabyAI, a grid world navigation environment, as well as ALFRED, a household simulation environment.


Summary

  • The paper introduces a framework that uses LLMs to generate fine-grained trajectory segments and hierarchical temporal variational inference to convert them into reusable skills.
  • It demonstrates enhanced learning efficiency and superior task performance compared to baseline methods in environments like BabyAI and ALFRED.
  • By incorporating the MDL principle, the approach balances data compression and skill adaptability, offering scalable insights for AI-driven tasks.

An Expert Analysis of "Language-guided Skill Learning with Temporal Variational Inference"

The paper "Language-guided Skill Learning with Temporal Variational Inference" presents a novel algorithm for skill discovery from expert demonstrations. Its primary contribution is to leverage LLMs for an initial trajectory segmentation and then apply a hierarchical variational inference framework to refine and merge these segments into reusable skills. Below, I provide an in-depth analysis of the methodology, results, and potential implications of this research for AI and skill learning.

Overview of the Methodology

The authors introduce an approach that first employs LLMs to propose a preliminary segmentation of trajectories from expert demonstrations. This initial step is crucial because it tames the complexity inherent in trajectory segmentation, whose search space grows exponentially with the horizon length. The value of using LLMs lies in their ability to generate fine-grained, semantically meaningful segments, which are subsequently refined through a temporal variational inference framework.
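To make the first stage concrete, here is a minimal sketch of how LLM-proposed annotations could induce an initial segmentation: consecutive steps sharing the same sub-task label are grouped into one segment. The labels, action names, and data layout below are hypothetical illustrations, not the paper's actual interface.

```python
from itertools import groupby

def segment_by_annotation(steps):
    """Group consecutive (action, label) steps that share the same
    (hypothetical) LLM-proposed sub-task label into segments."""
    segments = []
    for label, grp in groupby(steps, key=lambda s: s[1]):
        actions = [action for action, _ in grp]
        segments.append((label, actions))
    return segments

# Toy BabyAI-style trajectory: (action, llm_label) pairs
traj = [("forward", "go to door"), ("forward", "go to door"),
        ("toggle", "open door"),
        ("left", "go to key"), ("forward", "go to key")]

print(segment_by_annotation(traj))
# → [('go to door', ['forward', 'forward']),
#    ('open door', ['toggle']),
#    ('go to key', ['left', 'forward'])]
```

These fine-grained segments are only a starting point; the variational inference stage then decides which of them to merge into longer reusable skills.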

The core of the proposed framework is its ability to merge these initial granular segments into coherent skills via a hierarchical variational inference strategy. It uses the Minimum Description Length (MDL) principle as an auxiliary objective to guide the balance between compression and skill reusability. The framework is thus designed to discover semantically meaningful skills from given trajectories while managing the inherent trade-off between describing trajectories concisely and maintaining skill adaptability.
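The MDL intuition behind the auxiliary objective can be illustrated with a toy two-part code: the cost of encoding the skill library (model) plus the cost of encoding the trajectory as a skill sequence (data). This is a simplified sketch of the MDL principle, not the paper's actual objective; the flat per-skill library charge is an assumption for illustration.

```python
import math
from collections import Counter

def description_length(skill_sequence):
    """Two-part MDL code: bits to encode the skill library (model)
    plus bits to encode the trajectory as a skill sequence (data),
    using empirical skill frequencies as the code distribution."""
    counts = Counter(skill_sequence)
    n = len(skill_sequence)
    # Data cost: -log2 p(skill) summed over the sequence
    data_bits = -sum(math.log2(counts[s] / n) for s in skill_sequence)
    # Model cost: a crude constant per-skill library charge (assumed)
    model_bits = 8.0 * len(counts)
    return model_bits + data_bits

fine   = ["a", "b", "c", "d", "e", "f"]        # six one-off segments
merged = ["ab", "ab", "cd", "cd", "ef", "ef"]  # three reusable skills
print(description_length(fine) > description_length(merged))  # → True
```

Merging segments into skills that recur shortens the total code, which is exactly the pressure toward compression; an over-merged library of single-use mega-skills would inflate the model cost again, which is the pressure toward reusability.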

Key Findings and Results

The paper presents empirical results demonstrating that the proposed method outperforms existing baseline approaches across diverse domains, notably in BabyAI—a grid world navigation environment—and ALFRED—a challenging household simulation environment. These results are backed by extensive simulations showing improved learning efficiency and superior task performance in long-horizon tasks compared to baseline methods.

In comparison, methods like LOVE and LISA, which either lack language guidance or a comparably strong variational inference framework, were less effective at discovering and leveraging reusable skills for complex task execution. Moreover, the reported numerical results indicate that the MDL-based auxiliary objective achieves a better balance between skill generalization and data compression, further improving task performance and learning efficiency.

Implications and Speculative Future Directions

The theoretical and practical contributions of this work have several implications for reinforcement learning and artificial intelligence at large. By using LLMs to generate the initial segments, the proposed method capitalizes on the rich semantic knowledge of these models, pointing toward a more informative and robust approach to skill learning. The incorporation of the MDL principle could also change how researchers approach the balance between expressiveness and succinctness in skill definitions.

Future directions may explore how LLMs can be used not just for segmentation but as integral components of more complex decision-making workflows where semantics and contextual understanding are pivotal. Further studies could scale this approach to more dynamic environments and integrate it with other forms of learning, such as unsupervised or self-supervised paradigms.

In conclusion, the paper's methodological novelty and empirical validation advance skill learning techniques. By enabling more efficient learning and stronger task performance, it opens new avenues for both academic research and practical AI applications, and it demonstrates the potential of combining language-based insights with principled variational approaches to skill learning.
