LgTS: Dynamic Task Sampling using LLM-generated sub-goals for Reinforcement Learning Agents (2310.09454v1)
Abstract: Recent advancements in the reasoning abilities of Large Language Models (LLMs) have promoted their use in problems that require high-level planning for robots and artificial agents. However, current techniques that utilize LLMs for such planning tasks make key assumptions: access to datasets that permit fine-tuning, meticulously engineered prompts that provide only relevant and essential information to the LLM, and, most importantly, a deterministic means of executing the LLM's responses, either through existing policies or plan operators. In this work, we propose LgTS (LLM-guided Teacher-Student learning), a novel approach that leverages the planning abilities of LLMs to provide a graphical representation of sub-goals to a reinforcement learning (RL) agent that does not have access to the transition dynamics of the environment. The RL agent uses a Teacher-Student learning algorithm to learn a set of successful policies for reaching the goal state from the start state while simultaneously minimizing the number of environmental interactions. Unlike previous methods that utilize LLMs, our approach does not assume access to a proprietary or fine-tuned LLM, nor does it require pre-trained policies that achieve the sub-goals proposed by the LLM. Through experiments on a gridworld-based DoorKey domain and a search-and-rescue inspired domain, we show that generating a graphical structure of sub-goals helps in learning policies for the LLM-proposed sub-goals, and that the Teacher-Student learning algorithm minimizes the number of environment interactions when the transition dynamics are unknown.
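The core loop described in the abstract, where an LLM proposes a directed acyclic graph (DAG) of sub-goals and a Teacher samples which sub-goal task the Student RL agent should practice next, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the DAG contents, task names, and the epsilon-greedy learning-progress heuristic (a simple stand-in for the Teacher's sampling objective, in the spirit of Teacher-Student Curriculum Learning) are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical LLM-proposed sub-goal DAG for a DoorKey-style task; each edge
# (u, v) corresponds to a sub-task: "reach sub-goal v starting from u".
SUBGOAL_DAG = {
    "start": ["get_key"],
    "get_key": ["open_door"],
    "open_door": ["reach_goal"],
}
TASKS = [(u, v) for u, succs in SUBGOAL_DAG.items() for v in succs]


class Teacher:
    """Samples the sub-task with the highest recent learning progress
    (absolute change in success rate), with epsilon-greedy exploration.
    An illustrative stand-in for the paper's Teacher objective."""

    def __init__(self, tasks, epsilon=0.2):
        self.tasks = list(tasks)
        self.epsilon = epsilon
        self.progress = defaultdict(float)   # |delta success rate| per task
        self.last_rate = defaultdict(float)  # last observed success rate

    def sample(self):
        if random.random() < self.epsilon:
            return random.choice(self.tasks)
        return max(self.tasks, key=lambda t: self.progress[t])

    def update(self, task, success_rate):
        self.progress[task] = abs(success_rate - self.last_rate[task])
        self.last_rate[task] = success_rate


def run_episode(task):
    """Placeholder for one RL training episode on `task` (e.g., a PPO update)
    that returns the rolling success rate for that sub-task."""
    return random.random()  # stub: replace with real training + evaluation


teacher = Teacher(TASKS)
for _ in range(100):
    task = teacher.sample()
    teacher.update(task, run_episode(task))
```

One plausible reason the graphical structure matters, consistent with the abstract: the DAG admits multiple candidate paths from start to goal, so the Teacher can keep sampling sub-tasks that are making progress and deprioritize those that are not, which is what keeps the number of environment interactions small when the transition dynamics are unknown.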
- Yash Shukla
- Wenchang Gao
- Vasanth Sarathy
- Alvaro Velasquez
- Robert Wright
- Jivko Sinapov