Hierarchical Continual Reinforcement Learning via Large Language Model (2401.15098v2)
Abstract: The ability to learn continuously in dynamic environments is a crucial requirement for reinforcement learning (RL) agents deployed in the real world. Despite progress in continual reinforcement learning (CRL), existing methods often suffer from insufficient knowledge transfer, particularly when the tasks are diverse. To address this challenge, we propose a new framework, Hierarchical Continual reinforcement learning via LLM (Hi-Core), designed to facilitate the transfer of high-level knowledge. Hi-Core orchestrates a two-layer structure: high-level policy formulation by an LLM, which generates a sequence of goals, and low-level policy learning that closely aligns with goal-oriented RL practices, producing the agent's actions in response to those goals. The framework employs feedback to iteratively adjust and verify high-level policies, storing them along with the corresponding low-level policies in a skill library. When encountering a new task, Hi-Core retrieves relevant experience from this library to aid learning. In experiments on Minigrid, Hi-Core handles diverse CRL tasks effectively and outperforms popular baselines.
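The abstract does not include an implementation, but the control loop it describes can be sketched. The following is a minimal illustration under stated assumptions, not the paper's code: the names `Skill`, `SkillLibrary`, `formulate_goals`, and `solve_task`, and the injected `llm`, `env`, `train`, and `evaluate` callables are all placeholders introduced here, and the word-overlap retrieval is a toy stand-in for whatever similarity search the authors actually use.

```python
# Minimal sketch of the Hi-Core loop described in the abstract.
# All names and the retrieval heuristic are illustrative assumptions,
# not the paper's actual API.

from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Skill:
    """A stored (high-level policy, low-level policy) pair for one task."""
    task: str
    goals: List[str]   # high-level policy: an LLM-generated goal sequence
    policy: Callable   # goal-conditioned low-level policy


@dataclass
class SkillLibrary:
    skills: List[Skill] = field(default_factory=list)

    def retrieve(self, task: str, k: int = 3) -> List[Skill]:
        # Toy relevance score via word overlap; a real system would embed
        # task descriptions and use vector similarity search instead.
        def overlap(s: Skill) -> int:
            return len(set(task.split()) & set(s.task.split()))
        return sorted(self.skills, key=overlap, reverse=True)[:k]

    def store(self, skill: Skill) -> None:
        self.skills.append(skill)


def formulate_goals(llm: Callable[[str], str], task: str,
                    related: List[Skill]) -> List[str]:
    """High level: prompt the LLM for a goal sequence, conditioned on
    retrieved past experience."""
    prompt = f"Task: {task}\n"
    for s in related:
        prompt += f"Past task '{s.task}' used goals: {s.goals}\n"
    prompt += "List the subgoals for this task, one per line."
    return [g.strip() for g in llm(prompt).splitlines() if g.strip()]


def solve_task(llm: Callable[[str], str], env, library: SkillLibrary,
               task: str, train: Callable, evaluate: Callable,
               max_rounds: int = 3) -> Skill:
    """One Hi-Core task: formulate goals, learn a low-level policy,
    refine via feedback, then store the result in the skill library."""
    related = library.retrieve(task)
    feedback, goals, policy = "", [], None
    for _ in range(max_rounds):
        goals = formulate_goals(llm, task + feedback, related)
        policy = train(env, goals)               # low-level goal-oriented RL
        ok, summary = evaluate(env, policy, goals)
        if ok:                                   # high-level policy verified
            break
        feedback = f"\nPrevious attempt failed: {summary}"
    skill = Skill(task=task, goals=goals, policy=policy)
    library.store(skill)
    return skill
```

Injecting `train` and `evaluate` as callables keeps the sketch agnostic to the underlying RL algorithm; any goal-conditioned policy learner could be plugged in for the low-level layer the abstract describes.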