Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks (2303.16563v2)

Published 29 Mar 2023 in cs.LG and cs.AI

Abstract: We study building multi-task agents in open-world environments. Without human demonstrations, learning to accomplish long-horizon tasks in a large open-world environment with reinforcement learning (RL) is extremely inefficient. To tackle this challenge, we convert the multi-task learning problem into learning basic skills and planning over those skills. Using the popular open-world game Minecraft as the testbed, we propose three types of fine-grained basic skills and use RL with intrinsic rewards to acquire them. A novel Finding-skill that explores to locate diverse items provides better initialization for the other skills, improving the sample efficiency of skill learning. For skill planning, we leverage the prior knowledge in LLMs to find the relationships between skills and build a skill graph. When the agent is solving a task, our skill search algorithm walks the skill graph and generates the proper skill plan for the agent. In experiments, our method accomplishes 40 diverse Minecraft tasks, many of which require sequentially executing more than 10 skills. Our method outperforms baselines by a large margin and is the most sample-efficient demonstration-free RL method for solving Minecraft Tech Tree tasks. The project's website and code can be found at https://sites.google.com/view/plan4mc.
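The skill-graph search described above can be illustrated with a minimal sketch: a depth-first walk over a dependency graph that emits prerequisite skills before the skills that depend on them. The graph contents and skill names here are illustrative assumptions for a wood-gathering task, not the authors' actual data or interface.

```python
# Hypothetical skill graph: each skill maps to the prerequisite skills
# that must be executed before it. Names are illustrative only.
SKILL_GRAPH = {
    "find_tree": [],
    "harvest_log": ["find_tree"],
    "craft_planks": ["harvest_log"],
    "craft_crafting_table": ["craft_planks"],
    "craft_wooden_pickaxe": ["craft_planks", "craft_crafting_table"],
}


def plan_skills(target, graph):
    """Depth-first walk over the skill graph; returns an ordered skill plan
    in which every skill appears after all of its prerequisites."""
    plan, visited = [], set()

    def visit(skill):
        if skill in visited:
            return
        visited.add(skill)
        for prereq in graph[skill]:
            visit(prereq)
        plan.append(skill)  # emit only after all prerequisites are planned

    visit(target)
    return plan


print(plan_skills("craft_wooden_pickaxe", SKILL_GRAPH))
# → ['find_tree', 'harvest_log', 'craft_planks', 'craft_crafting_table', 'craft_wooden_pickaxe']
```

The paper's actual planner additionally handles item quantities and consumption when walking the graph; this sketch shows only the dependency-ordering core.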

Authors (7)
  1. Haoqi Yuan
  2. Chi Zhang
  3. Hongcheng Wang
  4. Feiyang Xie
  5. Penglin Cai
  6. Hao Dong
  7. Zongqing Lu