Voyager: An Open-Ended Embodied Agent with Large Language Models (2305.16291v2)

Published 25 May 2023 in cs.AI and cs.LG

Abstract: We introduce Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. Voyager consists of three key components: 1) an automatic curriculum that maximizes exploration, 2) an ever-growing skill library of executable code for storing and retrieving complex behaviors, and 3) a new iterative prompting mechanism that incorporates environment feedback, execution errors, and self-verification for program improvement. Voyager interacts with GPT-4 via blackbox queries, which bypasses the need for model parameter fine-tuning. The skills developed by Voyager are temporally extended, interpretable, and compositional, which compounds the agent's abilities rapidly and alleviates catastrophic forgetting. Empirically, Voyager shows strong in-context lifelong learning capability and exhibits exceptional proficiency in playing Minecraft. It obtains 3.3x more unique items, travels 2.3x longer distances, and unlocks key tech tree milestones up to 15.3x faster than prior SOTA. Voyager is able to utilize the learned skill library in a new Minecraft world to solve novel tasks from scratch, while other techniques struggle to generalize. We open-source our full codebase and prompts at https://voyager.minedojo.org/.

Authors (8)
  1. Guanzhi Wang (14 papers)
  2. Yuqi Xie (9 papers)
  3. Yunfan Jiang (11 papers)
  4. Ajay Mandlekar (41 papers)
  5. Chaowei Xiao (110 papers)
  6. Yuke Zhu (134 papers)
  7. Linxi Fan (33 papers)
  8. Anima Anandkumar (236 papers)
Citations (613)

Summary

This paper introduces Voyager, an embodied agent powered by LLMs, specifically GPT-4, designed for open-ended exploration and lifelong learning within the Minecraft environment. The goal is to create an agent that can continuously explore, acquire diverse skills, and make novel discoveries without requiring human intervention or predefined goals, mimicking how humans learn and adapt in complex environments.

Voyager consists of three key components:

  1. Automatic Curriculum: This module dynamically proposes suitable tasks based on the agent's current state (inventory, location, biome, etc.), exploration progress, and skill level. It prompts GPT-4 with the overarching goal of maximizing exploration and discovering as many diverse things as possible, yielding tasks that are challenging but achievable. The curriculum acts as a form of in-context novelty search, steering the agent toward progressively more complex goals (it appears as step 1 in the loop sketch after this list).
  2. Skill Library: Voyager stores successful behaviors as executable code (JavaScript programs built on the Mineflayer APIs). When a task proposed by the curriculum is completed, the corresponding program is added to the skill library. Each skill is indexed by an embedding of its description, with the description itself written by GPT-3.5. When facing a new task, Voyager retrieves the most semantically similar skills from the library to condition new code generation (a minimal retrieval sketch follows this list). This lets skills be reused and composed into more complex behaviors, and mitigates catastrophic forgetting.
  3. Iterative Prompting Mechanism: Since LLMs rarely produce perfect code in one shot, Voyager refines its programs iteratively. It prompts GPT-4 to generate code for the current task, executes that code in Minecraft, and then gathers three kinds of feedback:
    • Environment Feedback: Observations about the execution's outcome (e.g., "Cannot craft X, need Y more Z").
    • Execution Errors: Errors raised by the JavaScript interpreter when the code is invalid.
    • Self-Verification: A separate GPT-4 instance acts as a critic, assessing from the agent's state and the task description whether the task succeeded; if not, it provides a critique suggesting improvements.
    This feedback is folded into the prompt for the next round of code generation, so GPT-4 refines the program until self-verification confirms success or a maximum number of attempts is reached (see the loop sketch below).
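
Retrieval over the skill library amounts to nearest-neighbor search on description embeddings. The Python sketch below illustrates only the idea; the class, the `embed` callback, and the method signatures are assumptions made for exposition, not Voyager's actual API (per the paper, GPT-3.5 writes the skill descriptions and an off-the-shelf embedding model indexes them).

```python
import numpy as np

class SkillLibrary:
    """Hypothetical embedding-indexed store of verified skills."""

    def __init__(self, embed):
        self.embed = embed   # assumed: text -> 1-D np.ndarray embedding
        self.skills = []     # (description, javascript_source) pairs
        self.keys = []       # cached embeddings of the descriptions

    def add(self, description, code):
        # Index each verified skill by the embedding of its description.
        self.skills.append((description, code))
        self.keys.append(self.embed(description))

    def retrieve(self, task, k=5):
        # Return the k skills whose descriptions best match the task.
        if not self.skills:
            return []
        q = self.embed(task)
        keys = np.stack(self.keys)
        # Cosine similarity between the task query and every skill key.
        sims = keys @ q / (np.linalg.norm(keys, axis=1) * np.linalg.norm(q) + 1e-8)
        return [self.skills[i] for i in np.argsort(-sims)[:k]]
```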

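Putting the three components together, the overall control flow can be caricatured as the loop below. This is a hedged sketch under stated assumptions: `llm` and `env` stand in for GPT-4 black-box queries and the Mineflayer/Minecraft bridge, and every method name on them is a hypothetical placeholder rather than the paper's exact interface.

```python
MAX_ROUNDS = 4  # illustrative cap on refinement attempts per task

def lifelong_learning_loop(agent_state, library, llm, env):
    while True:
        # 1) Automatic curriculum: propose the next task from the agent's
        #    state (inventory, biome, position) and its exploration progress.
        task = llm.propose_task(agent_state)

        context = library.retrieve(task)  # relevant previously learned skills
        feedback = errors = critique = ""
        for _ in range(MAX_ROUNDS):
            # 2) Generate a JavaScript program for the task, conditioned on
            #    retrieved skills and feedback from the previous attempt.
            code = llm.generate_code(task, context, feedback, errors, critique)

            # 3) Execute in Minecraft; collect environment feedback (e.g.
            #    "Cannot craft X, need Y more Z") and interpreter errors.
            agent_state, feedback, errors = env.run(code, agent_state)

            # 4) Self-verification: a second GPT-4 instance judges success
            #    and, on failure, returns a critique for the next round.
            success, critique = llm.verify(task, agent_state)
            if success:
                library.add(llm.describe(code), code)  # grow the skill library
                break
```
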
Implementation and Evaluation:

  • Voyager interacts with GPT-4 via blackbox API queries, requiring no model fine-tuning.
  • It operates within the MineDojo framework, using Mineflayer JavaScript APIs as its low-level controller.
  • Experiments compared Voyager against adapted versions of LLM agent techniques like ReAct, Reflexion, and AutoGPT in Minecraft.
  • Results: Voyager significantly outperformed baselines:
    • Discovered 3.3x more unique items.
    • Traversed 2.3x longer distances across diverse terrains.
    • Unlocked Minecraft tech tree milestones (wood, stone, iron, diamond) significantly faster (up to 15.3x faster for wood). Voyager was the only agent to reach the diamond level.
    • Demonstrated strong zero-shot generalization: the learned skill library let Voyager solve novel, unseen tasks from scratch in a new Minecraft world, whereas baselines failed to generalize. Providing the skill library to AutoGPT also improved its performance.
  • Ablation Studies: Confirmed the critical importance of each component. Removing the automatic curriculum, skill library, or self-verification significantly degraded performance. Using GPT-4 for code generation was substantially better than GPT-3.5.
  • Human Feedback: The paper also showed Voyager can build complex 3D structures (like a Nether Portal or a house) when augmented with human feedback, where humans act either as the critic or the curriculum provider.

Limitations:

  • Cost: The GPT-4 API queries Voyager depends on are expensive.
  • Inaccuracies: Occasional failures in code generation or self-verification.
  • Hallucinations: GPT-4 sometimes proposes impossible tasks (e.g., crafting non-existent items) or generates code that makes invalid assumptions (e.g., using the wrong fuel) or calls non-existent APIs.

Conclusion:

Voyager represents a significant step towards creating generalist, embodied agents capable of lifelong learning in open-ended environments. It effectively leverages the capabilities of LLMs for curriculum generation, skill acquisition via code, and iterative self-improvement through environmental and self-generated feedback, all without requiring gradient-based training.
