Voyager: An Open-Ended Embodied Agent with Large Language Models (2305.16291v2)

Published 25 May 2023 in cs.AI and cs.LG

Abstract: We introduce Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. Voyager consists of three key components: 1) an automatic curriculum that maximizes exploration, 2) an ever-growing skill library of executable code for storing and retrieving complex behaviors, and 3) a new iterative prompting mechanism that incorporates environment feedback, execution errors, and self-verification for program improvement. Voyager interacts with GPT-4 via blackbox queries, which bypasses the need for model parameter fine-tuning. The skills developed by Voyager are temporally extended, interpretable, and compositional, which compounds the agent's abilities rapidly and alleviates catastrophic forgetting. Empirically, Voyager shows strong in-context lifelong learning capability and exhibits exceptional proficiency in playing Minecraft. It obtains 3.3x more unique items, travels 2.3x longer distances, and unlocks key tech tree milestones up to 15.3x faster than prior SOTA. Voyager is able to utilize the learned skill library in a new Minecraft world to solve novel tasks from scratch, while other techniques struggle to generalize. We open-source our full codebase and prompts at https://voyager.minedojo.org/.

Authors (8)
  1. Guanzhi Wang (14 papers)
  2. Yuqi Xie (9 papers)
  3. Yunfan Jiang (11 papers)
  4. Ajay Mandlekar (41 papers)
  5. Chaowei Xiao (110 papers)
  6. Yuke Zhu (134 papers)
  7. Linxi Fan (33 papers)
  8. Anima Anandkumar (236 papers)
Citations (613)

Summary

This paper introduces Voyager, an embodied agent powered by LLMs, specifically GPT-4, designed for open-ended exploration and lifelong learning within the Minecraft environment. The goal is to create an agent that can continuously explore, acquire diverse skills, and make novel discoveries without requiring human intervention or predefined goals, mimicking how humans learn and adapt in complex environments.

Voyager consists of three key components:

  1. Automatic Curriculum: This module dynamically proposes suitable tasks based on the agent's current state (inventory, location, biome, etc.), exploration progress, and skill level. It prompts GPT-4 with the overarching goal of maximizing exploration and discovering as many diverse things as possible, yielding tasks that are challenging but achievable. The curriculum acts as a form of in-context novelty search, steering the agent toward progressively more complex goals (it appears as step 1 in the loop sketch after this list).
  2. Skill Library: Voyager stores successful behaviors as executable code (JavaScript programs built on the Mineflayer APIs). When a task proposed by the curriculum is completed, the corresponding program is added to the skill library. Each skill is indexed by an embedding of its description, with the description itself written by GPT-3.5. When facing a new task, Voyager retrieves the most semantically similar skills from the library to condition new code generation (a minimal retrieval sketch follows this list). This lets skills be reused and composed into more complex behaviors, and mitigates catastrophic forgetting.
  3. Iterative Prompting Mechanism: Since LLMs rarely produce perfect code in one shot, Voyager refines its programs iteratively. It prompts GPT-4 to generate code for the current task, executes that code in Minecraft, and then gathers three kinds of feedback:
    • Environment Feedback: Observations about the execution's outcome (e.g., "Cannot craft X, need Y more Z").
    • Execution Errors: Errors raised by the JavaScript interpreter when the code is invalid.
    • Self-Verification: A separate GPT-4 instance acts as a critic, assessing from the agent's state and the task description whether the task succeeded; if not, it provides a critique suggesting improvements.
    This feedback is folded into the prompt for the next round of code generation, so GPT-4 refines the program until self-verification confirms success or a maximum number of attempts is reached (see the loop sketch below).
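
Retrieval over the skill library amounts to nearest-neighbor search on description embeddings. The Python sketch below illustrates only the idea; the class, the `embed` callback, and the method signatures are assumptions made for exposition, not Voyager's actual API (per the paper, GPT-3.5 writes the skill descriptions and an off-the-shelf embedding model indexes them).

```python
import numpy as np

class SkillLibrary:
    """Hypothetical embedding-indexed store of verified skills."""

    def __init__(self, embed):
        self.embed = embed   # assumed: text -> 1-D np.ndarray embedding
        self.skills = []     # (description, javascript_source) pairs
        self.keys = []       # cached embeddings of the descriptions

    def add(self, description, code):
        # Index each verified skill by the embedding of its description.
        self.skills.append((description, code))
        self.keys.append(self.embed(description))

    def retrieve(self, task, k=5):
        # Return the k skills whose descriptions best match the task.
        if not self.skills:
            return []
        q = self.embed(task)
        keys = np.stack(self.keys)
        # Cosine similarity between the task query and every skill key.
        sims = keys @ q / (np.linalg.norm(keys, axis=1) * np.linalg.norm(q) + 1e-8)
        return [self.skills[i] for i in np.argsort(-sims)[:k]]
```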

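Putting the three components together, the overall control flow can be caricatured as the loop below. This is a hedged sketch under stated assumptions: `llm` and `env` stand in for GPT-4 black-box queries and the Mineflayer/Minecraft bridge, and every method name on them is a hypothetical placeholder rather than the paper's exact interface.

```python
MAX_ROUNDS = 4  # illustrative cap on refinement attempts per task

def lifelong_learning_loop(agent_state, library, llm, env):
    while True:
        # 1) Automatic curriculum: propose the next task from the agent's
        #    state (inventory, biome, position) and its exploration progress.
        task = llm.propose_task(agent_state)

        context = library.retrieve(task)  # relevant previously learned skills
        feedback = errors = critique = ""
        for _ in range(MAX_ROUNDS):
            # 2) Generate a JavaScript program for the task, conditioned on
            #    retrieved skills and feedback from the previous attempt.
            code = llm.generate_code(task, context, feedback, errors, critique)

            # 3) Execute in Minecraft; collect environment feedback (e.g.
            #    "Cannot craft X, need Y more Z") and interpreter errors.
            agent_state, feedback, errors = env.run(code, agent_state)

            # 4) Self-verification: a second GPT-4 instance judges success
            #    and, on failure, returns a critique for the next round.
            success, critique = llm.verify(task, agent_state)
            if success:
                library.add(llm.describe(code), code)  # grow the skill library
                break
```
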
Implementation and Evaluation:

  • Voyager interacts with GPT-4 via blackbox API queries, requiring no model fine-tuning.
  • It operates within the MineDojo framework, using Mineflayer JavaScript APIs as its low-level controller.
  • Experiments compared Voyager against adapted versions of LLM agent techniques like ReAct, Reflexion, and AutoGPT in Minecraft.
  • Results: Voyager significantly outperformed baselines:
    • Discovered 3.3x more unique items.
    • Traversed 2.3x longer distances across diverse terrains.
    • Unlocked Minecraft tech tree milestones (wood, stone, iron, diamond) significantly faster (up to 15.3x faster for wood). Voyager was the only agent to reach the diamond level.
    • Demonstrated strong zero-shot generalization: the learned skill library let Voyager solve novel, unseen tasks from scratch in a new Minecraft world, whereas baselines failed to generalize. Providing the skill library to AutoGPT also improved its performance.
  • Ablation Studies: Confirmed the critical importance of each component. Removing the automatic curriculum, skill library, or self-verification significantly degraded performance. Using GPT-4 for code generation was substantially better than GPT-3.5.
  • Human Feedback: The paper also showed Voyager can build complex 3D structures (like a Nether Portal or a house) when augmented with human feedback, where humans act either as the critic or the curriculum provider.

Limitations:

  • Cost: The GPT-4 API queries Voyager depends on are expensive.
  • Inaccuracies: Occasional failures in code generation or self-verification.
  • Hallucinations: GPT-4 sometimes proposes impossible tasks (e.g., crafting non-existent items) or generates code that makes invalid assumptions (e.g., using the wrong fuel) or calls non-existent APIs.

Conclusion:

Voyager represents a significant step towards creating generalist, embodied agents capable of lifelong learning in open-ended environments. It effectively leverages the capabilities of LLMs for curriculum generation, skill acquisition via code, and iterative self-improvement through environmental and self-generated feedback, all without requiring gradient-based training.
