This paper introduces Voyager, an embodied agent powered by LLMs, specifically GPT-4, designed for open-ended exploration and lifelong learning within the Minecraft environment. The goal is to create an agent that can continuously explore, acquire diverse skills, and make novel discoveries without requiring human intervention or predefined goals, mimicking how humans learn and adapt in complex environments.
Voyager consists of three key components:
- Automatic Curriculum: This module dynamically proposes suitable tasks for the agent based on its current state (inventory, location, biome, etc.), exploration progress, and skill level. It prompts GPT-4 with the goal of maximizing exploration and discovering diverse things, generating tasks that are challenging but achievable. This acts as a form of in-context novelty search, guiding the agent towards progressively more complex goals.
- Skill Library: Voyager learns and stores successful behaviors as executable code (JavaScript programs using Mineflayer APIs). When a task proposed by the curriculum is successfully completed, the corresponding program is added to the skill library. Each skill is indexed by an embedding of its description (generated by GPT-3.5). When facing a new task, Voyager retrieves relevant skills from the library based on semantic similarity to aid in generating new code. This allows skills to be reused, composed into more complex behaviors, and mitigates catastrophic forgetting.
- Iterative Prompting Mechanism: Since LLMs often fail to generate perfect code in one shot, Voyager uses an iterative refinement process. It prompts GPT-4 to generate code for the current task. This code is executed in Minecraft. Voyager then gathers feedback:
- Environment Feedback: Observations about the execution's outcome (e.g., "Cannot craft X, need Y more Z").
- Execution Errors: Errors from the JavaScript interpreter if the code is invalid.
- Self-Verification: Another instance of GPT-4 acts as a critic, assessing whether the task was successfully completed based on the agent's state and the task description; if not, it provides a critique suggesting improvements.

All of this feedback is incorporated into the prompt for the next round of code generation, so GPT-4 refines the program iteratively until the self-verification module confirms success or a maximum number of attempts is reached.
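To make the curriculum concrete, here is a minimal sketch of how a task-proposal prompt might be assembled from the agent's state. The field names and prompt wording are illustrative assumptions; the paper's actual prompts are considerably richer.

```python
# Illustrative sketch of automatic-curriculum prompt assembly.
# Field names and wording are assumptions, not the paper's actual prompts.

def curriculum_prompt(state, completed_tasks, failed_tasks):
    """Build a prompt asking GPT-4 for the next task that maximizes
    exploration while staying achievable at the current skill level."""
    return "\n".join([
        "You are a Minecraft curriculum. Propose the next task.",
        f"Biome: {state['biome']}",
        f"Position: {state['position']}",
        f"Inventory: {', '.join(state['inventory']) or 'empty'}",
        f"Completed tasks: {', '.join(completed_tasks) or 'none'}",
        f"Failed tasks: {', '.join(failed_tasks) or 'none'}",
        "Propose a task that is novel, diverse, and achievable with the",
        "current inventory (an in-context novelty search).",
    ])

prompt = curriculum_prompt(
    {"biome": "forest", "position": (12, 64, -35), "inventory": ["oak_log"]},
    completed_tasks=["chop a tree"],
    failed_tasks=[],
)
```

The key design point is that the prompt surfaces both successes and failures, so GPT-4 can avoid re-proposing tasks the agent already masters or clearly cannot do yet.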
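The skill library's retrieval step can be sketched as embedding-indexed nearest-neighbor lookup. The paper uses GPT-3.5 embeddings over skill descriptions; the toy hashed bag-of-words embedding below is a stand-in so the example runs offline, and the skill descriptions and stored code strings are invented for illustration.

```python
# Sketch of skill-library storage and retrieval. The toy_embedding below
# is a stand-in for the GPT-3.5 embeddings used in the paper.
import math
import zlib
from collections import Counter

def toy_embedding(text, dim=256):
    """Hashed bag-of-words vector (illustration only)."""
    vec = [0.0] * dim
    for word, count in Counter(text.lower().split()).items():
        vec[zlib.crc32(word.encode()) % dim] += count
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class SkillLibrary:
    """Skills are executable programs indexed by an embedding of their
    natural-language description; retrieval is by semantic similarity."""

    def __init__(self):
        self._skills = []  # (description, embedding, code)

    def add(self, description, code):
        self._skills.append((description, toy_embedding(description), code))

    def retrieve(self, task, k=2):
        query = toy_embedding(task)
        ranked = sorted(self._skills, key=lambda s: cosine(query, s[1]),
                        reverse=True)
        return [(desc, code) for desc, _, code in ranked[:k]]

library = SkillLibrary()
library.add("craft a wooden pickaxe from planks and sticks",
            "async function craftWoodenPickaxe(bot) { /* Mineflayer code */ }")
library.add("mine iron ore with a stone pickaxe",
            "async function mineIronOre(bot) { /* Mineflayer code */ }")
library.add("fight a zombie with a wooden sword",
            "async function fightZombie(bot) { /* Mineflayer code */ }")

top = library.retrieve("craft an iron pickaxe", k=2)
```

Retrieving top-k similar skills rather than a single best match lets GPT-4 compose several prior programs into a more complex behavior.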
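The generate-execute-verify loop described above can be sketched structurally as follows. In Voyager the three roles are GPT-4 calls and real execution inside Minecraft; here they are injected as plain functions, and the toy stubs at the bottom (including their feedback strings) are illustrative assumptions.

```python
# Structural sketch of the iterative prompting mechanism. The toy stubs
# and feedback strings are assumptions for illustration only.

def rollout(task, generate_code, execute, self_verify, max_attempts=4):
    """Generate -> execute -> gather feedback -> regenerate, until the
    self-verification critic confirms success or attempts run out."""
    feedback = ""
    for _ in range(max_attempts):
        program = generate_code(task, feedback)              # code generation
        env_feedback, exec_error = execute(program)          # run in the env
        success, critique = self_verify(task, env_feedback)  # GPT-4 critic
        if success:
            return program  # a success would be added to the skill library
        # Environment feedback, interpreter errors, and the critique all
        # flow into the next round's prompt.
        feedback = "\n".join(filter(None, [env_feedback, exec_error, critique]))
    return None  # give up; the curriculum may revisit the task later

# Toy stubs: the "model" fixes its program once feedback mentions planks.
def generate_code(task, feedback):
    return ("craftPickaxe(bot, planks=3)" if "planks" in feedback
            else "craftPickaxe(bot)")

def execute(program):
    if "planks" in program:
        return "Crafted wooden_pickaxe", ""
    return "Cannot craft wooden_pickaxe, need 3 more planks", ""

def self_verify(task, env_feedback):
    done = "Crafted" in env_feedback
    return done, "" if done else "Gather planks before crafting."

result = rollout("craft a wooden pickaxe", generate_code, execute, self_verify)
# result == "craftPickaxe(bot, planks=3)"
```

The loop's second attempt succeeds because the first round's environment feedback ("need 3 more planks") and the critic's suggestion are folded into the next generation prompt.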
Implementation and Evaluation:
- Voyager interacts with GPT-4 via black-box API queries, requiring no model fine-tuning.
- It operates within the MineDojo framework, using Mineflayer JavaScript APIs as its low-level controller.
- Experiments compared Voyager against adapted versions of LLM agent techniques like ReAct, Reflexion, and AutoGPT in Minecraft.
- Results: Voyager significantly outperformed baselines:
- Discovered 3.3x more unique items.
- Traversed 2.3x longer distances across diverse terrains.
- Unlocked Minecraft tech tree milestones (wood, stone, iron, diamond) significantly faster (up to 15.3x faster for wood). Voyager was the only agent to reach the diamond level.
- Demonstrated strong zero-shot generalization by successfully using its learned skill library to solve novel, unseen tasks in new worlds, whereas baselines failed. The skill library also improved the performance of AutoGPT when provided to it.
- Ablation Studies: Confirmed the critical importance of each component. Removing the automatic curriculum, skill library, or self-verification significantly degraded performance. Using GPT-4 for code generation was substantially better than GPT-3.5.
- Human Feedback: The paper also shows that Voyager can build complex 3D structures (such as a Nether Portal or a house) when augmented with human feedback, with humans acting either as the critic or as the curriculum proposer.
Limitations:
- Cost: High cost associated with GPT-4 API calls.
- Inaccuracies: Occasional failures in code generation or self-verification.
- Hallucinations: GPT-4 sometimes proposes impossible tasks (e.g., crafting non-existent items) or generates code that makes invalid assumptions (e.g., using the wrong fuel) or calls non-existent APIs.
Conclusion:
Voyager represents a significant step towards creating generalist, embodied agents capable of lifelong learning in open-ended environments. It effectively leverages the capabilities of LLMs for curriculum generation, skill acquisition via code, and iterative self-improvement through environmental and self-generated feedback, all without requiring gradient-based training.