Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory (2305.17144v2)

Published 25 May 2023 in cs.AI, cs.CL, cs.CV, and cs.LG

Abstract: The captivating realm of Minecraft has attracted substantial research interest in recent years, serving as a rich platform for developing intelligent agents capable of functioning in open-world environments. However, the current research landscape predominantly focuses on specific objectives, such as the popular "ObtainDiamond" task, and has not yet shown effective generalization to a broader spectrum of tasks. Furthermore, the current leading success rate for the "ObtainDiamond" task stands at around 20%, highlighting the limitations of Reinforcement Learning (RL) based controllers used in existing methods. To tackle these challenges, we introduce Ghost in the Minecraft (GITM), a novel framework that integrates LLMs with text-based knowledge and memory, aiming to create Generally Capable Agents (GCAs) in Minecraft. These agents, equipped with the logic and common sense capabilities of LLMs, can skillfully navigate complex, sparse-reward environments with text-based interactions. We develop a set of structured actions and leverage LLMs to generate action plans for the agents to execute. The resulting LLM-based agent markedly surpasses previous methods, achieving a remarkable improvement of +47.5% in success rate on the "ObtainDiamond" task, demonstrating superior robustness compared to traditional RL-based controllers. Notably, our agent is the first to procure all items in the Minecraft Overworld technology tree, demonstrating its extensive capabilities. GITM requires no GPU for training; a single CPU node with 32 CPU cores is sufficient. This research shows the potential of LLMs in developing capable agents for handling long-horizon, complex tasks and adapting to uncertainties in open-world environments. See the project website at https://github.com/OpenGVLab/GITM.


Overview of "Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via LLMs with Text-based Knowledge and Memory"

This paper introduces "Ghost in the Minecraft" (GITM), a framework that uses LLMs to build Generally Capable Agents (GCAs) for complex open-world environments such as Minecraft. Prior approaches, built mostly on RL controllers, have struggled to generalize beyond specific tasks such as the "ObtainDiamond" challenge. GITM instead combines the reasoning capabilities of LLMs with structured actions, text-based knowledge, and text-based memory, which lets it adapt to and succeed on a much broader range of tasks.

Key Contributions and Methodology

The GITM framework is structured around three core components:

LLM Decomposer: This module recursively breaks an overarching goal into smaller sub-goals until each one is manageable. It draws on external text-based knowledge to supply the Minecraft-specific context these decompositions require, for example what an item is crafted from.
LLM Planner: This component generates action plans in terms of abstract structured actions rather than the low-level operations typical of RL agents. It revises plans in response to feedback from execution and stores successful plans in a text-based memory for reuse on later goals.
LLM Interface: This component implements each structured action as keyboard and mouse operations in the game, turning high-level plans into executable inputs, and passes observations from the environment back to the planner as feedback. Separating planning from control lets the LLM reason over high-level actions instead of raw inputs, unlike RL controllers that act directly at the keyboard-and-mouse level.
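The three components above can be illustrated with a toy decompose, plan, and execute loop. This is a minimal sketch under stated assumptions, not the authors' implementation: every name in it (KNOWLEDGE, StructuredAction, decompose, ToyEnv, Planner, run) is hypothetical, rule-based stand-ins replace the LLM calls, and a tiny simulated world replaces Minecraft and its keyboard/mouse control. The decomposer expands a goal recursively over a small recipe table, the planner reuses remembered plans and revises a failed plan from feedback, and the environment's execute method plays the role of the interface.

```python
"""Toy sketch of a GITM-style decompose / plan / execute loop.

Every name below is hypothetical. Rule-based stand-ins replace the LLM calls,
and a tiny simulated world replaces Minecraft and its keyboard/mouse control.
"""
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StructuredAction:
    name: str    # abstract action such as "mine", "craft" or "explore"
    target: str  # item the action is aimed at


# Hypothetical text-based knowledge: item -> (prerequisite items, producing action).
KNOWLEDGE = {
    "log": ([], "mine"),
    "planks": (["log"], "craft"),
    "stick": (["planks"], "craft"),
    "crafting_table": (["planks"], "craft"),
    "wooden_pickaxe": (["planks", "stick", "crafting_table"], "craft"),
}


def decompose(goal, knowledge, order=None):
    """Decomposer stand-in: recursively list sub-goals, prerequisites first."""
    order = [] if order is None else order
    if goal not in order:
        for prereq in knowledge[goal][0]:
            decompose(prereq, knowledge, order)
        order.append(goal)
    return order


@dataclass
class ToyEnv:
    """Simulated world; items are never consumed, to keep the toy small."""
    inventory: set = field(default_factory=set)
    tree_in_view: bool = False

    def execute(self, action):
        """Interface stand-in: run one structured action, return (ok, feedback)."""
        if action.name == "explore":
            self.tree_in_view = True
            return True, "found a tree"
        if action.name == "mine" and not self.tree_in_view:
            return False, f"no {action.target} in view"
        missing = [p for p in KNOWLEDGE[action.target][0] if p not in self.inventory]
        if missing:
            return False, f"missing {missing}"
        self.inventory.add(action.target)
        return True, f"obtained {action.target}"


@dataclass
class Planner:
    """Planner stand-in with a memory of successful plans, keyed by goal name."""
    memory: dict = field(default_factory=dict)  # goal -> successful action list

    def plan(self, goal, feedback=None):
        if goal in self.memory:
            return list(self.memory[goal])  # reuse a remembered plan
        base = [StructuredAction(KNOWLEDGE[goal][1], goal)]
        # Hypothetical revision rule standing in for an LLM reading the feedback.
        if feedback is not None and "in view" in feedback:
            return [StructuredAction("explore", goal)] + base
        return base


def run(goal, env, planner, max_attempts=3):
    """Achieve each sub-goal in order, replanning on failure; return feedback log."""
    log = []
    for sub_goal in decompose(goal, KNOWLEDGE):
        feedback = None
        for _ in range(max_attempts):
            actions = planner.plan(sub_goal, feedback)
            for action in actions:
                ok, feedback = env.execute(action)
                log.append(feedback)
                if not ok:
                    break  # abandon this plan and replan from the feedback
            else:
                planner.memory[sub_goal] = actions  # remember what worked
                break
        else:
            raise RuntimeError(f"could not achieve {sub_goal}")
    return log


if __name__ == "__main__":
    planner = Planner()
    for line in run("wooden_pickaxe", ToyEnv(), planner):
        print(line)
```

In this toy run, the first "mine log" attempt fails with "no log in view", the planner inserts an explore step, and the remaining sub-goals succeed in prerequisite order. Calling run again with the same planner and a fresh ToyEnv reuses the remembered plan for "log", so the failed first attempt is skipped. The feedback rule and memory format are simplifications chosen for brevity; in the paper these roles are filled by an LLM, not a lookup rule.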

Results and Implications

GITM markedly improves performance on the "ObtainDiamond" task, raising the success rate by 47.5 percentage points over previous RL-based methods, whose best results stand at around 20%. It is also the first agent to obtain every item in the Minecraft Overworld technology tree, something prior methods had not achieved, and it does so without GPU training: a single CPU node with 32 cores suffices.
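Reading the +47.5% as an absolute gain in percentage points over the roughly 20% prior best cited in the abstract (an interpretation, since the abstract does not state the unit explicitly), the implied GITM success rate on "ObtainDiamond" is:

```latex
\mathrm{SR}_{\mathrm{GITM}} \approx \mathrm{SR}_{\mathrm{prior}} + 47.5\,\mathrm{pp} \approx 20\% + 47.5\% = 67.5\%
```

A relative reading (a 47.5% increase on 20%) would instead give only about 29.5%, which is why the unit matters when comparing against other methods.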

These results suggest that LLMs can substantially change how game AI and autonomous agents are built. GITM's architecture, with its integration of text-based knowledge, points toward agents that handle complex, long-horizon tasks efficiently and at modest computational cost.

Future Directions

The integration of LLMs with abstract interfaces and memory systems could extend beyond Minecraft to other domains requiring adaptive intelligence. Future work might explore refining these agents for real-time applications in dynamic environments or integrating multi-modal sensory inputs to enhance interaction fidelity. Additionally, the scalability of such systems to learn and generalize across various games or real-world simulations remains an interesting avenue for exploration.

Conclusion

GITM marks a shift toward using LLMs to improve the capability and efficiency of AI agents in open-world environments. By relying on structured abstractions and knowledge-driven planning, it lays a foundation for building robust, general-purpose AI systems and advances research on autonomous agents.


Authors (13)

Xizhou Zhu
Yuntao Chen
Hao Tian
Chenxin Tao
Weijie Su
Chenyu Yang
Gao Huang
Bin Li
Lewei Lu
Xiaogang Wang
Yu Qiao
Zhaoxiang Zhang
Jifeng Dai

Citations (181)
