
Will GPT-4 Run DOOM? (2403.05468v1)

Published 8 Mar 2024 in cs.CL, cs.AI, and cs.CV

Abstract: We show that GPT-4's reasoning and planning capabilities extend to the 1993 first-person shooter Doom. This LLM is able to run and play the game with only a few instructions, plus a textual description--generated by the model itself from screenshots--about the state of the game being observed. We find that GPT-4 can play the game to a passable degree: it is able to manipulate doors, combat enemies, and perform pathing. More complex prompting strategies involving multiple model calls provide better results. While further work is required to enable the LLM to play the game as well as its classical, reinforcement learning-based counterparts, we note that GPT-4 required no training, leaning instead on its own reasoning and observational capabilities. We hope our work pushes the boundaries on intelligent, LLM-based agents in video games. We conclude by discussing the ethical implications of our work.


Summary

  • The paper demonstrates GPT-4's ability to play Doom by converting screenshots into textual descriptions of the game state, with no additional training.
  • The paper employs a two-component setup (a Vision model and an Agent model) with varied prompting strategies, revealing strengths in navigation and combat alongside limitations in memory and planning.
  • The paper highlights the potential of LLM agents in gaming and simulations, emphasizing the need for stronger reasoning and long-term strategic planning.

Investigating the Capabilities of GPT-4 in Playing Doom

Introduction

In a novel approach to probing the planning and reasoning capabilities of LLMs, this paper demonstrates that GPT-4 can engage with and play the 1993 first-person shooter Doom. The goal is to understand how well LLMs, specifically GPT-4, can process complex environments and make decisions in a gaming context. Unlike traditional game-playing AI agents, which typically rely on extensive training or task-specific fine-tuning, GPT-4 requires no additional training: it interprets game dynamics and makes strategic decisions from textual descriptions generated from game screenshots.

Methodology

The research leverages a two-component setup consisting of a Vision component, which processes screenshots from Doom and provides textual descriptions of the game state, and an Agent model that decides on the actions to take based on these descriptions. The system is further enhanced with a Planner for generating a fine-grained plan of action and Experts for offering specialized advice, thereby creating a more sophisticated prompting strategy for GPT-4 to navigate the game. The game itself is interfaced through a Python binding of the original Doom engine, allowing seamless integration with the GPT-4 models.
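To make the setup concrete, the following is a minimal sketch of such a Vision-to-Agent loop, assuming a ViZDoom-style Python binding and the OpenAI chat API. The function names, prompt wording, action set, and model choices here are illustrative assumptions, not the authors' actual implementation.

```python
import base64

import imageio.v3 as iio
import vizdoom as vzd
from openai import OpenAI

client = OpenAI()

VISION_PROMPT = "Describe this Doom frame: enemies, doors, items, and obstacles."
AGENT_PROMPT = (
    "You are playing Doom. Given the description of the current frame, "
    "reply with exactly one action from: MOVE_FORWARD, TURN_LEFT, "
    "TURN_RIGHT, ATTACK."
)
# Each action maps to a button vector for the binding (order matches
# set_available_buttons below).
ACTIONS = {
    "MOVE_FORWARD": [1, 0, 0, 0],
    "TURN_LEFT":    [0, 1, 0, 0],
    "TURN_RIGHT":   [0, 0, 1, 0],
    "ATTACK":       [0, 0, 0, 1],
}

def describe_frame(frame) -> str:
    """Vision component: screenshot in, textual description of the game state out."""
    png = iio.imwrite("<bytes>", frame, extension=".png")  # encode RGB array as PNG bytes
    b64 = base64.b64encode(png).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model works here
        messages=[{"role": "user", "content": [
            {"type": "text", "text": VISION_PROMPT},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ]}],
    )
    return resp.choices[0].message.content

def choose_action(description: str, history: list[str]) -> str:
    """Agent component: textual state (plus a short history window) in, one action out."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": AGENT_PROMPT},
            {"role": "user", "content": "Recent frames:\n" + "\n".join(history[-5:])
                                        + "\n\nCurrent frame:\n" + description},
        ],
    )
    return resp.choices[0].message.content.strip()

# Game loop: screenshot -> description -> action, holding each action a few tics.
game = vzd.DoomGame()  # running E1M1 requires the original DOOM.WAD
game.set_doom_map("E1M1")
game.set_screen_format(vzd.ScreenFormat.RGB24)  # frames come back as HxWx3 arrays
game.set_available_buttons([vzd.Button.MOVE_FORWARD, vzd.Button.TURN_LEFT,
                            vzd.Button.TURN_RIGHT, vzd.Button.ATTACK])
game.init()

history: list[str] = []
while not game.is_episode_finished():
    desc = describe_frame(game.get_state().screen_buffer)
    action = choose_action(desc, history)
    history.append(f"{desc[:120]} -> {action}")
    game.make_action(ACTIONS.get(action, ACTIONS["MOVE_FORWARD"]), 8)  # 8 tics per step
```

Keeping only the last few frame descriptions in the Agent prompt mirrors the bounded context window that, as the Results below note, limits the model's longer-term recall.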

Multiple prompting strategies were employed to assess GPT-4's gameplay performance, ranging from a naïve approach with minimal instruction to more complex methods involving walkthroughs and k-level planning. By adjusting these strategies, the paper aims to dissect the planning and reasoning intricacies of LLMs in a dynamic gaming environment.
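As a hedged illustration of how those strategies might differ in code (reusing the `client` from the sketch above; all prompt wording here is hypothetical, not the paper's):

```python
def llm(prompt: str) -> str:
    """One GPT-4 text call, using the OpenAI client from the previous sketch."""
    resp = client.chat.completions.create(
        model="gpt-4", messages=[{"role": "user", "content": prompt}])
    return resp.choices[0].message.content

def act_naive(state: str) -> str:
    # Naive strategy: a single call with minimal instruction.
    return llm(f"You are playing Doom. State:\n{state}\nReply with one action.")

def act_with_walkthrough(state: str, walkthrough: str) -> str:
    # Walkthrough strategy: the same single call, conditioned on a level guide.
    return llm(f"You are playing Doom. Level walkthrough:\n{walkthrough}\n"
               f"State:\n{state}\nReply with one action.")

def act_plan_then_execute(state: str, experts: list[str]) -> str:
    # Multi-call strategy: a Planner call drafts a short plan, optional Expert
    # calls critique it, and a final Agent call picks the next concrete action.
    plan = llm(f"Given this Doom state, write a numbered 3-step plan:\n{state}")
    advice = [llm(f"As a {role}, give one sentence of advice.\n"
                  f"State:\n{state}\nPlan:\n{plan}") for role in experts]
    return llm("You are playing Doom.\nPlan:\n" + plan +
               "\nAdvice:\n" + "\n".join(advice) +
               f"\nState:\n{state}\nReply with the single next action.")
```

A k-level variant would, in the same spirit, add further calls that anticipate how enemies might respond to each candidate plan before committing to an action; this sketch omits that step for brevity.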

Results

The findings reveal GPT-4's capability to play Doom at a basic level, including navigating environments, engaging enemies, and managing game resources. More intricate prompting strategies, particularly those involving multiple calls to GPT-4 for planning and advice, yielded better gameplay results. However, the LLM exhibited limitations in memory recall and the depth of reasoning, impacting its ability to perform long-term strategic planning.

Discussion

The paper underscores the potential of LLMs to process complex environments and make informed decisions without explicit training on the specific task. The success of GPT-4 in navigating the game environment of Doom suggests a promising avenue for developing intelligent agents capable of tackling a wide range of problem-solving and planning tasks. The work also sheds light on the challenges faced by LLMs in terms of memory retention and reasoning depth, pointing to areas for future improvement.

Implications for AI Development

This research contributes to a deeper understanding of the capabilities of LLMs in unconventional applications beyond text processing. The ability of GPT-4 to interact with and make decisions in a video game environment opens up new possibilities for employing LLMs in game testing, simulation-based learning, and possibly even in developing non-player characters (NPCs) in games. Moreover, the paper highlights the need for further exploration into enhancing the memory and reasoning capabilities of LLMs, which could lead to more sophisticated AI agents capable of complex decision-making and problem-solving.

Ethical Considerations

The paper briefly discusses the ethical implications of employing LLMs in gaming contexts, especially in scenarios that might simulate real-world activities. As the technology advances, it is crucial to develop and apply LLMs responsibly, ensuring they contribute positively to society and do not inadvertently facilitate harmful behaviors.

Conclusion

This investigation into GPT-4's gameplay capabilities in Doom is a stepping stone towards understanding the potential and limitations of LLMs in dynamic and complex environments. While GPT-4 exhibits impressive planning and decision-making abilities, its performance underlines the necessity for advancements in memory and reasoning faculties. Future research in this domain can pave the way for more versatile and intelligent AI agents, expanding the horizons of AI application in gaming and beyond.


HackerNews

  1. Will GPT-4 Run Doom? (5 points, 2 comments)
  2. GPT-4 can run and play DOOM (5 points, 0 comments)

Reddit

  1. [R] [2403.05468] Will GPT-4 Run DOOM? (58 points, 19 comments)