
Generating Games via LLMs: An Investigation with Video Game Description Language (2404.08706v1)

Published 11 Apr 2024 in cs.AI

Abstract: Recently, the emergence of LLMs has unlocked new opportunities for procedural content generation. However, recent attempts mainly focus on level generation for specific games with defined game rules such as Super Mario Bros. and Zelda. This paper investigates game generation via LLMs. Based on video game description language, this paper proposes an LLM-based framework to generate game rules and levels simultaneously. Experiments demonstrate how the framework works with prompts considering different combinations of context. Our findings extend the current applications of LLMs and offer new insights for generating new games in the area of procedural content generation.

Authors (3)
  1. Chengpeng Hu (11 papers)
  2. Yunlong Zhao (20 papers)
  3. Jialin Liu (97 papers)
Citations (5)

Summary

This paper investigates the capability of LLMs to generate both game rules and levels simultaneously. Unlike previous work that primarily focused on level generation for fixed game rules, this research proposes a framework called LLMGG (Generating Games via LLMs) that leverages Video Game Description Language (VGDL) as a structured representation for both aspects of game design.

The core of the LLMGG framework is an LLM that receives a text-based prompt describing the desired game. The LLM is expected to output the game's rules and levels in VGDL format. This VGDL output can then be parsed by a compatible engine, such as GVGAI gym, to create a playable game instance. The framework is designed to be general and can potentially interact iteratively with LLMs for refinement or be used with different LLM backbones.
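
A minimal sketch of this pipeline is shown below, assuming an OpenAI chat model as the backbone. The prompt wording, the model name, and the load_vgdl_game helper are illustrative assumptions, not the authors' implementation.

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def generate_vgdl(game_request: str, context: str = "") -> str:
        """Ask an LLM for a VGDL game description plus one level (illustrative prompt wording)."""
        prompt = (
            "Generate a game in Video Game Description Language (VGDL), "
            "including both the game rules and one level.\n"
            f"{context}\n"
            f"Game request: {game_request}"
        )
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    vgdl_text = generate_vgdl("a simple maze game in which the avatar must reach the goal")
    # game = load_vgdl_game(vgdl_text)  # hypothetical helper standing in for a VGDL engine such as GVGAI gym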

Video Game Description Language (VGDL) is chosen as the representation language due to its human-readable yet machine-parsable nature. A VGDL game description typically includes four main components:

  • SpriteSet: Defines the types of objects (sprites) that exist in the game and their properties.
  • LevelMapping: Maps characters used in the level text file to one or more sprites defined in the SpriteSet.
  • InteractionSet: Defines what happens when two different types of sprites collide or interact.
  • TerminationSet: Defines the conditions under which the game ends (win, lose, draw).
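
To make the structure concrete, a minimal Maze-style description might look as follows. It is written here as a Python string so later sketches can reuse it; the sprite names, parameters, and level layout are illustrative assumptions, not the paper's exact game.

    # Indentation inside the string conveys the VGDL section hierarchy.
    MAZE_VGDL = """
    BasicGame
        SpriteSet
            goal   > Immovable color=GREEN
            avatar > MovingAvatar
            wall   > Immovable
        LevelMapping
            G > goal
            A > avatar
            w > wall
        InteractionSet
            avatar wall > stepBack
            goal avatar > killSprite scoreChange=1
        TerminationSet
            SpriteCounter stype=goal limit=0 win=True
    """

    MAZE_LEVEL = """
    wwwwwww
    wA..w.w
    w.w...w
    w...wGw
    wwwwwww
    """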

The paper explores the impact of prompt design on the LLM's ability to generate correct VGDL. Prompts consist of a basic instruction (requesting a VGDL game and level) and optional context. The context can include:

  • Level notation mapping (e.g., 'W' for wall).
  • VGDL grammar descriptions (Base rules, Type Constraints C1 and C2 specifying allowed sprite classes, interaction methods like killSprite or removeSprite, and termination classes).
  • Examples of complete VGDL games.
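
For instance, a high-context prompt in the spirit of the richer settings could be assembled from these pieces as sketched below, reusing generate_vgdl and MAZE_VGDL from the earlier sketches; the wording is an assumption, not the authors' exact prompt.

    LEVEL_NOTATION = "In the level, use 'w' for walls, 'A' for the avatar and 'G' for the goal."
    GRAMMAR = (
        "A VGDL game must define SpriteSet, LevelMapping, InteractionSet and TerminationSet. "
        "Allowed interaction methods include killSprite and removeSprite; "
        "removeSprite removes the second sprite of the pair."
    )
    EXAMPLE_GAME = MAZE_VGDL  # a complete VGDL game, e.g. the maze description sketched earlier

    prompt_context = f"{LEVEL_NOTATION}\n{GRAMMAR}\nExample game:\n{EXAMPLE_GAME}"
    vgdl_text = generate_vgdl("a simple maze game", context=prompt_context)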

Experiments were conducted using GPT-3.5, GPT-4, and Gemma 7B with seven different prompt variations, combining these context elements. Each prompt was tested over 10 trials for generating a simple Maze game.

To evaluate the generated output, the paper defines rule-based text validation metrics:

  • Parsable: The VGDL syntax must be valid and recognizable by a VGDL engine.
  • Logical: The generated VGDL must define all mandatory components (SpriteSet, LevelMapping, InteractionSet, TerminationSet) and ensure basic game logic completeness (e.g., defining interactions for avatar-wall and avatar-goal, having a win condition).
  • Mappable: Characters used in the level must have correct mappings to sprites defined in the rules, and essential sprites must be present in the level.
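
A simplified illustration of such checks is sketched below. A real Parsable check would delegate to an actual VGDL engine, so only rough Logical and Mappable heuristics are shown; the patterns and section ordering are assumptions based on the definitions above.

    import re

    REQUIRED_SECTIONS = ("SpriteSet", "LevelMapping", "InteractionSet", "TerminationSet")

    def is_logical(vgdl_text: str) -> bool:
        """Rough 'Logical' check: all mandatory sections are present and a win condition exists."""
        has_sections = all(section in vgdl_text for section in REQUIRED_SECTIONS)
        has_win_condition = "win=True" in vgdl_text
        return has_sections and has_win_condition

    def is_mappable(vgdl_text: str, level_text: str) -> bool:
        """Rough 'Mappable' check: every non-empty level character is mapped to a sprite."""
        # Assumes the LevelMapping section precedes InteractionSet, as in the example above.
        mapping_block = vgdl_text.split("LevelMapping")[1].split("InteractionSet")[0]
        mapped_chars = set(re.findall(r"^\s*(\S)\s*>", mapping_block, flags=re.MULTILINE))
        level_chars = {c for c in level_text if not c.isspace() and c != "."}
        return level_chars <= mapped_chars

    print(is_logical(MAZE_VGDL), is_mappable(MAZE_VGDL, MAZE_LEVEL))  # True True for the earlier maze sketch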

The experimental results highlight the critical role of context. Prompts without sufficient context struggled to generate valid VGDL. Adding VGDL grammar and examples significantly improved the parsability and logical correctness of the generated output, especially for GPT-4. Gemma 7B, in contrast, failed to produce parsable VGDL in most trials.

A key finding relates to LLM hallucination, particularly concerning game logic. Even when generating parsable VGDL, LLMs sometimes created illogical rules. For instance, using avatar goal > killSprite in the InteractionSet, which in VGDL means the avatar is removed upon collision with the goal, was often misinterpreted by LLMs as the goal being removed. This demonstrates a mismatch between the LLM's natural language understanding of word order and the specific syntax conventions of VGDL. The paper found that aligning the VGDL syntax with the LLM's likely understanding, such as using goal avatar > killSprite or introducing a custom interaction like removeSprite where the second sprite (goal) is removed (avatar goal > removeSprite), could mitigate this hallucination. Prompts incorporating this syntactic alignment (P5 and P7) showed significantly better results in producing logically correct and playable games.
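
The three variants discussed above can be summarized as follows; the glosses are paraphrased from the paper's discussion, and the exact constraint wording used in the prompts may differ.

    # How each interaction line behaves: killSprite acts on the first sprite of the pair,
    # while removeSprite is the proposed variant acting on the second sprite.
    INTERACTION_SEMANTICS = {
        "avatar goal > killSprite":   "the avatar is removed on reaching the goal (usually unintended)",
        "goal avatar > killSprite":   "the goal is removed on contact (the intended maze logic)",
        "avatar goal > removeSprite": "the goal, i.e. the second sprite, is removed (proposed alignment)",
    }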

GPT-4 with the most comprehensive context (P7), which included level notation, a full grammar description, the proposed removeSprite interaction method constraint, and a VGDL example, achieved a 100% success rate in generating games that were Parsable, Logical, Mappable, and ultimately Correct (playable with expected behavior) in all 10 trials. Lower-context prompts or less capable LLMs resulted in a higher frequency of errors, including syntax errors, missing components, illogical interactions, and mapping issues (see Appendix Table 3 for the error breakdown).

The paper concludes that LLMs hold significant potential for generating game rules and levels simultaneously using VGDL, enabling non-experts to prototype games via natural language prompts. However, it also emphasizes that hallucination regarding specific domain syntax (like VGDL interaction semantics) is a limitation. Providing rich context and potentially adapting the domain language syntax to better align with LLM priors can improve performance. Despite these advancements, human intervention is still necessary, especially for generating more complex and diverse games, to correct errors and guide the generation process. Future work could explore extending this approach to 3D game generation.
