Word2World: Generating Stories and Worlds through Large Language Models (2405.06686v1)

Published 6 May 2024 in cs.CL and cs.AI

Abstract: LLMs have proven their worth across a diverse spectrum of disciplines. LLMs have shown great potential in Procedural Content Generation (PCG) as well, but directly generating a level through a pre-trained LLM is still challenging. This work introduces Word2World, a system that enables LLMs to procedurally design playable games through stories, without any task-specific fine-tuning. Word2World leverages the abilities of LLMs to create diverse content and extract information. Combining these abilities, LLMs can create a story for the game, design narrative, and place tiles in appropriate places to create coherent worlds and playable games. We test Word2World with different LLMs and perform a thorough ablation study to validate each step. We open-source the code at https://github.com/umair-nasir14/Word2World.

Authors (3)
  1. Muhammad U. Nasir
  2. Steven James
  3. Julian Togelius

Summary

  • The paper introduces a novel method to generate playable game environments directly from narratives using a multi-step LLM process.
  • It employs a detailed pipeline for tile extraction and iterative refinement to ensure coherent and structurally sound game levels.
  • Experimental evaluations reveal that larger LLMs achieve superior novelty and coherence in procedurally generated game worlds.

Word2World: Turning Stories into Playable Game Levels Using LLMs

Understanding Word2World

Imagine being able to create an entire playable game level just from a story. That’s what the new system, Word2World, aims to achieve. It leverages the capabilities of LLMs to transform textual descriptions into coherent, playable game environments without needing task-specific fine-tuning. Word2World brings us closer to fully automating procedural content generation (PCG) in gaming, marking a significant step in the use of LLMs beyond text generation.

Breaking Down the Components

Procedural Content Generation with LLMs

The process of generating game levels in Word2World is divided into several steps:

  1. Story Creation: An LLM first generates a story, which forms the narrative foundation for the game.
  2. Information Extraction: The LLM then extracts the essential details from this story, such as character descriptions, goals, and tile information, including which tiles are critical, walkable, or interactive.
  3. World Generation: The system lays down the environment tiles in one pass, then places characters and important interactive tiles in a second pass.
  4. Algorithmic Refinements: Algorithmic checks ensure tiles are correctly placed and the map adheres to specific constraints (e.g., equal row lengths in the tile map).
  5. Feedback Loop: The process involves multiple rounds where evaluations from previous iterations are fed back to refine the world until it is coherent and playable.

This multi-step process helps ensure the generated world is not only diverse and rich in content but also structurally sound for gameplay.
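To make the flow concrete, here is a minimal sketch of how such a story-to-world pipeline could be wired together in Python. Everything below (the `call_llm` helper, the prompts, and the JSON keys) is a hypothetical stand-in for illustration, not Word2World's actual code.

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion API call (hypothetical)."""
    raise NotImplementedError("Plug in your preferred LLM client here.")

def generate_story(theme: str) -> str:
    return call_llm(f"Write a short adventure story about {theme}.")

def extract_info(story: str) -> dict:
    # Ask the LLM for structured output; this schema is an assumption.
    prompt = (
        "From the story below, return JSON with keys "
        "'characters', 'goals', 'important_tiles', 'walkable_tiles':\n" + story
    )
    return json.loads(call_llm(prompt))

def generate_world(info: dict, rows: int = 15, cols: int = 20) -> list[list[str]]:
    # First pass: environment tiles only; second pass adds characters/objects.
    env = call_llm(f"Lay out a {rows}x{cols} tile map using {info['walkable_tiles']}.")
    full = call_llm(f"Place {info['characters']} and {info['important_tiles']} on:\n{env}")
    return [list(row) for row in full.splitlines() if row]

def fix_map(tile_map: list[list[str]], pad: str = ".") -> list[list[str]]:
    # Algorithmic check: pad rows so every row has the same length.
    width = max(len(r) for r in tile_map)
    return [r + [pad] * (width - len(r)) for r in tile_map]

def word2world_like_pipeline(theme: str, rounds: int = 3) -> list[list[str]]:
    story = generate_story(theme)
    info = extract_info(story)
    world = fix_map(generate_world(info))
    for _ in range(rounds):
        # Feedback loop: critique the previous world and regenerate.
        feedback = call_llm(f"Critique this map for coherence with the story:\n{world}")
        world = fix_map(generate_world({**info, "feedback": feedback}))
    return world
```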

Tile Selection Process

Choosing the right tiles is crucial to creating an engaging and coherent game world. Word2World uses a pre-defined dataset of environment and character tiles, each manually described and labeled to support retrieval. DistilBERT, a smaller, faster distillation of BERT, embeds these descriptions, and the most relevant tiles are selected via cosine similarity.
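A minimal sketch of this retrieval step, assuming Hugging Face `transformers` and a small hand-labeled tile catalogue (the tile names and descriptions are invented for illustration):

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Hypothetical hand-labeled tile catalogue (description -> sprite name).
TILE_CATALOGUE = {
    "dense green forest with tall trees": "forest.png",
    "calm blue water, impassable": "water.png",
    "stone castle wall, blocks movement": "wall.png",
    "wooden treasure chest, interactive": "chest.png",
}

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")

def embed(texts: list[str]) -> torch.Tensor:
    """Mean-pool DistilBERT token embeddings into one vector per text."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state          # (B, T, 768)
    mask = batch["attention_mask"].unsqueeze(-1)           # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)            # (B, 768)

def best_tile(llm_description: str) -> str:
    """Return the catalogue sprite whose description is most similar."""
    descriptions = list(TILE_CATALOGUE)
    sims = torch.nn.functional.cosine_similarity(
        embed([llm_description]), embed(descriptions)
    )
    return TILE_CATALOGUE[descriptions[int(sims.argmax())]]

print(best_tile("a thick, shadowy wood the hero must cross"))  # likely forest.png
```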

LLM Agents as Game Testers

An interesting aspect of Word2World is the use of LLM agents as evaluators that simulate playing the game. These agents generate action sequences (moving up, down, left, or right, picking up objects, and hitting enemies) to test whether the generated levels are playable. Their performance is assessed by the rewards they collect for correctly completing tasks in the game environment.
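One way such an evaluation loop could look is sketched below: an agent's proposed action sequence is replayed in a tiny grid-world simulator and scored by the rewards it collects. The map, action set, and reward values are illustrative assumptions, not the paper's exact protocol.

```python
# Hypothetical 2D world: '@' player, 'K' key (pick up), 'E' enemy (hit),
# 'G' goal tile, '#' wall.
WORLD = [
    "#######",
    "#@..K.#",
    "#..#..#",
    "#.E..G#",
    "#######",
]

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def score_actions(actions: list[str]) -> float:
    grid = [list(row) for row in WORLD]
    r, c = next((i, j) for i, row in enumerate(grid)
                for j, ch in enumerate(row) if ch == "@")
    reward = 0.0
    for act in actions:
        if act in MOVES:
            dr, dc = MOVES[act]
            nr, nc = r + dr, c + dc
            if grid[nr][nc] != "#":          # walls block movement
                r, c = nr, nc
        elif act == "pick" and grid[r][c] == "K":
            reward += 1.0                     # picked up the key
            grid[r][c] = "."
        elif act == "hit" and any(
            grid[r + dr][c + dc] == "E" for dr, dc in MOVES.values()
        ):
            reward += 1.0                     # defeated an adjacent enemy
        if grid[r][c] == "G":
            reward += 5.0                     # reached the goal
            break
    return reward

# An action sequence as an LLM agent might propose it:
print(score_actions(["right", "right", "right", "pick",
                     "down", "down", "left", "hit", "right", "right"]))
```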

Evaluation of Word2World

The robustness of Word2World is demonstrated through various evaluations:

  • LLM-based Evaluations: These assess the coherence of the game world with the narrative.
  • Conventional PCG Checks:
    • Playability checks using an A* agent (see the sketch after this list).
    • Path length measurements.
    • Novelty assessments based on differences from previously generated worlds.
    • Accuracy checks for the placement of character and important tiles.
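Below is a minimal sketch of the playability, path-length, and novelty checks, assuming a uniform-cost tile grid with a single goal tile and a simple tile-difference novelty measure; these specifics are assumptions, not the paper's exact metrics.

```python
import heapq

def a_star_path_length(tile_map: list[str], start: tuple, goal: tuple,
                       walkable: set[str]) -> int | None:
    """Return the shortest path length from start to goal, or None if unplayable."""
    def h(p):  # Manhattan-distance heuristic
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    frontier = [(h(start), 0, start)]
    best = {start: 0}
    while frontier:
        _, g, (r, c) = heapq.heappop(frontier)
        if (r, c) == goal:
            return g
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < len(tile_map) and 0 <= nc < len(tile_map[0])
                    and tile_map[nr][nc] in walkable
                    and g + 1 < best.get((nr, nc), float("inf"))):
                best[(nr, nc)] = g + 1
                heapq.heappush(frontier, (g + 1 + h((nr, nc)), g + 1, (nr, nc)))
    return None  # goal unreachable -> world not playable

def novelty(world_a: list[str], world_b: list[str]) -> float:
    """Fraction of tiles that differ between two same-sized worlds (assumed metric)."""
    cells = [(a, b) for ra, rb in zip(world_a, world_b) for a, b in zip(ra, rb)]
    return sum(a != b for a, b in cells) / len(cells)

world = ["....#",
         ".##.#",
         "....."]
print(a_star_path_length(world, start=(0, 0), goal=(2, 4), walkable={"."}))  # -> 6
print(novelty(world, ["....#", ".#..#", "....."]))                           # -> ~0.067
```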

Experimental Insights

Word2World was tested across multiple runs, revealing some interesting findings:

  • The method consistently generates coherent and playable worlds.
  • Ablation studies (where specific steps are removed) show that every component in Word2World's multi-step process crucially contributes to its performance. For instance, omitting goal extraction or important tile extraction significantly hampers the quality of generated levels.
  • Different LLMs were compared, with larger models (like GPT-4 and Claude-3) generally outperforming smaller variants in terms of novelty and coherence.

Broader Implications and Future Directions

Practical Implications

The practical implications of Word2World are vast for both the gaming industry and the research community. For the gaming industry, it offers a tool to rapidly prototype game levels based on narrative inputs, significantly reducing development time and cost. For researchers, Word2World provides diverse environments that can be used for various AI and machine learning experiments, especially in reinforcement learning.

Theoretical Implications

From a theoretical perspective, Word2World bridges a gap between narrative generation and computational game creativity. It provides a new framework to explore how stories can be translated into structured, interactive environments, potentially setting new benchmarks for narrative-driven game development.

Future Developments

There's a lot of potential for expanding Word2World:

  • Different Game Genres: Adapting the system for 2D platformers or 3D environments.
  • Open-World Games: Creating expansive, open-world environments based on storybooks or other narrative formats.
  • Reinforcement Learning: As Word2World can generate diverse and coherent environments, it could be a valuable tool for open-ended learning and other advanced AI experiments.

Word2World represents a significant step forward in using LLMs for procedural content generation, making it easier than ever to turn imaginative stories into interactive game worlds. As this technology develops, we can expect even more innovative solutions for creating rich, engaging virtual environments.