
DreamCraft: Text-Guided Generation of Functional 3D Environments in Minecraft (2404.15538v1)

Published 23 Apr 2024 in cs.GR, cs.AI, cs.CL, and cs.LG

Abstract: Procedural Content Generation (PCG) algorithms enable the automatic generation of complex and diverse artifacts. However, they don't provide high-level control over the generated content and typically require domain expertise. In contrast, text-to-3D methods allow users to specify desired characteristics in natural language, offering a high amount of flexibility and expressivity. But unlike PCG, such approaches cannot guarantee functionality, which is crucial for certain applications like game design. In this paper, we present a method for generating functional 3D artifacts from free-form text prompts in the open-world game Minecraft. Our method, DreamCraft, trains quantized Neural Radiance Fields (NeRFs) to represent artifacts that, when viewed in-game, match given text descriptions. We find that DreamCraft produces more aligned in-game artifacts than a baseline that post-processes the output of an unconstrained NeRF. Thanks to the quantized representation of the environment, functional constraints can be integrated using specialized loss terms. We show how this can be leveraged to generate 3D structures that match a target distribution or obey certain adjacency rules over the block types. DreamCraft inherits a high degree of expressivity and controllability from the NeRF, while still being able to incorporate functional constraints through domain-specific objectives.


Summary

  • The paper introduces DreamCraft, a novel method using quantized Neural Radiance Fields (NeRFs) to generate functional 3D Minecraft environments from text prompts.
  • DreamCraft employs a voxel grid representation and embeds functional constraints directly into the generation process using a soft-to-hard quantization technique.
  • Experimental results show that DreamCraft generates text-aligned and visually coherent 3D structures that adhere to specified functional rules within the Minecraft environment.

Insights into DreamCraft: Text-Guided Generation of Functional 3D Environments in Minecraft

The paper "DreamCraft: Text-Guided Generation of Functional 3D Environments in Minecraft" introduces a novel approach for creating functional 3D environments within the popular sandbox game Minecraft, driven by text prompts. The authors propose and validate a methodology that leverages quantized Neural Radiance Fields (NeRFs) to generate functional Minecraft artifacts that align more closely with text descriptions than results obtained from unconstrained NeRFs. This work stands out due to its focus on integrating both expressivity and functionality, addressing inherent challenges in both procedural content generation (PCG) and text-to-3D generative methods.

Methodology

The core innovation of DreamCraft lies in its use of a quantized NeRF, capable of incorporating domain-specific constraints into the environment generation process. The authors adopt a voxel grid to represent 3D structures, to which functional constraints such as block distribution and adjacency rules are applied. These constraints are embedded within the loss function during the training phase, ensuring the resulting structures adhere to both aesthetic descriptions from the text prompts and functional requirements typical of game environments.
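One of the constraints described above, matching a target distribution over block types, can be illustrated with a minimal sketch. This is an assumption about the general shape of such a penalty term, not the paper's actual loss: the function name, the string block ids, and the L1 distance are all illustrative choices.

```python
from collections import Counter

def distribution_penalty(blocks, target):
    """L1 distance between the empirical block-type distribution of a
    generated structure and a target distribution (hypothetical encoding:
    `blocks` is a flat list of block-type ids, `target` maps id -> desired
    frequency). A term like this can be added to the training loss."""
    counts = Counter(blocks)
    total = len(blocks)
    types = set(counts) | set(target)
    return sum(abs(counts.get(t, 0) / total - target.get(t, 0.0))
               for t in types)

# A structure that is all stone, scored against a 50/50 stone/air target,
# incurs the maximum penalty for this two-type target:
penalty = distribution_penalty(["stone"] * 8, {"stone": 0.5, "air": 0.5})
```

In a differentiable setting the hard counts would be replaced by soft block-type probabilities, but the structure of the penalty is the same.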

DreamCraft operates by translating free-form text prompts into structured representations within Minecraft. This involves training the NeRF on quantized data, allowing it to handle discrete Minecraft block types while still preserving expressivity and alignment with the input descriptions. The method employs a soft-to-hard quantization technique for block densities, assisting with learning stability and optimizing the resulting structures for both visual quality and functionality.
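The soft-to-hard idea can be sketched with a temperature-scaled softmax over per-voxel block-type logits: early in training a high temperature yields a soft mixture (friendly to gradient-based optimization), and annealing the temperature toward zero drives each voxel to an effectively one-hot block choice. This is a minimal illustration of the annealing principle, not the paper's exact quantization scheme.

```python
import math

def soft_quantize(logits, temperature):
    """Temperature-scaled softmax over block-type logits for one voxel.
    High temperature -> soft mixture of block types; temperature near
    zero -> effectively a hard one-hot selection of the argmax type."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

logits = [2.0, 1.0, 0.5]
soft = soft_quantize(logits, temperature=5.0)   # near-uniform mixture
hard = soft_quantize(logits, temperature=0.01)  # approaches one-hot
```

Annealing the temperature over training steps moves the representation from the soft regime to the hard one, which is what makes the final structure expressible as discrete Minecraft blocks.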

Results

The paper provides quantitative evidence that DreamCraft generates text-aligned 3D structures more reliably than a baseline that post-processes the output of an unconstrained NeRF. Evaluation metrics such as R-precision are used to measure the fidelity of the generated environments against reference captions. DreamCraft shows significant improvements in generating domain-relevant and visually coherent environments, particularly when prompts are contextually aligned with Minecraft's stylistic tendencies.
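R-precision for a single sample reduces to a retrieval check: given similarity scores between a rendering and a pool of candidate captions (from an image-text model such as CLIP), does the ground-truth caption rank in the top R? The sketch below assumes precomputed similarity scores; the function name and R=1 default are illustrative.

```python
def r_precision(similarities, true_index, r=1):
    """Single-sample R-precision check: 1.0 if the ground-truth caption
    (at `true_index`) ranks within the top-r candidates by similarity
    to the rendering, else 0.0. Averaging over samples gives the
    reported metric."""
    ranked = sorted(range(len(similarities)),
                    key=lambda i: -similarities[i])
    return 1.0 if true_index in ranked[:r] else 0.0

# The true caption (index 0) scores highest among three candidates:
score = r_precision([0.9, 0.4, 0.2], true_index=0)
```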

The experimental results further illustrate how DreamCraft allows for the incorporation of explicit functional constraints, such as adherence to block adjacency rules or specific spatial distributions of blocks, yielding in-game structures that are plausible and navigable. This capability underscores the model's potential advantages in applications related to game design and automated content creation.
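An adjacency rule of the kind described above can be sketched as a count of forbidden vertical neighbor pairs in a column of blocks. This is a hypothetical encoding of such a rule (the pair set, block-id strings, and function name are all assumptions for illustration), meant only to show how a domain constraint becomes a countable, penalizable quantity.

```python
def adjacency_violations(column, forbidden_pairs):
    """Count forbidden vertical adjacencies in one column of blocks,
    listed bottom to top. `forbidden_pairs` is a set of (below, above)
    tuples that a domain rule disallows."""
    return sum(
        1 for below, above in zip(column, column[1:])
        if (below, above) in forbidden_pairs
    )

# Example rule: sand may not sit directly on air (it would fall in-game).
violations = adjacency_violations(
    ["stone", "air", "sand", "air"],   # sand rests on air -> 1 violation
    {("air", "sand")},
)
```

Summed over all columns (and the other two axes), a count like this can serve as a penalty term that pushes the generator toward structures obeying the rule.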

Implications and Future Directions

DreamCraft's contributions lie in bridging the gap between high-level text-guided generation technologies and the specific requirements of video game asset generation, resulting in a system that combines the expressive power of language with the necessary practicalities of functional game design. It presents a robust framework for exploring how generative AI can assist designers in creating dynamic and adaptable content within voxel-based environments like Minecraft.

Future work could further explore reductions in computational demand, enabling real-time generation processes critical for interactive game design tools. Additionally, extending the model's capabilities to other game genres or platforms could broaden its applicability, supporting diverse domains where procedural generation is desirable. The integration of richer functional constraints and more nuanced visual rendering capabilities could further enhance the realism and utility of the generated environments.

In summary, DreamCraft exemplifies a successful integration of procedural content generation techniques and neural radiance fields, enriched by the flexibility of text-driven inputs and strengthened by functional guarantees. This advances the frontier of automated 3D content creation in virtual environments, opening pathways for fine-grained control over aesthetic and functional properties in automated game world generation.
