- The paper introduces a novel pipeline that transforms hand-drawn sketches and text prompts into interactive 3D game scenes.
- It leverages a modified ControlNet with Sketch-Aware Loss to convert sketches into 2D isometric images and uses diffusion-based inpainting to produce clean basemaps.
- The approach further applies advanced 3D scene understanding and procedural generation techniques to reconstruct realistic game environments and accelerate content creation.
Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches
The paper "Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches" proposes a novel pipeline for generating 3D game scenes from user-provided sketches and text descriptions. This approach leverages cutting-edge techniques in generative AI, specifically diffusion models and procedural content generation.
Summary and Contributions
The primary contribution of the paper is the development of a system that enables the automatic creation of 3D game environments. This system addresses the prevalent challenges in current 3D content generation, particularly the lack of large-scale, high-quality 3D scene datasets suitable for training deep learning models. The key innovations of the paper include:
- Sketch-guided 2D Isometric Image Generation:
- Utilization of a modified ControlNet model, enhanced with a Sketch-Aware Loss (SAL), to convert hand-drawn sketches into 2D isometric images.
- This step allows users to provide intuitive and simple sketches that the system can interpret and embellish according to the context given by text prompts.
- By strategically filtering and augmenting the training sketches, the method stays robust to sketches of varying style and level of detail.
- Deep Learning-based Basemap Inpainting:
- Introduction of a novel inpainting model fine-tuned using Step-Unrolled Denoising (SUD) diffusion techniques to generate clean, empty basemaps.
- The model is trained on a curated dataset comprising various sources, such as pure texture images and partially masked isometric images, ensuring that it generalizes well to different scene layouts.
- 3D Scene Understanding and Procedural Content Generation:
- Implementation of advanced visual scene understanding to extract terrain heightmaps, texture splatmaps, and poses of foreground objects.
- Leveraging off-the-shelf models such as Depth-Anything and Segment-Anything, the system recovers depth and semantic masks from the isometric image and lifts them into 3D (see the sketch after this list).
- The procedural generation module employs these extracted parameters to reconstruct interactive 3D scenes that can be seamlessly integrated into existing game engines like Unity.
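As a rough illustration of this scene-understanding step, the snippet below chains Depth-Anything (via the HuggingFace transformers depth-estimation pipeline) and Segment-Anything on the generated isometric image. The model identifier, checkpoint path, and the small-area heuristic for separating foreground objects are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch of the scene-understanding step: run off-the-shelf
# Depth-Anything and Segment-Anything on the generated isometric image.
# Model ID and checkpoint path are assumptions, not from the paper.
import numpy as np
from PIL import Image
from transformers import pipeline
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

isometric = Image.open("isometric_scene.png").convert("RGB")

# Monocular depth: a relative depth map that can later be lifted to a heightmap.
depth_estimator = pipeline("depth-estimation",
                           model="depth-anything/Depth-Anything-V2-Small-hf")
depth = np.array(depth_estimator(isometric)["depth"], dtype=np.float32)

# Class-agnostic instance masks: used to separate foreground objects
# (trees, rocks, buildings) from the terrain before inpainting.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
masks = SamAutomaticMaskGenerator(sam).generate(np.array(isometric))

foreground_mask = np.zeros(depth.shape, dtype=bool)
for m in masks:
    if m["area"] < 0.05 * depth.size:   # heuristic: small regions are objects
        foreground_mask |= m["segmentation"]
```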
Technical Implementation
Sketch-guided Isometric Image Generation
The method builds on ControlNet, which provides precise control over scene layout through sketches. The Sketch-Aware Loss up-weights the regions indicated by the user's strokes, so that the generated 2D scene aligns closely with the user's intent.
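As an illustration, a sketch-aware loss can be realized as a per-pixel weighting of the standard denoising objective that up-weights pixels covered by (dilated) sketch strokes. The following is a minimal sketch assuming an epsilon-prediction diffusion loss; the dilation kernel and weighting factor are illustrative choices, not the paper's exact formulation, and in a latent-diffusion setting the stroke mask would first be resized to latent resolution.

```python
# Minimal sketch of a "sketch-aware" weighted denoising loss.
# The dilation and lambda weighting are illustrative assumptions.
import torch
import torch.nn.functional as F

def sketch_aware_loss(eps_pred, eps_true, sketch_mask, lam=5.0):
    """eps_pred, eps_true: (B, C, H, W) predicted / ground-truth noise.
    sketch_mask: (B, 1, H, W) binary mask, 1 where sketch strokes are drawn.
    lam: extra weight applied inside stroke regions."""
    # Dilate the stroke mask slightly so thin lines influence nearby pixels too.
    dilated = F.max_pool2d(sketch_mask, kernel_size=7, stride=1, padding=3)
    weight = 1.0 + lam * dilated                  # emphasize sketched regions
    per_pixel = (eps_pred - eps_true) ** 2
    return (weight * per_pixel).mean()
```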
Basemap Inpainting
The inpainting model is fine-tuned from a pre-trained SDXL inpainting model, incorporating a novel loss function and a step-unrolled denoising strategy to handle large occluded regions effectively. This enables the generation of high-quality basemaps free of foreground objects, which is essential for accurate terrain modeling.
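The sketch below illustrates one plausible reading of a step-unrolled training step: the model is rolled through a few of its own DDIM-style denoising steps without gradients, and only the final step is supervised, in x0 space, against the clean latents. The unroll depth, noise schedule, and the SDXL-inpaint-style input layout (noisy latents concatenated with the mask and masked latents) are assumptions; extra SDXL conditioning inputs are omitted for brevity.

```python
# Illustrative sketch of a step-unrolled denoising (SUD) training step for the
# inpainting model. `unet` is assumed to be a diffusers-style conditional UNet
# with a 9-channel input; the rollout and loss are assumptions, not the paper's
# exact recipe.
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # standard linear beta schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # \bar{alpha}_t

def sud_training_step(unet, latents, masked_latents, mask, cond, unroll=3):
    b = latents.shape[0]
    noise = torch.randn_like(latents)
    t = int(torch.randint(unroll, T, (1,)))          # shared timestep for the batch
    ab = alphas_cumprod.to(latents.device)

    # Forward diffusion q(x_t | x_0).
    x_t = ab[t].sqrt() * latents + (1 - ab[t]).sqrt() * noise

    def predict(x, step):
        ts = torch.full((b,), step, device=x.device, dtype=torch.long)
        inp = torch.cat([x, mask, masked_latents], dim=1)   # inpaint-style input
        return unet(inp, ts, encoder_hidden_states=cond).sample

    # No-grad rollout: the model denoises its own intermediate predictions.
    with torch.no_grad():
        for step in range(t, t - unroll + 1, -1):
            eps = predict(x_t, step)
            x0_hat = (x_t - (1 - ab[step]).sqrt() * eps) / ab[step].sqrt()
            x_t = ab[step - 1].sqrt() * x0_hat + (1 - ab[step - 1]).sqrt() * eps

    # Final step with gradients, supervised in x0 space against clean latents.
    step = t - unroll + 1
    eps = predict(x_t, step)
    x0_hat = (x_t - (1 - ab[step]).sqrt() * eps) / ab[step].sqrt()
    return F.mse_loss(x0_hat, latents)
```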
3D Scene Reconstruction
By reprojecting the generated isometric image into a bird’s eye view (BEV) format and using advanced segmentation techniques, the system extracts meaningful scene components. The procedural content generation relies on these components to place 3D assets correctly, ensuring the generated scene is both visually appealing and logically consistent.
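As a rough illustration of the terrain-recovery step, the snippet below unprojects the basemap's depth map under an orthographic isometric camera and rasterizes the points into a top-down heightmap grid (highest point per cell), which could then be fed to a game engine's terrain system. The camera model, elevation angle, and grid resolution are simplifying assumptions, not the paper's exact reprojection.

```python
# Illustrative sketch of lifting the isometric depth map into a BEV heightmap.
# Orthographic camera, elevation angle, and axis conventions are assumptions.
import numpy as np

def depth_to_heightmap(depth, elevation_deg=30.0, grid=256):
    """depth: (H, W) relative depth of the basemap from the isometric view.
    Returns a (grid, grid) heightmap in normalized world units."""
    H, W = depth.shape
    u, v = np.meshgrid(np.linspace(-1, 1, W), np.linspace(1, -1, H))

    # Orthographic isometric camera tilted by `elevation_deg` about the x-axis:
    # build camera-space points, then rotate them into world space.
    cam = np.stack([u, v, depth], axis=-1).reshape(-1, 3)
    a = np.deg2rad(elevation_deg)
    rot_x = np.array([[1, 0, 0],
                      [0, np.cos(a), -np.sin(a)],
                      [0, np.sin(a),  np.cos(a)]])
    world = cam @ rot_x.T                            # (N, 3): x, y (up), z

    # Rasterize into a top-down grid, keeping the highest point per cell.
    ix = np.clip(((world[:, 0] + 1) / 2 * (grid - 1)).astype(int), 0, grid - 1)
    iz = np.clip(((world[:, 2] - world[:, 2].min()) /
                  (np.ptp(world[:, 2]) + 1e-8) * (grid - 1)).astype(int),
                 0, grid - 1)
    heightmap = np.full((grid, grid), -np.inf)
    np.maximum.at(heightmap, (iz, ix), world[:, 1])
    heightmap[~np.isfinite(heightmap)] = heightmap[np.isfinite(heightmap)].min()
    return heightmap
```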
Results
The paper presents extensive qualitative results demonstrating the system’s capability to generate diverse and complex 3D scenes from simple sketches and text prompts. Comparative inpainting results show significant improvements over existing state-of-the-art models, highlighting the effectiveness of the proposed inpainting method.
Implications and Future Work
Practically, the Sketch2Scene pipeline could transform game development by dramatically reducing the time and expertise required to create 3D game scenes. Theoretically, it contributes to the growing body of research addressing data scarcity in 3D scene generation by creatively leveraging 2D models.
Future developments in AI could further enhance this work by integrating richer input modalities, such as user gestures or voice commands, for scene creation. Additionally, improvements in 3D asset retrieval and generation models would broaden the scope and quality of scenes produced by this pipeline.
In conclusion, the paper provides a comprehensive and technically robust solution for 3D scene generation from casual sketches, setting the stage for future advancements in AI-driven content creation.