- The paper presents the GameFactory framework that leverages pre-trained video diffusion models to generate interactive game content with scene generalization.
- It introduces a multi-phase training strategy, including pre-training, LoRA tuning, and action control, to decouple game style from user-directed actions.
- Evaluation on the GF-Minecraft dataset using metrics such as Flow-MSE and FID demonstrates improved action control and video quality, suggesting a path toward scalable game development.
Generative Game Creation with Scene Generalization via GameFactory Framework
The paper presents GameFactory, a novel framework that addresses significant limitations in generative video game creation. GameFactory leverages pre-trained, robust video diffusion models to generate novel game content as interactive videos, reducing the manual workload traditionally associated with game development. A pivotal advancement of GameFactory is its focus on scene generalization, an aspect underexplored in prior research.
Core Concepts and Methodology
GameFactory relies on pre-trained video models that are adept at handling open-domain videos. These models employ video diffusion techniques, which have previously been applied in domains such as video synthesis and physics simulation. The primary challenge GameFactory addresses is bridging the domain gap between open-domain data and small-scale game datasets to achieve action-controllable game video generation. To this end, the paper introduces a multi-phase training strategy that decouples game-style learning from action control, so the model retains its diverse open-domain generation capabilities while accurately following user-directed actions.
Phase Breakdown:
- Pre-training Phase: The model is initially pre-trained on open-domain video data without any game-specific information, equipping it with the broad generalization capabilities its architecture supports.
- LoRA Tuning Phase: Low-Rank Adaptation (LoRA) is employed to fine-tune the model on game-specific video data, giving it the ability to mimic game aesthetics without overwriting its pre-trained open-domain knowledge.
- Action Control Module Training Phase: The focus here is solely on learning action control from a restricted set of game data. Importantly, the visual-style parameters are left untouched, preventing the learned actions from becoming entangled with game style.
- Inference Phase: During inference, the LoRA fine-tuned parameters are detached, and the previously trained action control module is applied to the open-domain base model to produce diverse, interactively controllable game environments.
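The phases above hinge on toggling which parameters train and whether the LoRA branch is active. Below is a minimal, hypothetical PyTorch sketch of that mechanism (the layer shapes, the `action_head` name, and the `set_phase` helper are illustrative assumptions, not the paper's implementation):

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """Linear layer with an optional low-rank (LoRA) residual branch.

    Hypothetical stand-in for the paper's LoRA-tuned layers:
    output = W x + (alpha / rank) * B A x when the adapter is enabled.
    """
    def __init__(self, in_features, out_features, rank=4, alpha=1.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.lora_a = nn.Linear(in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # adapter starts as a no-op
        self.scale = alpha / rank
        self.lora_enabled = True

    def forward(self, x):
        out = self.base(x)
        if self.lora_enabled:
            out = out + self.scale * self.lora_b(self.lora_a(x))
        return out

def set_phase(model, phase):
    """Mirror the paper's phases by freezing/unfreezing parameters.

    'lora'   -> Phase 2: train only the LoRA weights on game data.
    'action' -> Phase 3: train only the action module (here 'action_head').
    'infer'  -> Phase 4: freeze everything and disable the LoRA branch,
                so the action module runs on the open-domain base model.
    """
    for name, p in model.named_parameters():
        if phase == "lora":
            p.requires_grad = "lora_" in name
        elif phase == "action":
            p.requires_grad = "action_head" in name
        else:
            p.requires_grad = False
    for m in model.modules():
        if isinstance(m, LoRALinear):
            m.lora_enabled = phase != "infer"

# Toy "backbone + action head" showing the phase toggling.
model = nn.ModuleDict({
    "backbone": LoRALinear(8, 8),
    "action_head": nn.Linear(8, 4),
})
set_phase(model, "lora")
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # only the two LoRA projection matrices
```

Because `lora_b` is zero-initialized, the adapter initially leaves the base model's outputs unchanged, and removing it at inference simply reverts the layer to its open-domain behavior.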
Dataset Construction and Model Evaluation
The researchers introduce GF-Minecraft, a finely annotated game video dataset collected from Minecraft, chosen for its extensive action space and customizable scenarios. The dataset supplies high-diversity scenes with action sequences free of human selection bias. For evaluation, the paper uses Flow-MSE, which measures how closely the generated motion matches the action input, and FID (Fréchet Inception Distance), which measures video quality.
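Flow-MSE compares the optical flow of a generated video against a reference flow. A minimal sketch, assuming both flows have already been extracted as `(T, H, W, 2)` arrays of per-pixel displacements (the function name and array layout are assumptions, not the paper's exact protocol):

```python
import numpy as np

def flow_mse(pred_flow, ref_flow):
    """Mean squared error between two optical-flow fields.

    pred_flow, ref_flow: (T, H, W, 2) arrays of per-pixel (dx, dy)
    displacements, e.g. the flow of the generated video versus the
    flow implied by the action input. Lower means the generated
    motion tracks the requested action more closely.
    """
    pred_flow = np.asarray(pred_flow, dtype=np.float64)
    ref_flow = np.asarray(ref_flow, dtype=np.float64)
    assert pred_flow.shape == ref_flow.shape, "flows must align frame-by-frame"
    return float(np.mean((pred_flow - ref_flow) ** 2))

# A video whose motion matches the reference exactly scores 0;
# a static video against uniform unit motion scores 1.
ref = np.ones((4, 16, 16, 2))  # uniform motion across 4 frames
print(flow_mse(ref, ref))               # -> 0.0
print(flow_mse(np.zeros_like(ref), ref))  # -> 1.0
```

In practice the flows would come from an off-the-shelf optical-flow estimator; the metric itself is just a per-pixel MSE over the flow fields.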
Implications and Future Directions
GameFactory suggests transformative potential across gaming and other interactive domains. The ability to generate highly diverse, new game scenarios through generalizable world models may redefine how games are developed, bringing practical benefits such as reduced development time and cost. Additionally, the model's promise as a generalizable physics engine points to applications in domains such as robotics and autonomous systems, where realistic simulation of dynamic environments is crucial.
Future developments might explore further decoupling strategies between style and control to refine model capabilities, extend the action space beyond current capabilities, and evaluate longer-sequence generation in real-time applications, thus enhancing utility in complex, interactive environments.
By presenting a structured, multi-phased approach to the problem of scene generalization in game generation, GameFactory stands as a prominent contribution, not only in visual generation research but in building foundations for truly scalable, AI-driven game engines and interactive simulations.