Long-horizon consistency in interactive generative world models

Establish mechanisms for sustaining long-horizon consistency in interactive generative systems, including action-conditioned video and scene-based world simulators, so that object identity, context, and coherence are preserved over extended real-time interaction.

Background

In the survey's roadmap, Stage III advances world models from static generation to real-time, action-conditioned simulation. Despite this progress, models commonly lose context and hallucinate during extended interaction, which reveals brittle long-term behavior.

The authors note a tension between explicit 3D scene generators (e.g., NeRFs, Gaussian Splatting) that offer spatial stability but require explicit modeling, and implicit frame-by-frame generators that are flexible yet prone to drift and forgetting. They explicitly state that sustaining long-horizon consistency remains unsolved.
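The drift half of this tension can be illustrated with a toy calculation. The sketch below is not taken from the survey and does not model any real generator; the function names, the single stationary object, and the fixed one-pixel per-step error are all assumptions chosen for illustration. It contrasts a rollout that feeds each generated frame back as the next input with one that renders every frame from a persistent scene state.

```python
# Toy illustration only: hypothetical names, not the survey's method
# and not the API of any real world model.

def implicit_rollout(true_pos, steps, step_error):
    """Frame-by-frame generation: each frame is predicted from the
    previous *generated* frame, so a small error is carried forward."""
    estimate = true_pos
    errors = []
    for _ in range(steps):
        estimate = estimate + step_error  # error becomes part of the next input
        errors.append(abs(estimate - true_pos))
    return errors


def explicit_rollout(true_pos, steps, step_error):
    """Scene-based generation: each frame is rendered from a persistent
    scene state, so the error does not feed back into later frames."""
    errors = []
    for _ in range(steps):
        rendered = true_pos + step_error  # error stays local to this frame
        errors.append(abs(rendered - true_pos))
    return errors


if __name__ == "__main__":
    # Assumed setup: a stationary object at position 0 and a
    # one-pixel error per generated frame, over 100 frames.
    implicit = implicit_rollout(true_pos=0, steps=100, step_error=1)
    explicit = explicit_rollout(true_pos=0, steps=100, step_error=1)
    print("implicit error after 100 frames:", implicit[-1])
    print("explicit error after 100 frames:", explicit[-1])
```

Under these assumptions the implicit rollout's error grows with the number of frames while the explicit rollout's error stays bounded by the per-frame error. Real systems are far less tidy, and the explicit approach pays for this stability with the cost of building and maintaining the scene representation.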

References

"Despite the leap to real-time interaction, sustaining long-horizon consistency remains unsolved."

From Masks to Worlds: A Hitchhiker's Guide to World Models (2510.20668 - Bai et al., 23 Oct 2025) in Section 5.3 (Challenges)