Benefits and mechanisms of unifying world modeling and planning

Determine whether unifying world modeling and trajectory planning in a single autoregressive model, one that generates interleaved image and action token sequences via next-token prediction, provides measurable benefits for autonomous driving. If it does, explain how the learned world model facilitates trajectory planning within this unified framework.

Background

Prior autonomous driving world models often operate in a decoupled manner, with world modeling focused on predicting future states and a separate policy model handling planning. Recent works have begun integrating world modeling and planning into a unified autoregressive model that interleaves image and action tokens and predicts via next-token modeling.
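
To make the interleaved formulation concrete, the following is a minimal sketch of how per-timestep image and action tokens might be flattened into one sequence and turned into next-token prediction targets. It is not taken from the paper: the vocabulary sizes, the offset scheme that separates action IDs from image IDs, and the helper names `interleave` and `next_token_pairs` are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical vocabulary layout (assumption, not from the paper):
# image tokens use IDs in [0, IMAGE_VOCAB); action tokens are shifted
# by IMAGE_VOCAB so the two modalities share one vocabulary without overlap.
IMAGE_VOCAB = 1024


def interleave(image_tokens: List[List[int]],
               action_tokens: List[List[int]]) -> List[int]:
    """Flatten per-timestep tokens into [img_0..., act_0..., img_1..., act_1..., ...]."""
    if len(image_tokens) != len(action_tokens):
        raise ValueError("expected one action chunk per image frame")
    sequence: List[int] = []
    for img, act in zip(image_tokens, action_tokens):
        sequence.extend(img)
        sequence.extend(a + IMAGE_VOCAB for a in act)
    return sequence


def next_token_pairs(sequence: List[int]) -> List[Tuple[List[int], int]]:
    """Build (context, target) pairs: each token is predicted from its prefix."""
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]


# Two frames with two image tokens each, and one action token per frame.
frames = [[5, 9], [7, 3]]
actions = [[1], [2]]
sequence = interleave(frames, actions)
pairs = next_token_pairs(sequence)
```

In this toy layout, the target that follows an action token is the first image token of the next frame, so a single next-token objective covers both forecasting and action prediction.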

However, even in these unified architectures, video generation and action prediction are typically conducted independently. It remains unclear whether architectural unification by itself yields concrete planning benefits, or whether there is a synergistic mechanism through which it improves decision-making. The paper introduces the Policy World Model (PWM) to address this gap by explicitly leveraging future state forecasting to aid planning, but it states that the general question of whether and how unification benefits autonomous driving is still unknown.

References

It is still unknown whether and how this unification can further benefit autonomous driving.

— From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction (2510.19654 - Zhao et al., 22 Oct 2025) in Section 1 (Introduction), page 2
