Extending reinforcement learning to distilled autoregressive video models

Develop reinforcement learning techniques that can be applied effectively to highly efficient distilled autoregressive video models for aligning streaming video generation, rather than only to heavy pre-distillation teacher models.

Background

Prior work on aligning diffusion and flow models with reinforcement learning often relies on reverse-process likelihood estimation and heavyweight models, an approach incompatible with efficient streaming architectures because of its computational and memory costs. DiffusionNFT introduced a forward-process, solver-agnostic alternative, and WorldCompass adapted it to autoregressive world models, but still targeted heavy pre-distillation teacher models.
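To make the contrast concrete, below is a minimal sketch of a forward-process, reward-weighted alignment objective in PyTorch. It is an illustrative stand-in, not the DiffusionNFT or Astrolabe objective: the velocity predictor `model`, the rectified-flow interpolation path, the temperature `beta`, and the softmax reward weighting are all assumptions chosen for brevity. What it demonstrates is that such a loss only needs forward-process (noising) targets, so no reverse-process likelihood or solver-specific computation is required.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def forward_process_alignment_loss(model, x0, rewards, beta=1.0):
    """Reward-weighted flow-matching loss computed on the forward
    (noising) process only. No reverse-process likelihood is needed,
    so the update is solver-agnostic and, in principle, cheap enough
    for few-step distilled generators. The weighting scheme and the
    rectified-flow path below are illustrative choices."""
    b = x0.shape[0]
    t = torch.rand(b, device=x0.device)            # times sampled uniformly in (0, 1)
    noise = torch.randn_like(x0)
    t_ = t.view(b, *([1] * (x0.dim() - 1)))
    x_t = (1.0 - t_) * x0 + t_ * noise             # forward interpolation toward noise
    target_v = noise - x0                          # forward-process velocity target
    per_sample = F.mse_loss(model(x_t, t), target_v,
                            reduction="none").flatten(1).mean(1)
    w = torch.softmax(rewards / beta, dim=0)       # reward-weighted regression weights
    return (w.detach() * per_sample).sum()

# Toy usage with a stand-in velocity predictor (all names are hypothetical).
class TinyVelocityNet(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Linear(dim + 1, dim)

    def forward(self, x_t, t):
        return self.net(torch.cat([x_t, t.unsqueeze(1)], dim=1))

model = TinyVelocityNet(dim=16)
x0 = torch.randn(8, 16)        # batch of generated samples (flattened for the toy)
rewards = torch.randn(8)       # scalar scores from a reward or preference model
forward_process_alignment_loss(model, x0, rewards).backward()
```

Because nothing in this objective depends on the sampler, the same update applies whether the generator runs many denoising steps or only a few distilled ones, which is why forward-process formulations are attractive candidates for efficient streaming models.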

Against this background, the authors explicitly note that applying reinforcement learning to highly efficient distilled autoregressive video models remains an unresolved challenge, motivating methods that preserve the efficiency of these models while enabling online preference alignment.

References

Extending RL to highly efficient distilled AR video models remains an open problem.

Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models (2603.17051 - Zhang et al., 17 Mar 2026) in Section 2.3, Related Work — Reinforcement Learning for Generative Models