ARCON: Advancing Auto-Regressive Continuation for Driving Videos (2412.03758v3)
Abstract: Recent advancements in auto-regressive LLMs have led to their application in video generation. This paper explores the use of Large Vision Models (LVMs) for video continuation, a task essential for building world models and predicting future frames. We introduce ARCON, a scheme that alternates between generating semantic and RGB tokens, allowing the LVM to explicitly learn high-level structural video information. We find high consistency in the RGB images and semantic maps generated without special design. Moreover, we employ an optical flow-based texture stitching method to enhance visual quality. Experiments in autonomous driving scenarios show that our model can consistently generate long videos.
Sponsor
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.