Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation (2506.17213v1)

Published 20 Jun 2025 in cs.CV, cs.AI, and cs.RO

Abstract: An ideal traffic simulator replicates the realistic long-term point-to-point trip that a self-driving system experiences during deployment. Prior models and benchmarks focus on closed-loop motion simulation for initial agents in a scene. This is problematic for long-term simulation. Agents enter and exit the scene as the ego vehicle enters new regions. We propose InfGen, a unified next-token prediction model that performs interleaved closed-loop motion simulation and scene generation. InfGen automatically switches between closed-loop motion simulation and scene generation mode. It enables stable long-term rollout simulation. InfGen performs at the state-of-the-art in short-term (9s) traffic simulation, and significantly outperforms all other methods in long-term (30s) simulation. The code and model of InfGen will be released at https://orangesodahub.github.io/InfGen

Summary

A Unified Model for Long-Term Traffic Simulation

This paper introduces InfGen, a framework for improving the fidelity of traffic simulation for autonomous vehicle systems over long horizons of up to 30 seconds. Prior models and benchmarks focus on closed-loop motion simulation for the agents present at the start of a scene, which breaks down over long rollouts because agents enter and exit the scene as the ego vehicle moves into new regions. InfGen addresses this by combining scene generation and closed-loop motion simulation in a single next-token prediction model.

Key Findings and Contributions:

The paper addresses the disappearance of agents over long rollouts by proposing a model that dynamically manages the entry and exit of agents. Without such management, simulated traffic scenes unrealistically empty out as the rollout proceeds, a problem the paper highlights through comparisons with prior state-of-the-art (SOTA) models, such as those presented in \cite{nips24smart}.
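The emptying effect can be illustrated with a toy calculation. The sketch below is not the paper's evaluation: it assumes a one-dimensional road, constant speeds, and a fixed visibility radius around the ego vehicle, and the names `Agent` and `visible_counts` are hypothetical. It counts how many of the initial agents remain near the ego when no new agents are ever added.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    x0: float     # initial longitudinal position in metres (assumed 1-D road)
    speed: float  # constant speed in metres per second


def visible_counts(agents, ego_speed, radius, horizon_s, dt=1.0):
    """Count agents within `radius` of the ego at each step; no agents are ever added."""
    steps = int(round(horizon_s / dt))
    counts = []
    for k in range(steps + 1):
        t = k * dt
        ego_x = ego_speed * t  # the ego starts at x = 0
        counts.append(
            sum(1 for a in agents if abs(a.x0 + a.speed * t - ego_x) <= radius)
        )
    return counts


# Five parked vehicles, an ego driving at 10 m/s, and a 50 m visibility radius.
parked = [Agent(x, 0.0) for x in (-40.0, -20.0, 20.0, 40.0, 80.0)]
counts = visible_counts(parked, ego_speed=10.0, radius=50.0, horizon_s=30.0)
print(counts[0], counts[10], counts[30])  # prints "4 1 0"
```

In this toy setting the scene is empty well before the 30-second horizon, which is the failure mode that motivates generating new agents during the rollout.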

The model is a unified autoregressive transformer that predicts interleaved tokens, in contrast to existing approaches, which typically treat motion simulation and scene generation as separate tasks. Using a set of task-specific tokenizers (for motion, agent pose, and control tokens), the model switches between simulating motion and generating new scene content within a single rollout while keeping long-term rollouts stable and realistic.
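A minimal sketch of what an interleaved rollout loop could look like is given below. It is an illustration under stated assumptions, not InfGen's implementation: the control-token names, the `StubModel` class (a random sampler standing in for the transformer), the fixed motion-then-scene ordering per step, and the agent cap are all hypothetical choices made for this example.

```python
import random

# Hypothetical control tokens; the paper's actual vocabulary is not given here.
BEGIN_MOTION = "<begin_motion>"
BEGIN_SCENE = "<begin_scene>"
ADD_AGENT = "<add_agent>"
REMOVE_AGENT = "<remove_agent>"
END_SCENE = "<end_scene>"


class StubModel:
    """Random stand-in for the autoregressive transformer (assumption)."""

    def __init__(self, seed=0, max_agents=6):
        self.rng = random.Random(seed)
        self.max_agents = max_agents

    def sample_motion(self):
        # Either a motion token or a request to remove the agent from the scene.
        if self.rng.random() < 0.1:
            return REMOVE_AGENT
        return f"<motion_{self.rng.randrange(8)}>"

    def sample_pose(self):
        return f"<pose_{self.rng.randrange(16)}>"

    def sample_control(self, num_agents):
        # Keep adding agents with probability 0.5 until the cap is reached.
        if num_agents < self.max_agents and self.rng.random() < 0.5:
            return ADD_AGENT
        return END_SCENE


def rollout(model, num_steps):
    """Build one token sequence that alternates motion and scene-generation modes."""
    sequence = []
    agents = {}  # agent id -> tokens emitted for that agent
    next_id = 0
    for _ in range(num_steps):
        # Motion mode: one token per active agent.
        sequence.append(BEGIN_MOTION)
        for agent_id in list(agents):
            token = model.sample_motion()
            sequence.append(token)
            if token == REMOVE_AGENT:
                del agents[agent_id]
            else:
                agents[agent_id].append(token)
        # Scene-generation mode: insert new agents until the model ends the mode.
        sequence.append(BEGIN_SCENE)
        while model.sample_control(len(agents)) == ADD_AGENT:
            pose = model.sample_pose()
            sequence.extend([ADD_AGENT, pose])
            agents[next_id] = [pose]
            next_id += 1
        sequence.append(END_SCENE)
    return sequence, agents
```

Calling `rollout(StubModel(seed=0), num_steps=5)` returns a single flat token sequence together with the agents still active at the end. The point of the sketch is that agent removal and insertion live in the same sequence as the motion tokens, so one next-token predictor can drive both modes.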

Performance evaluations show that InfGen performs at the state of the art in short-term (9s) traffic simulation and significantly outperforms all other methods in long-term (30s) simulation, in terms of both motion and scenario realism. The paper also provides thorough comparisons against established baselines, underscoring the gains in spatial and temporal consistency achieved by the proposed model.

Implications for Future Research and Applications:

The implications of this research extend beyond traffic simulation itself. As the demand for robust autonomous vehicle systems continues to rise, the capability to run stable and accurate long-term traffic simulations becomes increasingly critical. Integrating both motion prediction and spatial agent management into a single framework could serve as a blueprint for future applications in related domains, such as urban planning and dynamic map-based applications for self-driving cars.

Moreover, the paper's findings open avenues for enhancing interactive reinforcement learning techniques used in autonomous systems, advocating for further exploration into integrating real-time environment feedback to enrich agent interactions and scenario generation processes. The long-term goal would be creating a model capable of rolling out realistic, trip-level driving simulations, potentially transforming how autonomous driving systems are tested and validated.

In summary, this paper makes substantial progress toward more holistic and realistic traffic simulation for self-driving systems. While there is still considerable ground to cover before reaching fully realistic trip-level simulations, the unified model proposed here represents an essential advancement toward achieving that vision.
