RIFT: Closed-Loop Reinforcement Learning Fine-Tuning for Traffic Simulation
The paper "RIFT: Closed-Loop RL Fine-Tuning for Realistic and Controllable Traffic Simulation" presents a dual-stage simulation framework that improves the realism and controllability of traffic simulation, a crucial component in developing and evaluating autonomous driving systems. A fundamental challenge in traffic simulation is achieving realistic behavior and controllable scenarios simultaneously in interactive closed-loop environments. By combining data-driven and physics-based simulation, the paper provides a robust foundation for evaluating autonomous vehicle (AV) performance.
Methodological Insights
The authors introduce a two-phase framework. First, open-loop imitation learning (IL) is performed in a data-driven simulator, where expert demonstrations train the model to reproduce trajectory-level realism and multimodality. This stage leverages real-world driving data to capture the diverse behavioral patterns essential for realistic simulation.
In the second phase, closed-loop reinforcement learning (RL) fine-tuning is applied within a physics-based simulator. This addresses covariate shift: the mismatch between the state distribution seen during open-loop training and the distribution the model itself induces during closed-loop rollouts, which typically degrades performance when open-loop-trained models are deployed interactively. RL fine-tuning improves controllability and interaction stability while preserving trajectory-level multimodality. Central to the RIFT method is a GRPO-style group-relative advantage formulation: all candidate trajectories in a group are evaluated and contribute to the update, rather than optimizing only the best-performing trajectory, which preserves behavioral diversity.
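The core of a GRPO-style update is that each candidate's advantage is computed relative to its own group, so every trajectory receives a learning signal. A minimal sketch of that advantage computation (the reward values and function name are illustrative; the paper's exact formulation may differ):

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each candidate trajectory's
    reward by the group mean and standard deviation, so above-average
    candidates get positive advantages and below-average ones negative,
    rather than only the single best candidate being reinforced."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Rewards for a group of 4 candidate trajectories (illustrative values).
rewards = [1.0, 0.5, -0.2, 0.7]
adv = group_relative_advantages(rewards)
print(adv)  # positive for above-average candidates, negative for below-average
```

In a policy-gradient update, each candidate's log-probability would then be weighted by its advantage; because every member of the group contributes, diverse but reasonable candidates are not collapsed onto a single mode.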
Numerical Results and Claims
The experiments present strong evidence for RIFT's efficacy. Extensive tests show that the dual-stage approach measurably improves both the realism and the controllability of simulated traffic scenarios, supporting credible AV performance assessments. Validation relies on infraction metrics such as collisions per kilometer (CPK) and off-road rate (ORR), which together capture safety and driving progress. Realism is quantified via the Shapiro-Wilk test for normality of the speed and acceleration distributions, alongside the Wasserstein distance between simulated and target speed distributions.
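These metrics are straightforward to compute. A sketch of the infraction rates and a 1-D Wasserstein distance between equal-sized empirical samples (function names and example values are illustrative, and the paper's exact definitions may differ; in practice `scipy.stats.shapiro` and `scipy.stats.wasserstein_distance` provide library implementations of the statistical tests):

```python
import numpy as np

def collisions_per_km(n_collisions, total_km):
    """Infraction rate: collisions normalized by distance driven."""
    return n_collisions / total_km

def off_road_rate(off_road_steps, total_steps):
    """Fraction of simulation steps spent off the drivable area."""
    return off_road_steps / total_steps

def wasserstein_1d(a, b):
    """1-D Wasserstein distance between two equal-sized empirical
    samples with uniform weights: the mean absolute difference of
    their sorted values."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    assert a.shape == b.shape
    return float(np.mean(np.abs(a - b)))

sim_speeds = np.array([9.5, 10.2, 11.0, 10.8])       # m/s, illustrative
target_speeds = np.array([10.0, 10.0, 10.0, 10.0])   # constant target

print(collisions_per_km(2, 50))                      # 0.04 collisions/km
print(off_road_rate(15, 1000))                       # 0.015
print(wasserstein_1d(sim_speeds, target_speeds))     # 0.625
```

Lower values on all three indicate safer driving and speed profiles closer to the target.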
Implications and Future Directions
The proposed dual-stage framework advances autonomous driving simulation methodology, improving the quality of AV evaluation scenarios and supporting more rigorous development pipelines. By integrating imitation learning with reinforcement learning, it combines the strengths of data-driven and physics-based simulation, enhancing fidelity and reliability, and lets researchers systematically probe complex AV behavior in varied, interactive environments.
Looking ahead, the authors identify the need to keep improving simulation fidelity, particularly the inaccuracies in long-term behavior modeling attributed to current reward estimation techniques. Future work could refine state-wise reward models to better quantify trajectory-level comfort and enable smoother cross-modal transitions, and extend the framework to end-to-end training scenarios to further narrow the sim-to-real gap for real-world AV deployment.
In conclusion, this paper contributes a valuable perspective on enhancing traffic simulation for autonomous driving by merging imitation and reinforcement learning in a structured two-stage framework, and the advances reflected in RIFT point to promising directions for future research in this domain.