- The paper presents an automated pipeline that transforms detailed real-world crash reports into high-fidelity CARLA simulation scenarios.
- It employs topology-preserving map reconstruction with OpenStreetMap data and leverages LLM-based inference to accurately estimate pre-crash vehicle states.
- The resulting benchmark, featuring 52 curated scenarios, enhances AV evaluation by replicating diverse collision geometries and complex maneuver patterns.
TRACE: Topology-Aware Reconstruction of Accidents in CARLA for AV Evaluation
Overview
The paper "TRACE: Topology-aware Reconstruction of Accidents in CARLA for AV Evaluation" (2604.22068) presents an automated pipeline and benchmark that translate real-world automotive accident data into high-fidelity simulation scenarios for autonomous vehicle (AV) testing. TRACE addresses the limited realism and insufficient coverage of current AV evaluation suites, which typically rely on synthetic or oversimplified scenarios that do not capture the topological and behavioral complexity of real collisions. By reconstructing actual crash scenes with topology-preserving map generation and using LLMs to infer pre-crash vehicle states, TRACE supports evaluation of AV decision-making under safety-critical, long-tail conditions.
Current AV simulation benchmarks fall largely into three groups: adversarial scenario generation, knowledge-based rule synthesis, and data-driven methods. The last group, which is especially relevant for robust validation, faces a significant limitation: standard driving datasets systematically under-represent rare and high-risk scenarios because their distributions are naturally skewed.
Earlier frameworks (e.g., AC3R, SoVAR, CrashAgent, SAFE) attempt to exploit crash report datasets but either apply abstracted road representations or lack open-source and reproducible pipelines for generating simulation-ready, site-specific test cases. Existing approaches also rarely preserve the topological fidelity of the physical site or maintain semantic consistency in vehicle placement and maneuver reconstruction.
TRACE stands out by integrating OpenStreetMap (OSM) for map preservation and leveraging LLMs for state inference, producing a diverse, verifiable benchmark of accident-derived scenarios with fine detail on both topology and kinematics.
TRACE Pipeline Architecture
The TRACE architecture is modular, integrating information extraction, map reconstruction, and scenario synthesis in sequential stages. This architecture ensures that reconstructed scenarios reflect real-world geometry, lawful and unlawful driving behavior as reported, and contextually accurate sequencing of accident events.
Figure 1: Pipeline architecture connects extraction, map and scene reconstruction to automated CARLA simulation, utilizing LLM-driven state estimation.
TRACE's Extractor parses NHTSA-reported crashes, which are XML-encoded and include comprehensive details (location, conditions, involved vehicles, narratives). The extractor enforces completeness constraints to ensure that only reports with valid geocoordinates, topology, and basic trajectory information are considered, filtering out records that would result in ambiguous or unfaithful reconstructions.
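The completeness filter described above can be illustrated with a minimal sketch. The tag names (`Latitude`, `Longitude`, `Narrative`, `Vehicle`, `PreCrashManeuver`) are hypothetical placeholders, since the actual NHTSA XML schema is not given here, and the specific checks are an assumption about what "completeness" might mean in practice.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names; the real report schema is not specified in the text.
REQUIRED_TAGS = ("Latitude", "Longitude", "Narrative")


def is_complete(report_xml: str) -> bool:
    """Return True if a report carries the fields a reconstruction would need."""
    root = ET.fromstring(report_xml)

    # Every required field must exist and be non-empty.
    for tag in REQUIRED_TAGS:
        node = root.find(f".//{tag}")
        if node is None or not (node.text or "").strip():
            return False

    # Coordinates must parse as numbers and lie in a valid range.
    try:
        lat = float(root.find(".//Latitude").text)
        lon = float(root.find(".//Longitude").text)
    except ValueError:
        return False
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return False

    # Each vehicle needs at least a pre-crash maneuver (basic trajectory info).
    vehicles = root.findall(".//Vehicle")
    return bool(vehicles) and all(
        (v.findtext("PreCrashManeuver") or "").strip() for v in vehicles
    )
```

Reports that fail any check would simply be skipped rather than repaired, which matches the stated goal of avoiding ambiguous reconstructions.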
Map Reconstruction
Map fidelity is achieved through automated retrieval of OpenStreetMap data and its conversion to the OpenDRIVE format required by CARLA. The steps include coordinate projection, road network pruning, and topology-preserving transformations. The process also works around CARLA's limitations, such as its disallowance of certain maneuvers, so that all crash types can be replicated, including those involving wrong-way driving or lane violations. However, vertical geometry (overpasses, tunnels) is unsupported, which restricts the benchmark to planar road segments.
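To make the projection and pruning steps concrete, here is a small sketch under stated assumptions: it uses a simple equirectangular projection around the crash site and keeps only OSM nodes within a fixed radius. The function names, the projection choice, and the radius are illustrative and are not taken from the TRACE implementation.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def to_local_xy(lat, lon, lat0, lon0):
    """Equirectangular projection to east/north metres around (lat0, lon0)."""
    x = math.radians(lon - lon0) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


def prune_nodes(nodes, crash_lat, crash_lon, radius_m):
    """Keep nodes within radius_m of the crash site, in local coordinates.

    nodes maps a node id to a (lat, lon) pair.
    """
    kept = {}
    for node_id, (lat, lon) in nodes.items():
        x, y = to_local_xy(lat, lon, crash_lat, crash_lon)
        if math.hypot(x, y) <= radius_m:
            kept[node_id] = (x, y)
    return kept
```

An equirectangular approximation is adequate only over the few hundred metres around a crash site; a production pipeline would more likely rely on a proper map projection library.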
Both intermediate and final reconstructions are validated by geometric comparison to the original OSM extract and by ensuring the reported crash coordinates fall within the reconstructed drivable region.
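The containment check can be sketched as a point-to-polyline distance test. This assumes the drivable region is approximated by a road centerline plus a half-width, which is a simplification chosen for illustration; the paper does not describe the exact geometric test.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b (all 2D tuples)."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment
        return math.hypot(px - ax, py - ay)
    # Clamp the projection parameter so the closest point stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def on_drivable_region(point, centerline, half_width_m):
    """True if point lies within half_width_m of any centerline segment."""
    return any(
        point_segment_distance(point, a, b) <= half_width_m
        for a, b in zip(centerline, centerline[1:])
    )
```

For example, with a straight centerline from (0, 0) to (100, 0) and a hypothetical 3.5 m half-width, a crash point at (50, 2) would pass and one at (50, 10) would be rejected.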

Figure 2: Real-world road geometry (left) faithfully mapped and reconstructed in CARLA by TRACE (right), yielding scenario-ready environments congruent with accident sites.
Scenario and Trajectory Reconstruction
A distinguishing technical element is the Scene Reconstructor, which combines map and report data to infer precise, simulation-ready initial vehicle states and trajectories. State inference pairs LLM-based reasoning with deterministic analytical checks; the prompts include location, topological context, impact data, and legal constraints. The pipeline iteratively refines LLM outputs, enforcing constraints such as proper vehicle orientation, a plausible backward-propagated trajectory, and lane membership.
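The propose-check-refine loop can be sketched as follows. Everything here is an illustrative assumption: `VehicleState`, `check_state`, `refine_state`, the 30-degree heading tolerance, and the three-round limit are hypothetical, and the LLM is represented by a plain callable that receives the previous round's feedback.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass
class VehicleState:
    x: float
    y: float
    heading_deg: float
    speed_mps: float
    lane_id: int


def heading_error(a_deg: float, b_deg: float) -> float:
    """Smallest absolute angle between two headings, in degrees."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


def check_state(state, expected_heading_deg, valid_lanes, max_err_deg=30.0) -> List[str]:
    """Deterministic checks; returns human-readable problems (empty if OK)."""
    problems = []
    if state.lane_id not in valid_lanes:
        problems.append(f"lane {state.lane_id} is not one of {sorted(valid_lanes)}")
    if heading_error(state.heading_deg, expected_heading_deg) > max_err_deg:
        problems.append(f"heading {state.heading_deg} deviates from {expected_heading_deg}")
    if state.speed_mps < 0.0:
        problems.append("speed must be non-negative")
    return problems


def refine_state(
    propose: Callable[[List[str]], VehicleState],  # stands in for the LLM call
    expected_heading_deg: float,
    valid_lanes: Set[int],
    max_rounds: int = 3,
) -> Optional[VehicleState]:
    """Ask for a state, feed violations back, and stop once all checks pass."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        state = propose(feedback)
        feedback = check_state(state, expected_heading_deg, valid_lanes)
        if not feedback:
            return state
    return None  # no acceptable state within the round budget
```

Passing the expected heading explicitly, rather than always using the lane direction, leaves room for reports that describe wrong-way travel.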
The Launcher module then transforms these state estimates into executable trajectory files, bypassing CARLA’s standard global routing to permit direct replay of "illegal" real-world maneuvers described in reports.
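As a rough illustration of how a state estimate might become a replayable trajectory, the sketch below back-propagates a vehicle from its impact point at constant speed and heading, then serializes the timed waypoints to JSON. The constant-velocity model, the function names, and the JSON layout are all assumptions made for this example; the actual TRACE trajectory file format is not described here.

```python
import json
import math


def backpropagate(impact_xy, heading_deg, speed_mps, horizon_s, dt):
    """Timed waypoints leading up to the impact, oldest first.

    Assumes constant speed and heading, which is a deliberate simplification.
    Each waypoint is (t, x, y) with t <= 0 and t == 0 at the impact.
    """
    ix, iy = impact_xy
    ux = math.cos(math.radians(heading_deg))
    uy = math.sin(math.radians(heading_deg))
    steps = int(round(horizon_s / dt))
    waypoints = []
    for k in range(steps, -1, -1):
        t = k * dt
        waypoints.append((-t, ix - ux * speed_mps * t, iy - uy * speed_mps * t))
    return waypoints


def to_trajectory_json(vehicle_id, waypoints):
    """Serialize waypoints into a hypothetical trajectory file layout."""
    return json.dumps(
        {
            "vehicle": vehicle_id,
            "waypoints": [{"t": t, "x": x, "y": y} for t, x, y in waypoints],
        }
    )
```

Because the waypoints are explicit positions rather than route requests, a replay of this kind does not depend on a planner accepting the maneuver as legal.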
Simulation-phase validation applies multiple thresholds for physical and semantic plausibility: the reconstructed crash location must be within 5 meters of the reported point, impact orientations must match report data within specified tolerances, and pre-crash maneuver classes ("turning left", "going straight", etc.) must be preserved.
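These acceptance criteria can be expressed as a small check. The 5 m location threshold comes from the text; the 30-degree orientation tolerance is a placeholder, since the actual tolerances are not stated, and the `ImpactRecord` type is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ImpactRecord:
    x: float            # metres, local map frame
    y: float
    heading_deg: float  # vehicle orientation at impact
    maneuver: str       # e.g. "turning left", "going straight"


def validate_reconstruction(simulated, reported, max_offset_m=5.0, heading_tol_deg=30.0):
    """Compare a simulated impact against the reported one.

    Returns a dict of per-criterion booleans plus an overall "accepted" flag.
    """
    offset = math.hypot(simulated.x - reported.x, simulated.y - reported.y)
    heading_err = abs((simulated.heading_deg - reported.heading_deg + 180.0) % 360.0 - 180.0)
    checks = {
        "location": offset <= max_offset_m,
        "orientation": heading_err <= heading_tol_deg,
        "maneuver": simulated.maneuver == reported.maneuver,
    }
    checks["accepted"] = all(checks.values())
    return checks
```

Returning the individual flags, rather than a single boolean, makes it easier to see which criterion caused a scenario to be rejected.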
Benchmark Characterization
The TRACE benchmark comprises 52 curated, topology-faithful accident scenarios, drawn from 100 sampled NHTSA reports and covering a range of collision types, road geometries, and trajectory patterns. Scenarios were selected according to pipeline capability, data completeness, and simulation verifiability. Most exclusions were due to incompatibility with vertical topologies or insufficient detail in the report.
Figure 3: Sample benchmark scenarios span T-intersections, 4-ways, angles, curves, and varying dynamic collision types, underscoring TRACE’s representational diversity.
Coverage analysis (FARS categories) reveals:
- Collisions: Predominantly front-to-front and angle impacts, with front-to-rear and sideswipe collisions also present. Rear-to-rear and same-direction sideswipes are rare or absent given data sampling constraints.
- Topologies: Scenarios span non-intersections, T-intersections, and four-way intersections, with geospatial accuracy confirmed via map validation.
- Trajectories: Both lawful and erratic pre-crash behaviors are represented, enabling AV assessment under varied risk conditions.
The full dataset, including original reports, simulation files, map representations, and vehicle waypoints, is openly available for research use.
Implications and Future Research
The TRACE pipeline and benchmark directly address the reproducibility, coverage, and realism deficits of previous test generation methodologies in AV research. Through LLM-augmented inference and strict topology-preserving map synthesis, TRACE bridges the gap between raw accident reporting and executable, high-fidelity simulation. This supports nuanced AV evaluation across planning, perception, and control stacks.
The open-source release positions the benchmark for future extension, including:
- Support for complex vertical road topology (tunnels, overpasses)
- Expansion to multi-participant and vulnerable road user scenarios (pedestrians, cyclists)
- Injection of additional report modalities (sensor, weather, contextual video)
- Integration with generative adversarial scenario augmentation to further increase diversity and unpredictability
Methodologically, the use of LLMs for semantic inference from semi-structured reports is noteworthy: future advancements in LLM reasoning and fine-tuning may further enhance the precision of crash reconstructions and automate broader classes of edge-case scenario synthesis.
Conclusion
TRACE represents a significant step toward the automated, reproducible reconstruction of real-world accidents for AV evaluation, achieving strong scenario fidelity through topology-preserving map synthesis and LLM-driven state estimation. The resulting benchmark enables a more rigorous exploration of AV system reliability under true-to-life rare event distributions and geospatial complexity, providing the foundation for robust comparative evaluation and accelerated safety verification research.