CommonRoad Benchmark Suite

Updated 29 December 2025

CommonRoad Benchmark Suite is a comprehensive, open-source platform for the standardized comparison of autonomous vehicle motion planning algorithms in realistic traffic scenarios.
It features over 500 scenarios spanning highway and urban settings, with both non-interactive and interactive agent behaviors, for testing safety, efficiency, and traffic rule compliance.
The suite combines standardized evaluation metrics with closed-loop simulation tools to enable reproducible, fair benchmarking for both research and industry.

The CommonRoad Motion Planning Competition is an annual benchmarking event designed to rigorously compare motion-planning algorithms for autonomous vehicles in realistic, interactive traffic scenarios. The competition leverages the CommonRoad benchmark suite, which offers standardized, extensible scenarios depicting heterogeneous traffic in highway and urban environments. The framework evaluates planners according to efficiency, safety, comfort, and traffic rule compliance, providing a reproducible, open-source environment for academic and industrial research (Kochdumper et al., 10 Nov 2024, Huang et al., 22 Dec 2025).

1. Benchmark Framework and Scenario Design

The CommonRoad benchmark suite is an open-source platform built on real road networks (using lanelets as the geometric abstraction) and integrates realistic agent behavior, either by replaying recorded data or by simulating dynamic interactions through SUMO. Scenarios include multi-lane highway maneuvers, urban intersections with signalized and unsignalized crossings, and environments with diverse traffic participants such as passenger cars, buses, trucks, bicycles, and, in some maps, virtual pedestrians. The 2023 and 2024 editions utilized over 500 scenarios, partitioned evenly between highway and urban environments.
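Because lanelets are the geometric building block of every scenario, it helps to see how one bounds a drivable region. In CommonRoad a lanelet is delimited by a left and a right boundary polyline. The minimal sketch below assumes straight polylines and uses hypothetical helper names (`lanelet_polygon`, `point_in_polygon`) rather than the commonroad-io API: it closes the two bounds into a polygon and applies a standard even-odd ray-casting test. A real road-compliance check must consider the vehicle's full footprint, not a single point.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def lanelet_polygon(left: List[Point], right: List[Point]) -> List[Point]:
    """Hypothetical helper: left bound in driving direction, then right bound reversed."""
    return list(left) + list(reversed(right))


def point_in_polygon(p: Point, poly: List[Point]) -> bool:
    """Even-odd ray casting: toggle on each edge crossed by a ray from p towards +x."""
    x, y = p
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):  # edge spans the ray's height, so y2 != y1 here
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Straight lanelet, 3.5 m wide and 20 m long (illustrative values)
left_bound = [(0.0, 3.5), (10.0, 3.5), (20.0, 3.5)]
right_bound = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
lanelet = lanelet_polygon(left_bound, right_bound)

print(point_in_polygon((5.0, 1.75), lanelet))  # True: on the lane center line
print(point_in_polygon((5.0, 4.0), lanelet))   # False: beyond the left bound
```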

Scenario types are divided into non-interactive (where all agents follow provided reference trajectories) and interactive (where agents dynamically react to the ego vehicle). Interactive agent reactions are simulated in SUMO, so that surrounding traffic responds to, and can actively challenge, the ego vehicle during closed-loop evaluation (Kochdumper et al., 10 Nov 2024, Huang et al., 22 Dec 2025).

2. Competition Structure and Reproducibility

Each competition proceeds in two phases:

Phase I ("Didactic"): Open to all competitors to familiarize them with the CommonRoad infrastructure using public scenarios. Results are not counted in the final ranking.
Phase II ("Evaluation"): Uses a closed set of previously unseen, interactive, non-public scenarios. Teams submit their Dockerized motion planners, which are executed on CommonRoad servers under identical hardware constraints (maximum two CPU cores, six-hour wall-clock limit). This procedure ensures bit-level reproducibility and equitable computational conditions (Huang et al., 22 Dec 2025).

Provided toolboxes include a drivability checker (for collision and kinematic constraint validation), a route planner (for computing routes and reference paths, e.g., for Frenet-frame trajectory generation), the CRIME toolbox for criticality measurement, and signal temporal logic (STL) monitors for formal rule compliance. All scenarios use the open CommonRoad format, and Dockerization ensures identical results across evaluation runs.

3. Evaluation Metrics and Cost Function

Planners are first screened for feasibility based on three criteria:

Collision-Free: No overlaps between vehicle and obstacle occupancy sets throughout the trajectory.
Kinematic Feasibility: Adherence to a nonlinear kinematic single-track model (typically parameterized from production vehicles such as the Ford Escort (Huang et al., 22 Dec 2025)).
Road Compliance: Trajectories remain fully within drivable lanelets; encroachment on sidewalks or bike-only lanes immediately triggers rejection.
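The kinematic-feasibility criterion can be illustrated with a simplified check. The sketch below assumes a kinematic single-track (bicycle) model with state (steering angle δ, velocity v, heading ψ), recovers the implied inputs by finite differences, and tests them against limits. The limit values, class names, and heading tolerance are illustrative assumptions, not the official Ford Escort parameter set or the drivability checker's algorithm, and heading wrap-around is ignored.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class State:
    delta: float  # steering angle [rad]
    v: float      # longitudinal velocity [m/s]
    psi: float    # heading [rad]


@dataclass
class Limits:  # illustrative values, assumed for this sketch only
    wheelbase: float = 2.5      # [m]
    delta_max: float = 0.9      # [rad]
    v_delta_max: float = 0.4    # steering rate [rad/s]
    a_max: float = 8.0          # symmetric acceleration bound [m/s^2]
    heading_tol: float = 1e-2   # allowed heading mismatch per step [rad]


def is_kinematically_feasible(traj: List[State], dt: float, lim: Limits) -> bool:
    """Finite-difference check against a kinematic single-track model (no angle wrapping)."""
    if any(abs(s.delta) > lim.delta_max for s in traj):
        return False
    for s0, s1 in zip(traj, traj[1:]):
        v_delta = (s1.delta - s0.delta) / dt  # implied steering-rate input
        accel = (s1.v - s0.v) / dt            # implied longitudinal acceleration
        if abs(v_delta) > lim.v_delta_max or abs(accel) > lim.a_max:
            return False
        # Yaw kinematics psi_dot = v / l * tan(delta), integrated with forward Euler
        psi_pred = s0.psi + s0.v / lim.wheelbase * math.tan(s0.delta) * dt
        if abs(psi_pred - s1.psi) > lim.heading_tol:
            return False
    return True


dt = 0.1
print(is_kinematically_feasible([State(0.0, 10.0, 0.0)] * 5, dt, Limits()))  # True
# Steering jumps 0.2 rad in 0.1 s, i.e. 2 rad/s, exceeding the assumed 0.4 rad/s limit
print(is_kinematically_feasible([State(0.0, 10.0, 0.0), State(0.2, 10.0, 0.0)], dt, Limits()))  # False
```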

Once feasibility is satisfied, a multi-criterion cost function is used to score each trajectory. The canonical form is:

$J_{\text{ego}} = w_1 J_{\text{J,lon}} + w_2 J_{\text{SR}} + w_3 J_D + w_4 J_{\text{LC}}$

with $w = [w_1, w_2, w_3, w_4] = [0.01, 22, 8, 5]$ (2024 edition). The terms are defined as:

Longitudinal Jerk Penalty $J_{\text{J,lon}} = \int_{t_0}^{t_f} \dddot{s}(t)^2 \, dt$, with $s(t)$ being the longitudinal position
Steering-Rate Penalty $J_{\text{SR}} = \int_{t_0}^{t_f} v_\delta(t)^2 \, dt$, with $v_\delta(t)$ being the steering velocity
Proximity to Obstacles $J_D = \int_{t_0}^{t_f} \max_i \exp(-w_{\text{dist}} d_i(t)) \, dt$, with $w_{\text{dist}} = 0.2$ and $d_i(t)$ being the distance to obstacle $i$
Lane-Centering Offset $J_{\text{LC}} = \int_{t_0}^{t_f} d(t)^2 \, dt$, with $d(t)$ being the lateral distance to the lane center
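To make the weighting concrete, the cost can be approximated from a trajectory sampled at a fixed step $\Delta t$. This sketch assumes a simple left-rectangle (Riemann sum) quadrature and per-step input lists with hypothetical names; the official evaluation may discretize the integrals differently. Scenes without obstacles contribute zero to $J_D$.

```python
import math
from typing import Sequence

WEIGHTS = (0.01, 22.0, 8.0, 5.0)  # w1..w4 as stated for the 2024 edition
W_DIST = 0.2                      # w_dist in the proximity term


def integrate(values: Sequence[float], dt: float) -> float:
    """Left-rectangle approximation of an integral from t0 to tf (assumed quadrature)."""
    return sum(values) * dt


def ego_cost(lon_jerk: Sequence[float],
             steer_rate: Sequence[float],
             obstacle_dists: Sequence[Sequence[float]],
             lane_offset: Sequence[float],
             dt: float) -> float:
    """Weighted sum w1*J_J,lon + w2*J_SR + w3*J_D + w4*J_LC over sampled signals."""
    j_lon = integrate([j * j for j in lon_jerk], dt)
    j_sr = integrate([r * r for r in steer_rate], dt)
    # Closest obstacle dominates: max_i exp(-w_dist * d_i) at each time step
    j_d = integrate([max((math.exp(-W_DIST * d) for d in ds), default=0.0)
                     for ds in obstacle_dists], dt)
    j_lc = integrate([d * d for d in lane_offset], dt)
    w1, w2, w3, w4 = WEIGHTS
    return w1 * j_lon + w2 * j_sr + w3 * j_d + w4 * j_lc


# Three steps of 0.1 s: constant jerk, no steering, obstacles at 10 m and 25 m, 0.1 m offset
print(ego_cost([1.0] * 3, [0.0] * 3, [[10.0, 25.0]] * 3, [0.1] * 3, 0.1))
```

With these weights, the steering-rate and proximity terms dominate unless the jerk signal is large, which reflects the relative emphasis of the 2024 cost function.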

Additional evaluation dimensions include the number of benchmarks solved ("coverage"), average solution time per scenario, number of safety incidents (including time-to-collision distributions), comfort in terms of peak jerk and steering rates, and binary rule compliance. Robustness with respect to traffic rules (e.g., safe following distance and compliance with traffic flow) is formally monitored via STL, and violations can trigger additional penalties or disqualification (Kochdumper et al., 10 Nov 2024, Huang et al., 22 Dec 2025).
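STL monitors typically report a quantitative robustness value: positive means the rule holds with margin, negative means it is violated. Under the standard quantitative semantics, "always" ($\mathbf{G}$) takes the minimum over time and "eventually" ($\mathbf{F}$) the maximum. The sketch below applies this to a following-distance rule, assuming a hypothetical time-headway threshold (gap at least 2 s times ego speed); the formalized CommonRoad safe-distance rule uses a different, braking-based definition.

```python
from typing import Sequence


def always_robustness(signal: Sequence[float]) -> float:
    """Robustness of G(signal >= 0): worst-case margin over the trace."""
    return min(signal)


def eventually_robustness(signal: Sequence[float]) -> float:
    """Robustness of F(signal >= 0): best-case margin over the trace."""
    return max(signal)


def safe_following_robustness(gaps: Sequence[float],
                              ego_speeds: Sequence[float],
                              headway_s: float = 2.0) -> float:
    """Hypothetical rule G(gap - headway * v >= 0); negative result means a violation."""
    margins = [g - headway_s * v for g, v in zip(gaps, ego_speeds)]
    return always_robustness(margins)


# Gap shrinks from 30 m to 18 m at 10 m/s: the last step violates the 20 m requirement
print(safe_following_robustness([30.0, 25.0, 18.0], [10.0, 10.0, 10.0]))  # -2.0
```

Because robustness is a signed margin rather than a boolean, it can feed graded penalties as well as pass/fail decisions.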

4. Representative Algorithmic Approaches

The leading submissions in both 2023 and 2024 exemplify the primary paradigms in state-of-the-art autonomous driving motion planning.

Stony Brook University (SBU-2023)

Architecture: Reachability-based high-level corridor selection using double-integrator dynamics followed by nonlinear OCP refinement in the selected corridor.
Safety: Reachable-set pruning establishes safety envelopes adhering to traffic rules and physical constraints (e.g., friction circle).
Optimization: The OCP minimizes deviation from a reference trajectory and penalizes control effort, solved in Python/CasADi with $\Delta t = 0.1$ s and a 4 s planning horizon.
Trade-offs: Emphasizes lower jerk and fewer close-calls at the expense of slightly longer path times.
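The reachability stage can be illustrated in one dimension. The sketch below is a deliberately simplified stand-in, not the SBU implementation: it over-approximates the longitudinal reachable set of a double integrator ($\dot s = v$, $\dot v = a$, bounded $a$ and $v$) as position and velocity intervals, and prunes positions that would come closer than a hypothetical minimum gap to a leading vehicle. The defaults mirror the $\Delta t = 0.1$ s and 4 s horizon quoted above; the acceleration bounds and all names are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Interval:
    lo: float
    hi: float

    def empty(self) -> bool:
        return self.lo > self.hi


@dataclass
class ReachBox:
    s: Interval  # longitudinal position [m]
    v: Interval  # velocity [m/s]


def propagate(box: ReachBox, dt: float, a_min: float, a_max: float,
              v_min: float, v_max: float) -> ReachBox:
    """One-step interval over-approximation of double-integrator reachability."""
    s = Interval(box.s.lo + box.v.lo * dt + 0.5 * a_min * dt * dt,
                 box.s.hi + box.v.hi * dt + 0.5 * a_max * dt * dt)
    v = Interval(max(v_min, box.v.lo + a_min * dt), min(v_max, box.v.hi + a_max * dt))
    return ReachBox(s, v)


def reachable_corridor(init: ReachBox, steps: int, dt: float, a_min: float, a_max: float,
                       v_min: float, v_max: float, lead: Sequence[float],
                       gap: float) -> Optional[List[ReachBox]]:
    """Propagate and prune; returns None if the safe set becomes empty."""
    boxes = [init]
    for k in range(steps):
        nxt = propagate(boxes[-1], dt, a_min, a_max, v_min, v_max)
        nxt.s.hi = min(nxt.s.hi, lead[k + 1] - gap)  # stay behind the leading vehicle
        if nxt.s.empty() or nxt.v.empty():
            return None
        boxes.append(nxt)
    return boxes


init = ReachBox(Interval(0.0, 0.0), Interval(10.0, 10.0))
lead = [30.0 + 5.0 * 0.1 * k for k in range(41)]  # leader at 30 m, driving 5 m/s
corridor = reachable_corridor(init, 40, 0.1, -8.0, 3.0, 0.0, 20.0, lead, 10.0)
print(corridor[-1])  # position/velocity bounds at the end of the 4 s horizon
```

A nonlinear OCP would then search for a smooth trajectory inside the surviving corridor; the interval boxes here are looser than the polygonal sets a full reachability analysis would use.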

Technical University of Munich (TUM-2024)

Architecture: Sampling-based planner in the Frenet frame that generates a lattice of candidate quintic polynomial trajectories and transforms them to Cartesian space for validation.
Interaction Handling: Integrates level-$k$ game theory for intersection negotiation, with 14 maneuver options and risk-sensitive cost evaluation.
Scalability: Up to 90,000 candidate trajectories checked per cycle; multi-core C++ implementation achieves rapid solution times (e.g., median 717 ms per replan in C++, over 10,000 ms in unoptimized Python).
Trade-offs: Achieves higher coverage and more top-1 (best-cost) results, with a slightly increased risk of near-collision events due to more aggressive planning.
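The core of such a lattice is fitting a quintic polynomial between the current lateral state and a sampled end state. The toy sketch below covers only the lateral offset $d(t)$ (the TUM planner also samples longitudinal motion and checks far more candidates); it solves the six boundary conditions in closed form, samples end offsets and durations, and picks the lowest-cost candidate. The cost (squared-jerk integral plus a penalty on distance from a preferred offset) and all names are assumptions for illustration.

```python
from itertools import product
from typing import List, Sequence, Tuple


def quintic_coeffs(d0: float, v0: float, a0: float,
                   dT: float, vT: float, aT: float, T: float) -> List[float]:
    """Closed-form quintic d(t) = sum c_k t^k matching position/velocity/acceleration at 0 and T."""
    c0, c1, c2 = d0, v0, 0.5 * a0
    b0 = dT - (c0 + c1 * T + c2 * T**2)  # residual position
    b1 = vT - (c1 + 2 * c2 * T)          # residual velocity
    b2 = aT - 2 * c2                     # residual acceleration
    c3 = (10 * b0 - 4 * b1 * T + 0.5 * b2 * T**2) / T**3
    c4 = (-15 * b0 + 7 * b1 * T - b2 * T**2) / T**4
    c5 = (6 * b0 - 3 * b1 * T + 0.5 * b2 * T**2) / T**5
    return [c0, c1, c2, c3, c4, c5]


def evaluate(c: Sequence[float], t: float) -> float:
    return sum(ck * t**k for k, ck in enumerate(c))


def squared_jerk_integral(c: Sequence[float], T: float, n: int = 100) -> float:
    """Left Riemann sum of d'''(t)^2 over [0, T]."""
    dt = T / n
    total = 0.0
    for i in range(n):
        t = i * dt
        jerk = 6 * c[3] + 24 * c[4] * t + 60 * c[5] * t * t
        total += jerk * jerk * dt
    return total


def sample_lattice(d0: float, v0: float, a0: float,
                   offsets: Sequence[float], durations: Sequence[float],
                   preferred: float = 0.0, w_off: float = 1.0) -> Tuple:
    """Return (cost, end offset, duration, coeffs) of the cheapest candidate (hypothetical cost)."""
    candidates = []
    for dT, T in product(offsets, durations):
        c = quintic_coeffs(d0, v0, a0, dT, 0.0, 0.0, T)
        cost = squared_jerk_integral(c, T) + w_off * (dT - preferred) ** 2
        candidates.append((cost, dT, T, c))
    return min(candidates, key=lambda cand: cand[0])


# Ego 1 m left of the lane center: candidates end on the center or an adjacent lane
best = sample_lattice(1.0, 0.0, 0.0, [-3.5, 0.0, 3.5], [2.0, 3.0, 4.0])
print(best[1], best[2])  # 0.0 4.0: the slowest return to the lane center wins
```

In a full planner each candidate would additionally be transformed to Cartesian coordinates and screened for collisions and kinematic limits before ranking, which is where most of the computation per cycle goes.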

| Submission | Solved Scenarios (2023) | Mean Comfort Cost (jerk + steering) | Safety Incidents (per 100) | Coverage (2024, Aachen) | Mean TR1 (Aachen) | Mean TR1 (Cologne) | Mean TR1 (Dresden) |
|---|---|---|---|---|---|---|---|
| SBU-2023 | 116 | 0.28 | 2.1 | 68 | 4.622 | 3.972 | 4.061 |
| TUM-2024 | 118 | 0.32 | 3.0 | 82 | 7.629 | 5.966 | 7.059 |

On benchmarks solvable by both planners, the SBU-2023 entry achieved ≈40% lower average cost and lower worst-case values, while the TUM-2024 planner solved ≈31% more unique scenarios overall (Kochdumper et al., 10 Nov 2024, Huang et al., 22 Dec 2025).

5. Key Findings and Paradigm Trade-offs

A central conclusion from comparative analysis is that disparate motion planning paradigms—reachability+OCP (SBU-2023) vs. high-volume sampling (TUM-2024)—can attain near-identical overall performance when evaluated on a comprehensive, interactive benchmark suite. Sampling-based planners exhibit clear advantages in dense, highly interactive scenes due to their rapid exploration of trajectory homotopies, while optimization-based approaches provide stronger smoothness and formal safety guarantees.

However, all approaches exhibited limitations in complex, narrow urban domains, particularly when confronted with unpredictable two-wheeler and pedestrian behavior, highlighting a gap in anticipation and occlusion modeling. Real-time performance was also critical: planners exceeding 1 s per decision cycle solved markedly fewer scenarios because of the strict wall-clock constraints (Kochdumper et al., 10 Nov 2024, Huang et al., 22 Dec 2025).

6. Methodological Innovations and Continuous-Time Safety

Recent advances in motion planning highlight the utility of spatio-temporal optimization frameworks. Notably, the use of trapezoidal prism-shaped corridors in conjunction with Bézier-curve parameterizations has been shown to strictly enlarge the feasible solution space while guaranteeing continuous-time collision avoidance. This approach leverages the convex hull property of Bézier curves and enforces control-point constraints via the Bernstein-to-monomial transition matrix. Empirical results in CommonRoad benchmarks have demonstrated higher success rates, reduced conservatism, smoother trajectories (lower peak accelerations), and real-time replanning capabilities (e.g., $\sim$12 ms QP solve times) compared to traditional cuboidal corridor methods (Deolasee et al., 2022).
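Two ingredients of this approach can be checked numerically. For a degree-$n$ Bézier curve with control points $p_j$, expanding the Bernstein basis gives monomial coefficients $a_i = \sum_{j \le i} \binom{n}{i}\binom{i}{j}(-1)^{i-j} p_j$, i.e., a lower-triangular Bernstein-to-monomial matrix. And because the Bernstein weights are nonnegative and sum to one, any linear constraint satisfied by all control points holds along the entire curve. The sketch below, with hypothetical names and illustrative numbers, encodes time as a linear Bézier coordinate so that a sloped (non-axis-aligned) corridor face $x \le 1 + 3t$ can be enforced on control points alone.

```python
from math import comb
from typing import List, Sequence


def bernstein_to_monomial(n: int) -> List[List[int]]:
    """M with a = M p: M[i][j] = C(n,i) C(i,j) (-1)^(i-j) for j <= i, else 0."""
    return [[comb(n, i) * comb(i, j) * (-1) ** (i - j) if j <= i else 0
             for j in range(n + 1)] for i in range(n + 1)]


def monomial_coeffs(ctrl: Sequence[float]) -> List[float]:
    n = len(ctrl) - 1
    M = bernstein_to_monomial(n)
    return [sum(M[i][j] * ctrl[j] for j in range(n + 1)) for i in range(n + 1)]


def bezier_eval(ctrl: Sequence[float], tau: float) -> float:
    """Evaluate a Bezier curve at tau in [0, 1] via the Bernstein basis."""
    n = len(ctrl) - 1
    return sum(comb(n, j) * tau**j * (1 - tau) ** (n - j) * p for j, p in enumerate(ctrl))


def control_points_satisfy(ts: Sequence[float], xs: Sequence[float],
                           slope: float, intercept: float) -> bool:
    """Half-plane x <= intercept + slope * t checked on every (t_j, x_j) control point."""
    return all(x <= intercept + slope * t for t, x in zip(ts, xs))


# Degree-5 segment over 2 s; evenly spaced time control points make t(tau) = 2 * tau
ts = [0.0, 0.4, 0.8, 1.2, 1.6, 2.0]
xs = [0.0, 1.0, 2.5, 4.0, 5.0, 5.5]
print(control_points_satisfy(ts, xs, slope=3.0, intercept=1.0))  # True
# Convex hull property: the dense samples therefore satisfy the same sloped bound
print(all(bezier_eval(xs, k / 100) <= 1.0 + 3.0 * bezier_eval(ts, k / 100) + 1e-12
          for k in range(101)))  # True
```

Since the constraints act on control points linearly, they can be stacked directly into a QP, which is consistent with the short solve times reported for this class of methods.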

A plausible implication is that such spatio-temporal formulations—especially those able to efficiently exploit non-axis-aligned, time-varying safe sets—are well-positioned for competitive scenarios with tight timing and kinematic constraints.

7. Open Challenges and Future Directions

Key recommendations for evolving both the CommonRoad benchmarks and the underlying algorithmic toolbox include:

Augmenting the scenario suite with additional non-motorized agents (e.g., e-scooters, richer pedestrian models) and adversarial interactions (map errors, dynamic blockages).
Expanding the range of formally monitored traffic rules within the evaluation.
Developing hybrid planning architectures that integrate formal optimization-based safety checks with sampling-based or learning-enhanced modules for scalability and adaptability.
Accelerating nonlinear OCP solution via warm starts, meta-optimization, or GPU-based parallelization.
Embedding online adaptation of trajectory cost weights using reinforcement learning in environments with recurring traffic patterns.
Integrating richer prediction and long-term scene understanding, while maintaining formal safety guarantees.

The 2023–2024 competitions established reproducible, open benchmarking as a de facto standard in the community, highlighted critical trade-offs between scalability and optimality, and provided a foundation for systematic improvement of autonomous vehicle motion planning algorithms (Kochdumper et al., 10 Nov 2024, Huang et al., 22 Dec 2025).