QarSUMO: Congestion-Optimized Traffic Simulator

Updated 1 July 2026

QarSUMO is a parallel, congestion-optimized extension of SUMO that integrates meta-parallelization and virtual grouping to simulate congested urban traffic efficiently.
It partitions large traffic networks using METIS and MPI processes, achieving balanced load distribution and significant scalability improvements.
The congestion-aware virtual grouping reduces redundant vehicle updates in slow-moving zones, delivering up to 23× speedup while maintaining simulation accuracy.

QarSUMO is a parallel, congestion-optimized extension of the open-source SUMO traffic simulator, designed to overcome computational inefficiencies in large-scale, congested urban traffic simulation. By introducing a meta-parallelization layer and domain-specific optimizations targeting slow-moving vehicular flows, QarSUMO enables efficient, scalable microscopic traffic simulation suitable for reinforcement learning-based urban control studies and large-scale transportation analysis. It maintains compatibility with the SUMO ecosystem and is designed for integration with high-performance computing environments, including support for both multi-core and future multi-node architectures (Chen et al., 2020).

1. Motivation and Limitations of Conventional SUMO

The original SUMO implements microscopic, time-stepped simulation, updating every vehicle’s trajectory and decision variables (e.g., car-following, lane-changing) at every simulation step (default Δt = 1 s). In congested scenarios, characterized by high vehicle densities and predominantly low speeds, the cumulative number of vehicle-steps grows disproportionately and much of the computation is redundant, because updates are performed even when vehicles barely move. This bottleneck becomes acute for reinforcement learning (RL) tasks, where policy optimization requires thousands of simulation episodes, and it severely restricts the feasibility of large-network RL studies (Chen et al., 2020).

2. System Architecture and Parallelization Strategy

QarSUMO wraps multiple SUMO instances, each of which independently simulates one partition of the road network. Partitioning uses METIS to balance the vertex assignment across partitions while minimizing the edge cut. Each partition is managed by a separate MPI process, enabling parallel execution on multi-core systems and, in the future, on clusters. Communication and synchronization between partitions focus on “border edges,” which handle vehicles and traffic states at subnetwork boundaries. Each process exchanges border state information via MPI_Alltoall and maintains cross-partition consistency by updating local “shadow” vehicles and junctions that correspond to the boundaries of foreign partitions.

The architecture overview is as follows:

Component	Description	Technology
Partitioning	METIS-based static division, traffic-aware weighting	METIS
Parallel runtime	One SUMO instance per partition, managed by MPI process	MPI, Libsumo
Synchronization	Border vehicle state exchange at each timestep; local update of shadows	MPI_Alltoall, Libsumo

This design ensures QarSUMO retains compatibility with SUMO’s APIs (TraCI/Libsumo) and permits adoption of future SUMO feature releases with minimal interruption (Chen et al., 2020).
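
To make the notion of border edges concrete, the following minimal Python sketch lists the edges cut by a given partition assignment. It assumes that a border edge is one whose two end junctions lie in different partitions; the function name `border_edges` and the toy network are illustrative and are not taken from QarSUMO.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (from_junction, to_junction)


def border_edges(edges: List[Edge], part_of: Dict[str, int]) -> List[Edge]:
    """Return the edges whose end junctions sit in different partitions."""
    return [(u, v) for u, v in edges if part_of[u] != part_of[v]]


# Toy network (hypothetical): a chain of four junctions split in two.
edges = [("A", "B"), ("B", "C"), ("C", "D")]
part_of = {"A": 0, "B": 0, "C": 1, "D": 1}
print(border_edges(edges, part_of))  # [('B', 'C')]
```

The length of the returned list is the edge cut that the partitioner tries to keep small, since every border edge adds state that must be exchanged at each step.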

3. Congestion-Aware Virtual Grouping Optimization

To address inefficiency in highly congested segments, QarSUMO introduces a “virtual grouping” mechanism that exploits uniformity among slow-moving vehicle groups. A lane is divided into an “exit zone” and multiple upstream zones. If the mean vehicle speed in a zone does not exceed a strict threshold ε (ε = 0 in the reported experiments, i.e., vehicles are stationary), all vehicles in that zone are grouped virtually. Only the leading vehicle is simulated with the full car-following and lane-changing model; followers are propagated by copying the leader’s movement state, skipping redundant calculations. The grouping is re-evaluated at every time step and disbanded if the leading vehicle enters the exit zone or the zone’s average speed rises.

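This approach significantly reduces computational cost during congestion: the number of full car-following and lane-changing model evaluations per grouped zone and time step falls from O(N) to O(1), where N is the number of vehicles in the group. Followers still receive a copied state, but that copy is far cheaper than a full model evaluation.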
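
The grouping rule can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not QarSUMO's implementation: `full_model_update` is a hypothetical stand-in for SUMO's car-following and lane-changing models, the threshold test is written as `mean_speed <= eps` so that ε = 0 selects fully stationary zones, and the exit zone is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Vehicle:
    pos: float    # position along the lane in metres
    speed: float  # current speed in m/s


def full_model_update(v: Vehicle, dt: float) -> None:
    """Hypothetical stand-in for the expensive per-vehicle model."""
    v.pos += v.speed * dt


def step_zone(zone: List[Vehicle], dt: float, eps: float = 0.0) -> int:
    """Advance one zone by one step; return the number of full model updates."""
    if not zone:
        return 0
    mean_speed = sum(v.speed for v in zone) / len(zone)
    if mean_speed <= eps:
        # Grouped: run the full model only for the leader (furthest downstream).
        leader = max(zone, key=lambda v: v.pos)
        old_pos = leader.pos
        full_model_update(leader, dt)
        delta = leader.pos - old_pos
        # Followers copy the leader's movement instead of being simulated.
        for v in zone:
            if v is not leader:
                v.pos += delta
                v.speed = leader.speed
        return 1
    # Not grouped: every vehicle gets a full model update.
    for v in zone:
        full_model_update(v, dt)
    return len(zone)
```

For a zone of five stationary vehicles, `step_zone` returns 1 instead of 5, which is the saving described above. Because the test is repeated on every call, a zone leaves the grouped path as soon as its mean speed exceeds `eps`.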

4. Partitioning and Load Balancing Methods

Traffic-aware partitioning is critical for scalability and load balance. Each network junction is weighted by:

$w_v' = \sum_{e\in\mathrm{inc}(v)} (C_e \cdot L_e); \quad w_v = \mu + w_v'$

where $\mathrm{inc}(v)$ is the set of edges incident to junction $v$, $C_e$ is the expected vehicle count on edge $e$, $L_e$ is the edge length, and $\mu$ is a constant added to every junction's weight. METIS then assigns the junctions $v$ to $N$ partitions such that load is balanced and the border-edge count is minimized. Compared with purely vertex-count-based partitioning, traffic-skew-aware partitioning reduces per-partition load imbalance by up to 40%, a critical factor in real-world, nonuniform road networks (Chen et al., 2020).
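
The weighting formula translates directly into code. The sketch below computes $w_v$ for a toy network and then measures how uneven a given junction-to-partition assignment is. The value `mu = 1.0`, the toy data, and the `imbalance` metric (heaviest partition load divided by the mean load) are assumptions made for illustration; the METIS call itself is omitted.

```python
from typing import Dict, List, Tuple

# Each incident edge is (expected vehicle count C_e, edge length L_e in metres).
IncidentEdges = Dict[str, List[Tuple[float, float]]]


def junction_weights(incident: IncidentEdges, mu: float = 1.0) -> Dict[str, float]:
    """Compute w_v = mu + sum of C_e * L_e over the edges incident to v."""
    return {v: mu + sum(c * l for c, l in edges) for v, edges in incident.items()}


def imbalance(weights: Dict[str, float], part_of: Dict[str, int], n_parts: int) -> float:
    """Heaviest partition load over mean load; 1.0 means perfectly balanced."""
    loads = [0.0] * n_parts
    for v, w in weights.items():
        loads[part_of[v]] += w
    return max(loads) / (sum(loads) / n_parts)


# Toy data (hypothetical): J1 is busy, J3 has no incident traffic at all.
incident = {"J1": [(10, 100.0), (5, 50.0)], "J2": [(2, 100.0)], "J3": []}
weights = junction_weights(incident)
print(weights)
print(imbalance(weights, {"J1": 0, "J2": 1, "J3": 1}, n_parts=2))
```

In this toy case J3 receives a weight of exactly `mu`, so a junction with no expected traffic still counts toward its partition's load.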

5. Parallel Synchronization Protocol

Each simulation step in QarSUMO follows this protocol:

Advance the local SUMO instance by one step.
Extract state of vehicles adjacent to partition borders.
Exchange border states among all partitions through MPI_Alltoall.
Update the state of "shadow" vehicles and junctions to reflect remote partition changes.

This ensures simulation coherence, with complexity per time-step of $O(|V_p|)$ local work and $O(B_p)$ communication, where $B_p$ is the number of border vehicles per partition.
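
The four steps above can be sketched in plain Python. To stay runnable without an MPI installation, the sketch replaces MPI_Alltoall with an in-process function `all_to_all` that has the same exchange pattern (partition `dst` receives what each partition `src` addressed to it). The `Partition` class, its fields, and the constant-speed movement rule are hypothetical simplifications rather than QarSUMO's data structures.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, float]  # (vehicle id, position)


def all_to_all(outboxes: List[List[List[Message]]]) -> List[List[List[Message]]]:
    """In-process stand-in for MPI_Alltoall: inbox[dst][src] = outbox[src][dst]."""
    n = len(outboxes)
    return [[outboxes[src][dst] for src in range(n)] for dst in range(n)]


class Partition:
    def __init__(self, rank: int, n_parts: int) -> None:
        self.rank, self.n_parts = rank, n_parts
        self.vehicles: Dict[str, List[float]] = {}  # id -> [pos, speed]
        self.border: Dict[str, int] = {}            # id -> neighbouring rank
        self.shadows: Dict[str, float] = {}         # remote id -> last known pos

    def advance(self, dt: float) -> None:
        """Step 1: advance the local simulation (constant speed here)."""
        for state in self.vehicles.values():
            state[0] += state[1] * dt

    def extract_border(self) -> List[List[Message]]:
        """Step 2: collect border-vehicle state, one outbox per destination."""
        outbox: List[List[Message]] = [[] for _ in range(self.n_parts)]
        for vid, dst in self.border.items():
            outbox[dst].append((vid, self.vehicles[vid][0]))
        return outbox

    def update_shadows(self, inbox: List[List[Message]]) -> None:
        """Step 4: refresh local shadow copies of remote border vehicles."""
        for messages in inbox:
            for vid, pos in messages:
                self.shadows[vid] = pos


def simulation_step(parts: List[Partition], dt: float) -> None:
    for p in parts:
        p.advance(dt)
    outboxes = [p.extract_border() for p in parts]
    inboxes = all_to_all(outboxes)  # Step 3: exchange among all partitions
    for p, inbox in zip(parts, inboxes):
        p.update_shadows(inbox)
```

In this sketch the outboxes hold one entry per border vehicle, so the message volume grows with the number of border vehicles rather than with the total vehicle count.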

6. Performance Evaluation and Experimental Results

QarSUMO’s experimental validation spans real (Doha Corniche, TAPASCologne) and synthetic (urban grid) networks, benchmarking execution time, scalability, communication overhead, and simulation fidelity. Key results:

On a 150×10 urban grid (synthetic), up to 23× speedup with 32 partitions was observed.
Large real networks (Cologne): ≈14.6× at 32 partitions.
Smaller, irregular networks (Corniche): ≈5.7× at 32 partitions; parallelization benefit plateaus beyond 8 partitions due to network topology.
The congestion-awareness module alone yields ≈2× speedup under high traffic while maintaining trip-time errors <6% (grid) and <3% (Corniche).
Combining meta-parallelism and grouping: wall-clock time for RL data generation covering 500 simulated hours (in 1-hour episodes) fell from ~60 hours to ~5.7 hours on grid networks.
Total communication cost remains moderate (0.7–6.9% of wall-clock per step), with message volumes scaling sensibly with active vehicle count.

Trip-time and route-length CDFs deviate only slightly from sequential SUMO (by less than 2–8%, depending on the scenario), preserving modeling fidelity (Chen et al., 2020).
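
The reported figures can be put on a common footing with two standard ratios: speedup (baseline time divided by optimized time) and parallel efficiency (speedup divided by the number of partitions). The short sketch below applies them to the numbers quoted in this section. Treating one partition as one core is an assumption, and the function names are illustrative.

```python
def speedup(t_baseline: float, t_optimized: float) -> float:
    """Ratio of baseline wall-clock time to optimized wall-clock time."""
    return t_baseline / t_optimized


def parallel_efficiency(s: float, n_parts: int) -> float:
    """Speedup per partition; 1.0 would be ideal linear scaling."""
    return s / n_parts


# Speedups at 32 partitions, as quoted in the results above.
reported = {"grid": 23.0, "Cologne": 14.6, "Corniche": 5.7}
for name, s in reported.items():
    print(f"{name}: efficiency at 32 partitions = {parallel_efficiency(s, 32):.2f}")

# RL data generation: ~60 hours reduced to ~5.7 hours.
print(f"RL data generation speedup = {speedup(60.0, 5.7):.1f}x")
```

The efficiencies come out at roughly 0.72 (grid), 0.46 (Cologne), and 0.18 (Corniche), which matches the observation that the smaller, irregular Corniche network gains little beyond 8 partitions. The RL data-generation figures correspond to a speedup of about 10.5×.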

7. Compatibility, Extensibility, and Future Directions

QarSUMO’s architecture uses only SUMO’s published interfaces and standard data structures, avoiding core modifications except for the optional congestion module. This choice facilitates straightforward integration of SUMO updates, internal multi-threading, and other toolchain improvements. The MPI layer is architected for future cross-node (cluster) execution, and plans include support for dynamic re-partitioning for real-time load balance and RL-specific simulator enhancements.

Further research directions include:

Adaptive, finer-grained congestion detection (dynamic thresholds, variable zone counts).
Multi-level parallelism combining core SUMO threading and QarSUMO’s inter-partitioning.
Evaluation of RL training quality and convergence impacts under accelerated simulation.

References

QarSUMO: A Parallel, Congestion-optimized Traffic Simulator (Chen et al., 2020)
