Courteous Virtual Traffic Signal Control

Updated 25 November 2025

Courteous Virtual Traffic Signal Control is an intersection management method that quantifies inter-vehicle delays using a formal courtesy metric for both efficiency and equity.
It employs QUBO-based quantum annealing and deep reinforcement learning to dynamically schedule intersection phases and reduce travel times.
Real-time integration of vehicle data with advanced control algorithms demonstrates statistically significant performance improvements over classical systems.

Courteous Virtual Traffic Signal Control (CVTSC) is a class of infrastructure-light intersection management methods that extend Virtual Traffic Light (VTL) concepts by explicitly quantifying and reducing the delays that different groups of connected vehicles (CVs) and connected and automated vehicles (CAVs) impose on each other when granted or denied right-of-way. Unlike conventional adaptive signals, CVTSC embeds a formal "courtesy" metric into its optimization or control logic, directly targeting both efficiency (minimized delay, increased throughput) and equity (fair distribution of wait times). Implementations span combinatorial optimization (notably quadratic unconstrained binary optimization, QUBO, on quantum annealers) and deep reinforcement learning, covering both signalized and unsignalized intersections (Enan et al., 22 Dec 2024, Yan et al., 2021).

1. Foundations and Formalization

CVTSC extends the VTL paradigm by formalizing the "courtesy cost" of intersection slotting decisions. In the canonical quantum-annealing-based variant, for an $n$-phase intersection (NEMA template), each phase $i$ with $m_i$ queued CVs reports individual estimated times of arrival $\mathrm{ETA}_{i,v}$. Binary decision variables $x_{i,k} \in \{0,1\}$ encode whether phase $i$ occupies position $k$ in a signal cycle. The key construct is the stopped-delay courtesy cost, quantifying inter-phase delays:

$\Delta_{i\to j}^{(v)} = \max\left(0,\, T_i^\mathrm{last} - \mathrm{ETA}_{j,v} + Y + R\right)$

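with $T_i^\mathrm{last} = \max_{v \in i}\ \mathrm{ETA}_{i,v}$ the latest ETA among the vehicles queued in phase $i$, and $Y$ and $R$ the yellow and red clearance intervals. The total system courtesy cost of one full cycle, i.e. one of the $n!$ possible phase orderings, is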

$C_\mathrm{courtesy} = \sum_{k=1}^{n-1}\ \sum_{i=1}^n\,\sum_{j=1}^n \Delta_{i\to j}\, x_{i,k}\, x_{j,k+1}$

where $\Delta_{i\to j} = \sum_{v=1}^{m_j}\Delta_{i\to j}^{(v)}$. This explicit quantification distinguishes CVTSC from traditional VTL and other adaptive signal approaches, providing a direct target for optimization (Enan et al., 22 Dec 2024).
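As a minimal sketch of the two formulas above, the aggregate courtesy cost $\Delta_{i\to j}$ can be computed directly from per-phase ETA lists. The function name `courtesy_matrix`, the dictionary layout, and the numeric values are illustrative assumptions, not taken from the cited work; every phase is assumed to have at least one queued vehicle.

```python
from typing import Dict, List, Tuple


def courtesy_matrix(
    etas: Dict[int, List[float]], yellow: float, all_red: float
) -> Dict[Tuple[int, int], float]:
    """Return Delta[(i, j)]: total delay imposed on phase j's vehicles
    when phase i is served immediately before phase j.

    Assumes every phase has at least one queued vehicle (m_i >= 1).
    """
    delta: Dict[Tuple[int, int], float] = {}
    for i, etas_i in etas.items():
        t_last = max(etas_i)  # T_i^last: latest ETA in phase i
        for j, etas_j in etas.items():
            if i == j:
                continue  # a phase never follows itself in a valid ordering
            # Sum the per-vehicle terms max(0, T_i^last - ETA_{j,v} + Y + R).
            delta[(i, j)] = sum(
                max(0.0, t_last - eta + yellow + all_red) for eta in etas_j
            )
    return delta


# Hypothetical two-phase example (ETAs in seconds).
etas = {1: [2.0, 5.0], 2: [3.0, 9.0]}
delta = courtesy_matrix(etas, yellow=3.0, all_red=1.0)
print(delta)  # {(1, 2): 6.0, (2, 1): 19.0}
```

In this example serving phase 1 first is the more courteous choice: the late-arriving vehicle of phase 2 (ETA 9 s) incurs no delay, whereas serving phase 2 first would hold both phase-1 vehicles.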

2. Optimization and Control Architectures

2.1 QUBO and Quantum Annealing Approaches

The CVTSC problem is cast as a QUBO, encoding both the courtesy cost and hard assignment constraints (permutation requirement) as a single quadratic form:

$H(x) \,=\, H_\mathrm{courtesy}(x) + \lambda\, H_\mathrm{constraints}(x)$

with

$H_\mathrm{constraints}(x) = \sum_{i=1}^{n}\left(\sum_{k=1}^{n}x_{i,k} - 1\right)^2 + \sum_{k=1}^{n}\left(\sum_{i=1}^{n}x_{i,k} - 1\right)^2$

where $\lambda \gg \max_{i,j}\Delta_{i\to j}$. This unconstrained binary quadratic program is amenable to solution via quantum annealing (e.g., on a D-Wave Pegasus topology), using minor-embedding to chain logical variables to physical qubits. Typical annealing parameters include $20\,\mu\mathrm{s}$ anneal times with $\geq 1{,}000$ reads and post-processing specified as "optimization" (Enan et al., 22 Dec 2024).
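The construction can be sketched in plain Python. This is an illustration under stated assumptions, not the cited implementation: the helper names (`build_qubo`, `energy`, `brute_force`, `decode`) and the $3\times 3$ cost matrix are hypothetical, and an exhaustive search over all $2^{n^2}$ bit strings stands in for the quantum annealer, which is only feasible for very small $n$. The penalty terms use $x^2 = x$ for binary variables, so each squared constraint expands to $-\sum x + 2\sum_{\text{pairs}} x\,x' + 1$.

```python
from itertools import product


def build_qubo(delta, lam):
    """Build upper-triangular QUBO weights for H = H_courtesy + lam * H_constraints.

    delta[i][j] is the courtesy cost of serving phase j right after phase i.
    Returns (Q, offset), where Q maps index pairs (a, b), a <= b, to weights.
    """
    n = len(delta)

    def idx(i, k):  # flatten (phase i, position k) to one index
        return i * n + k

    Q = {}

    def add(a, b, w):
        key = (min(a, b), max(a, b))
        Q[key] = Q.get(key, 0.0) + w

    # Courtesy term: phase i at position k followed by phase j at k + 1.
    for k in range(n - 1):
        for i in range(n):
            for j in range(n):
                if i != j:
                    add(idx(i, k), idx(j, k + 1), delta[i][j])

    # Each variable appears in one row and one column constraint: -2 * lam.
    for i in range(n):
        for k in range(n):
            add(idx(i, k), idx(i, k), -2.0 * lam)
    # Pairwise penalties: same phase in two positions, same position twice.
    for a in range(n):
        for b in range(n):
            for c in range(b + 1, n):
                add(idx(a, b), idx(a, c), 2.0 * lam)  # phase a at b and c
                add(idx(b, a), idx(c, a), 2.0 * lam)  # phases b, c at position a
    return Q, 2.0 * n * lam  # constant offset from the 2n squared terms


def energy(Q, x, offset=0.0):
    return offset + sum(w * x[a] * x[b] for (a, b), w in Q.items())


def brute_force(Q, n_vars):
    """Exhaustive minimization; a stand-in for the annealer, small n only."""
    return min(product((0, 1), repeat=n_vars), key=lambda x: energy(Q, x))


def decode(x, n):
    """Read the phase occupying each position k from a feasible bit string."""
    return [next(i for i in range(n) if x[i * n + k]) for k in range(n)]


# Hypothetical courtesy costs for a three-phase intersection.
delta = [[0, 6, 2], [19, 0, 4], [1, 8, 0]]
Q, offset = build_qubo(delta, lam=100.0)
best = brute_force(Q, 9)
print(decode(best, 3), energy(Q, best, offset))  # [1, 2, 0] 5.0
```

With $\lambda = 100$ any constraint violation costs at least 100, far above the largest $\Delta_{i\to j}$ of 19, so the minimizer is a valid permutation and its energy equals the courtesy cost $4 + 1 = 5$.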

2.2 Deep Reinforcement Learning Architectures

For unsignalized and mixed traffic intersections, CVTSC can be realized via centralized deep reinforcement learning. The intersection manager is formulated as an MDP $(\mathcal S,\mathcal A,P, r, \gamma)$ with:

State $\mathcal S$: Fixed-length vectorized encodings of spatial and temporal features (e.g., $[x, v, \Delta t, \text{type}, \text{routeID}]$) for all vehicles within $150$ m of the intersection.
Action $\mathcal A$: Virtual "yield" or "stop" commands mapped to subsets of CAV routes, implementing virtual reds (CAVs commanded to stop) and default priorities.
Reward $r_t$: Joint function of throughput and travel times, with equity factors to linearly favor longer-waiting vehicles, thus encoding "courtesy" at the level of reward shaping.
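One way such a reward could be shaped is sketched below. The functional form, the name `courteous_reward`, and the weights `beta` and `time_penalty` are assumptions made for illustration; the source only states that the reward combines throughput and travel time and favors longer-waiting vehicles linearly.

```python
from typing import Sequence


def courteous_reward(
    cleared_waits: Sequence[float],
    n_waiting: int,
    beta: float = 0.1,
    time_penalty: float = 0.01,
) -> float:
    """Hypothetical per-step reward.

    cleared_waits: waiting times (s) of vehicles that cleared this step.
    n_waiting:     vehicles still inside the control zone.
    Each cleared vehicle earns 1 (throughput) plus a bonus that grows
    linearly with how long it waited (equity); every vehicle still
    waiting costs a small penalty (travel time).
    """
    service = sum(1.0 + beta * w for w in cleared_waits)
    return service - time_penalty * n_waiting


# Releasing a vehicle that waited 10 s earns more than a fresh arrival.
print(courteous_reward([10.0], n_waiting=5) > courteous_reward([0.0], n_waiting=5))
```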

The RL agent is trained via Proximal Policy Optimization (PPO), with separate policy and value networks (input dim $343$, layers $[2048,1024]$, Adam optimizer, $\gamma=0.98$, $\epsilon=0.001$ for clipping) (Yan et al., 2021).
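For reference, the clipping parameter $\epsilon$ enters PPO through the standard clipped surrogate objective, $\mathbb{E}\left[\min\left(\rho A,\ \mathrm{clip}(\rho, 1-\epsilon, 1+\epsilon)\,A\right)\right]$, where $\rho$ is the new-to-old policy probability ratio and $A$ the advantage estimate. A framework-free sketch follows; the function name and sample numbers are illustrative, and $\epsilon = 0.2$ is used here only to make the clipping visible.

```python
from typing import Sequence


def ppo_clip_objective(
    ratios: Sequence[float], advantages: Sequence[float], eps: float
) -> float:
    """Mean clipped surrogate objective (to be maximized)."""
    terms = []
    for rho, adv in zip(ratios, advantages):
        clipped = max(1.0 - eps, min(1.0 + eps, rho))
        # The minimum makes the objective a pessimistic bound on rho * adv.
        terms.append(min(rho * adv, clipped * adv))
    return sum(terms) / len(terms)


# The first sample is capped at 1.2 * 1.0; the second is held at 0.8 * (-1.0).
value = ppo_clip_objective([1.5, 0.5], [1.0, -1.0], eps=0.2)
```

A small $\epsilon$ such as the $0.001$ quoted above keeps each update very close to the previous policy.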

3. Real-Time Scheduling and System Integration

Both QUBO and RL-based CVTSC require tight real-time integration between control logic and vehicle state acquisition. For the quantum approach:

Every control cycle ($\Delta T$), vehicle basic safety messages (BSMs) are collected, ETAs computed, and the courtesy cost matrix $\Delta_{i\to j}$ updated.
The QUBO is constructed and solved via the D-Wave cloud API.
The binary solution $x^*$ is parsed to produce a phase ordering; the first phase is selected as current green, and Signal Phase and Timing (SPaT) messages are dispatched accordingly.
After phase clearance (all CVs of the phase having speed $>\epsilon$), yellow and red phases are broadcast before cycling.
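The parsing and clearance steps above can be sketched as follows. This is an assumption-laden illustration: the sample is taken to be a dictionary keyed by (phase, position), the names `decode_phase_order` and `phase_cleared` are hypothetical, and the speed threshold is an arbitrary value. Because an annealer can return samples that violate the permutation constraints, the decoder rejects them instead of guessing.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def decode_phase_order(
    sample: Dict[Tuple[int, int], int], n: int
) -> Optional[List[int]]:
    """Turn a binary sample x[(i, k)] into a phase ordering.

    Returns None when the sample is not a valid permutation, so the caller
    can fall back (e.g. keep the previous ordering or re-sample).
    """
    order: List[int] = []
    for k in range(n):
        phases = [i for i in range(n) if sample.get((i, k), 0) == 1]
        if len(phases) != 1:  # position k is empty or doubly occupied
            return None
        order.append(phases[0])
    if len(set(order)) != n:  # some phase was scheduled twice
        return None
    return order


def phase_cleared(speeds: Sequence[float], eps: float = 0.5) -> bool:
    """Clearance test from the text: every CV of the phase moves faster than eps."""
    return all(v > eps for v in speeds)


sample = {(1, 0): 1, (2, 1): 1, (0, 2): 1}  # unspecified entries default to 0
order = decode_phase_order(sample, 3)
current_green = order[0] if order else None  # first phase becomes current green
```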

The RL-based system operates with a 1 s action interval, generating virtual phase orders and issuing immediate stop/release commands to CAVs via the control interface, leaving human-driven vehicles to follow static physical priorities (Enan et al., 22 Dec 2024, Yan et al., 2021).

4. Performance Evaluation

Quantitative results underpinning CVTSC efficacy are provided for both the quantum‐optimization (Enan et al., 22 Dec 2024) and RL (Yan et al., 2021) formulations.

Capacity (% of Nominal)	Method	Avg. Delay (s)	Avg. Travel Time (s)	% Δ Delay vs Best Classical
35	Quantum Annealing	41.2	105.3	–50%
35	Adam Optimizer	82.4	142.1	–
105	Quantum Annealing	60.8	125.1	–42%
105	Adam Optimizer	105.2	171.9	–

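Across all regimes, quantum annealing is reported to achieve roughly $45$–$55\%$ lower stopped delays and $25$–$35\%$ lower travel times relative to the best classical optimizer; in the two rows tabulated above, the delay reductions are $50\%$ and $42\%$ and the travel-time reductions about $26\%$ and $27\%$. Two-sample one-tailed t-tests with unequal variances yield $p < 0.05$ in every case, indicating statistical significance (Enan et al., 22 Dec 2024).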
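A two-sample t-test with unequal variances is Welch's test. The sketch below computes its statistic and the Welch–Satterthwaite degrees of freedom using only the standard library; the delay samples are synthetic placeholders, not data from the cited study, and the function name is hypothetical. The one-tailed p-value would then be read from a Student t distribution with `df` degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t, df) for Welch's unequal-variance t-test of mean(a) vs mean(b)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Synthetic per-run average delays in seconds (illustrative only).
quantum = [40.0, 42.0, 41.0, 43.0]
classical = [80.0, 85.0, 82.0, 83.0]
t, df = welch_t(quantum, classical)
```

A strongly negative `t` supports the one-sided hypothesis that the first method has the lower mean delay.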

For RL-based CVTSC under mixed traffic:

CAV Penetration (%)	Mean Travel Time (s)	Throughput (%)
10	471.0	74.6
50	267.0	85.8
90	154.5	92.2

Compared to baseline static (RS) and adaptive (TL) controllers, CVTSC achieves median travel-time reductions of $35$–$60\%$ and throughput increases of $15$–$30\%$. Gains are realized even at low CAV penetration, with further improvements as CAV share increases (Yan et al., 2021).

5. Scalability, Limitations, and Future Directions

Current QUBO-based CVTSC is implemented for 100% CV penetration and assumes no pedestrian phases, limiting direct real-world deployment scope. RL-based implementations support mixed traffic and are backward-compatible, as only CAVs obey virtual signals while human-driven vehicles (HVs) default to statutory priorities. End-to-end latency on cloud-based quantum hardware (~3 s) exceeds the 1 s target, but hardware evolution (local quantum accelerators, denser qubit connectivity) is expected to close this gap (Enan et al., 22 Dec 2024).

Potential extensions include:

Multi-intersection coordination via distributed CVTSC agents and state sharing.
Robust control under V2X unreliability (packet loss, dropped commands, noncompliant vehicles).
Hybrid scenario support (mixed CAV/HV, pedestrian flows).
Additional constraints (queue length limits, emergency vehicle priority).

6. Significance and Impact

CVTSC operationalizes the notion of courteous intersection management among automated vehicles, unifying efficiency and equity objectives within a single optimization or learning framework. By making the courtesy metric $\Delta_{i\to j}$ a first-class control target and deploying advanced solution techniques (quantum annealing, RL), CVTSC achieves demonstrable reductions in both aggregate and tail travel times and improves throughput. These improvements are especially pronounced under high-traffic regimes and with longer upstream sensing/approach zones, where vehicle sequencing decisions are most impactful (Enan et al., 22 Dec 2024, Yan et al., 2021).

The methodology points to a shift in intersection control: from static or conventionally adaptive physical signals toward infrastructure-light, context-aware, and explicitly fair virtual signaling systems that scale with CAV adoption and with evolving quantum and AI hardware capabilities.