Multi-Intersection Traffic Management
- Multi-intersection traffic management is a coordinated control framework that synchronizes traffic signals and vehicle flows across urban networks to optimize delay, throughput, and safety.
- It leverages distributed optimization, multi-agent reinforcement learning, and mathematical models to manage link coupling, queue spillback, and real-time interactions effectively.
- Practical implementations using microsimulation and field trials demonstrate significant improvements in travel times, fuel economy, and overall network performance.
Multi-intersection traffic management encompasses algorithmic frameworks, distributed systems, and optimization techniques for the coordinated control of traffic signals and vehicle flows across urban road networks containing multiple, interacting intersections. Unlike single-intersection approaches, multi-intersection management must explicitly account for link coupling, queue spillback, network topology, and real-time interactions among intersection controllers to optimize delay, throughput, safety, and often fairness or fuel economy at network scale. Research in this field spans methods based on distributed mathematical optimization, multi-agent systems, stochastic hybrid models, market-inspired mechanisms, and deep reinforcement learning, often validated in microsimulation testbeds or on field-trial data.
1. Problem Formulation and Modeling Paradigms
The canonical problem is to coordinate control decisions (e.g., signal phase schedules, access times, or vehicle trajectories) across a traffic network comprising a set of intersections and a set of directed road segments, so as to minimize a performance cost (delay, stops, fuel, etc.), subject to system dynamics and safety constraints. Modeling paradigms include:
- Hybrid/Discrete Event Systems: Each intersection and its incoming/outgoing roads are modeled as hybrid systems with state variables for queue lengths, signal timing 'clocks,' and possibly in-transit vehicles. Event-driven updates capture light-switching, queue-empty, threshold crossings, and burst-flow arrivals (Geng et al., 2012, Chen et al., 2017, Chen et al., 26 Apr 2024).
- Distributed Constraint Optimization (DCOP): Decision variables encapsulate per-car slot assignments at each encountered intersection, constrained by collision avoidance, capacity, and time continuity. Agents may be intersections, vehicles, or both (Iwase et al., 2022).
- Reinforcement Learning (Single/Multi-Agent): The control task is represented as an MDP or multi-agent MDP: states encode spatial traffic status (e.g., per-lane queues), actions specify joint phase switch commands or green durations, and rewards reflect network efficiency. Coordination may be explicit (parameter sharing, global reward) or implicit (graph neural encoders for spatial coupling) (Wang et al., 2021, Huo et al., 2019, Tewari et al., 2019).
- Reservation and Market Mechanisms: Agents (drivers) request space–time trajectories across multiple intersections; intersections allocate these via combinatorial auctions or dynamic pricing, subject to capacity and safety constraints (Vasirani et al., 2014).
Delayed vehicle transfer between intersections is a critical modeling component, requiring state representations (e.g., per-link vehicle counts or traffic matrices) that capture vehicles in transit on links, with arrival times at downstream queues computed via deterministic or stochastic delay functions (Chen et al., 2017, Chen et al., 26 Apr 2024).
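The role of delayed transfers can be illustrated with a toy discrete-time model of two signalized intersections in series. This is an illustrative sketch, not the event-driven formulation of the cited papers; all parameters (arrival rate, service rate, link delay) are hypothetical.

```python
from collections import deque

def simulate_two_intersections(horizon=60, link_delay=5,
                               arrival_rate=2, service_rate=3,
                               green_period=10):
    """Toy discrete-time model of two intersections in series.

    Vehicles served at the upstream intersection enter the connecting
    link and join the downstream queue only `link_delay` steps later,
    so the downstream state depends on upstream decisions made in the
    past -- the coupling that multi-intersection models must capture.
    """
    q_up, q_down = 0, 0                    # queue lengths (vehicles)
    in_transit = deque([0] * link_delay)   # vehicles on the link, FIFO
    history = []
    for t in range(horizon):
        # Fixed-time signals: upstream green in the first half of each
        # period, downstream green in the second half.
        green_up = (t % green_period) < green_period // 2
        green_down = not green_up

        q_up += arrival_rate                    # exogenous arrivals
        served_up = min(q_up, service_rate) if green_up else 0
        q_up -= served_up

        q_down += in_transit.popleft()          # delayed transfers arrive
        in_transit.append(served_up)            # served vehicles enter link

        served_down = min(q_down, service_rate) if green_down else 0
        q_down -= served_down
        history.append((q_up, q_down))
    return history

hist = simulate_two_intersections()
```

Because of the five-step link delay, a downstream controller that ignores in-transit vehicles systematically underestimates its imminent demand, which is why the cited state representations include link occupancy explicitly.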
2. Distributed Optimization Methods
A central theme is scalable distributed algorithms that maintain tractability as network size grows. Key designs include:
- Distributed MILP (Mixed Integer Linear Programming): Fayazi & Vahidi employ per-intersection MILPs to assign discrete access times to approaching autonomous vehicles, with local binary variables resolving conflicts. Cross-intersection coordination is achieved by iteratively exchanging vehicle access times on shared boundary roads, yielding decentralized optimality up to local coupling (Ashtiani et al., 2020).
- Distributed Model Predictive Control (MPC): Each intersection solves a local MPC using local queue estimates and green-time variables, with adjacent intersections exchanging predicted states and green allocations. ADMM is used to decompose variables and enforce cycle-length constraints each cycle, permitting fast parallel solution (Ru et al., 21 Jan 2024). Centralized MPC provides optimality at small scales but lacks real-time feasibility for networks with many intersections (Hosseinzadeh et al., 2022).
- Polynomial-Time Decentralized Schedulers: The BRUDR algorithm for DCOPs enables each car to update its intersection schedule in best-response order, with a ‘downstream reset’ mechanism to prevent negative cross-intersection externalities. This achieves Nash equilibria with message and computational complexity scaling with the total travel routes (Iwase et al., 2022).
- Market-Based Decentralized Control: Intersection managers run combinatorial auctions for crossing reservations and simultaneously adjust dynamic prices on incoming links according to real-time demand. Driver agents respond by re-routing to minimize personalized utility (joint travel time and cost), yielding emergent network-level balance (Vasirani et al., 2014).
The integrity of distributed coordination relies on local communication protocols (neighbor-to-neighbor or via aggregators), local problem sizes for tractable solution, and robust constraint enforcement (e.g., collision, capacity, feasible delays).
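A minimal sketch of the ADMM-style coordination pattern, assuming the toy objective that each intersection has a locally preferred cycle length (from its own local optimization) and the network must agree on one shared value; the closed-form quadratic costs are illustrative stand-ins for the local MPC problems.

```python
def admm_consensus_cycle(prefs, rho=1.0, iters=100):
    """Consensus ADMM sketch: intersections agree on a common cycle length.

    Each intersection i holds a local copy x_i of the shared cycle
    length and a local cost (x_i - prefs[i])^2; the dual variables u_i
    drive all copies toward the consensus value z.
    """
    n = len(prefs)
    x = list(prefs)          # local copies of the shared cycle length
    z = sum(prefs) / n       # consensus variable
    u = [0.0] * n            # scaled dual variables
    for _ in range(iters):
        # Local, parallelizable updates: closed-form minimizer of
        # (x_i - c_i)^2 + (rho/2)(x_i - z + u_i)^2.
        x = [(2 * c + rho * (z - ui)) / (2 + rho)
             for c, ui in zip(prefs, u)]
        # Gather step: average of local copies plus duals.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual ascent on the consensus residual.
        u = [ui + xi - z for ui, xi in zip(u, x)]
    return z

common_cycle = admm_consensus_cycle([60.0, 90.0, 75.0])
```

Only neighbor-to-aggregator messages (the x_i + u_i terms) are exchanged; the x-updates run in parallel per intersection, which is the property that keeps such schemes tractable as the network grows.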
3. Data-Driven and Learning-Based Control
Machine learning, especially deep RL, underpins several scalable multi-intersection controllers. State-of-the-art architectures include:
- Edge-Weighted Graph Convolutional RL: State tensors encode per-lane waiting counts, waiting times, and current phases. An edge-weighted GCN encoder propagates spatial dependency between intersections (via lane-length–weighted adjacency). The decoder outputs per-intersection phase probabilities; simultaneous phase selection for all nodes is achieved using a unified structure, sidestepping the exponential scaling of the naive joint action space (Wang et al., 2021).
- Multi-Agent RL (A3C, MARL): Each intersection agent observes its own and its immediate neighbors' states; policy/value networks process local or joint observations; rewards blend local metrics and global network congestion terms to induce cooperative behavior. Architectures share network weights asynchronously (A3C) or train independent actor-critics (selfish MARL), with explicit global reward blending found to reduce total system delay beyond strictly independent RL (Tewari et al., 2019).
- Tensor-Based End-to-End RL with Imitation Learning: The system encodes the full multi-intersection state as a single tensor spanning all intersections, with a parallel Boolean vector for cyclic phase switching. Pre-training via imitation learning from rule-based heuristics accelerates convergence and avoids the pitfalls of purely reinforcement-based cold starts, while PPO stabilizes policy updates in the high-dimensional space (Huo et al., 2019).
Hierarchical and hybrid architectures (end–edge–cloud) are designed for unsignalized C-ITS settings: edge agents manage local RL traffic controls, cloud agents aggregate cross-intersection summaries to optimize network-level criteria (velocity, density, safety), and final vehicle actions are determined by hierarchical action fusion (Jiang et al., 2020).
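The edge-weighted graph-convolution idea above can be sketched in a few lines. This is an illustrative aggregation step, not the cited architecture: the inverse-lane-length weighting and the row normalization are assumptions chosen to show how shorter connecting lanes couple intersections more strongly.

```python
import numpy as np

def edge_weighted_propagation(features, edges, lane_lengths):
    """One edge-weighted graph-convolution step over intersections.

    features:      (n, d) array, per-intersection state (e.g., queues).
    edges:         list of (i, j) links between intersections.
    lane_lengths:  dict mapping (i, j) -> connecting-lane length;
                   weights are taken as inverse lengths so nearby
                   intersections exchange more information.
    """
    n = features.shape[0]
    adj = np.eye(n)                       # self-loops keep own state
    for (i, j) in edges:
        adj[i, j] = 1.0 / lane_lengths[(i, j)]
        adj[j, i] = adj[i, j]
    deg = adj.sum(axis=1)
    norm = adj / deg[:, None]             # row-normalized aggregation
    return norm @ features                # mix in neighbor information

feats = np.array([[10.0], [0.0], [4.0]])  # e.g., per-intersection queues
out = edge_weighted_propagation(
    feats, [(0, 1), (1, 2)], {(0, 1): 200.0, (1, 2): 100.0})
```

Stacking such steps (with learned weight matrices and nonlinearities in between) lets each intersection's representation absorb multi-hop spatial context while the per-node decoder keeps the action space linear in the number of intersections.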
4. Real-Time Adaptivity and Model Components
Adaptivity to changing conditions is achieved via gradient-based online optimization or recursive state estimation:
- Infinitesimal Perturbation Analysis (IPA): Continuous adjustment of traffic-light control parameters (e.g., green/red cycle lengths) uses IPA to compute unbiased gradient estimates from event-time and queue measurements. Extensions explicitly incorporate link delays and buffer blocking, yielding improved cost gradients and faster response to congestion transients (Geng et al., 2012, Chen et al., 2017, Chen et al., 26 Apr 2024).
- Model Predictive Schemes: Distributed MPC formulations maintain rolling-horizon optimization of local green time and phase selection, adjusting plans every cycle based on measured or predicted queues. ADMM ensures fast convergence for cycle-length and density balancing (Ru et al., 21 Jan 2024).
- Learning-Based Adaptation: RL policies retrain continually on new traffic observations (including traffic surges or incident scenarios), supported by experience replay and prioritized sample selection (Wang et al., 2021). RL-based controllers exhibit real-time adaptation to exogenous inflow fluctuations in simulation studies (Tewari et al., 2019).
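The common skeleton behind these gradient-based online schemes is a projected stochastic-gradient loop over the signal parameters. A minimal sketch, where `cost_and_grad` is a hypothetical stand-in for an IPA estimator (here an analytic quadratic rather than a sample-path estimate), and the feasibility box represents minimum/maximum green constraints:

```python
def online_green_tuning(cost_and_grad, theta0, step=0.5, rounds=50,
                        theta_min=5.0, theta_max=60.0):
    """Projected online gradient descent on a green-time parameter.

    cost_and_grad(theta) plays the role of an IPA estimator: from one
    observed cycle it would return the sample cost and an estimate of
    d(cost)/d(theta); here it is an analytic stand-in.
    """
    theta = theta0
    for _ in range(rounds):
        _, grad = cost_and_grad(theta)
        theta -= step * grad                             # gradient step
        theta = min(max(theta, theta_min), theta_max)    # green-time box
    return theta

# Hypothetical stand-in: cost minimized at a green time of 30 s.
quad = lambda th: ((th - 30.0) ** 2, 2 * (th - 30.0))
tuned = online_green_tuning(quad, theta0=10.0)
```

In the actual IPA setting the gradient comes from observed event times and queue trajectories on the live system, so the loop adapts the signal parameters without ever needing an explicit traffic model.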
Recent work has explored the use of LLMs (e.g., GPT-4o) as inference modules for real-time conflict detection and resolution at intersections, integrating live state feeds and outputting actionable suggestions for multi-agent settings with up to three intersections, maintaining sub-200 ms end-to-end latency (Masri et al., 1 Aug 2024).
5. Empirical Performance, Scalability, and Trade-offs
Large-scale microscopic simulation and field-trial studies provide quantitative support for algorithm efficacy:
| Study / Method | Network Size | Delay Reduction / Efficiency | Notable Metrics |
|---|---|---|---|
| MILP + distributed coord | 3×3 grid | Avg. travel time 1′54″ vs. 2′40″ | Stops: 2140 vs. 6886; MPG: +17% |
| DCOP-BRUDR | 24 intersections | Avg. delay –41% vs. FCFS baseline | Scales linearly in # of vehicles |
| GCN-based RL (EGU-RL) | up to 22 | Accum. wait time ↓80% vs. fixed-time | Inference <0.5 ms, param. const. |
| Two-lane MPC–ADMM | 6 (Dalian) | Delay: 105 s vs 229 s (fixed time) | Cycle compute ~1.3 s (real-time) |
| MPC emergency control | 4 | Emergency delay ↓50% (centralized) | Decentralized ~1000× faster |
| Tensor RL + PPO/IL | 9 (grid) | QL/AWT/AFC: all superior to DQN, DDPG | Fast convergence, scalable input |
Trade-offs and limitations include:
- Small additional travel-time reduction from coordinated over uncoordinated MILP, but significantly smoother trajectories (higher fuel economy) (Ashtiani et al., 2020).
- MPC optimality degrades little with decentralization but enables real-time performance at city scale (Hosseinzadeh et al., 2022, Ru et al., 21 Jan 2024).
- GCN-unified-decoder models are parameter-efficient and latency-invariant with intersection count (Wang et al., 2021).
- Some RL and auction-based schemes lack formal safety or fairness guarantees and may require further regulatory adaptation for deployment.
6. Extensions, Open Challenges, and Future Directions
Open research directions identified across cited works include:
- Mixed Traffic/Modal Flows: Incorporate pedestrian phases, bicycle, bus priority, and mixed human–autonomous fleets, requiring richer models of conflict and service.
- Routing Control Coupling: Joint signal control and route guidance (market-powered assignment, DCOP with flexible paths) to maximize network-wide efficiency (Vasirani et al., 2014, Iwase et al., 2022).
- Dynamic, Data-Driven Models: Online calibration of transit delays and blocking parameters; adaptation to traffic incidents, communication delay/loss, and nonstationary demand (Chen et al., 2017, Chen et al., 26 Apr 2024).
- Privacy/Security and LLM Integration: Secure V2I/V2V messaging; scalable multi-agent LLM architectures to extend real-time reasoning to larger grids while maintaining sub-200 ms latency (Masri et al., 1 Aug 2024).
- Multi-objective Control: Simultaneous minimization of delay, emissions, energy, and equity/fairness metrics; integration of these aspects into MILP/DCOP/MPC or RL-based frameworks (Wang et al., 2021, Iwase et al., 2022).
Recent frameworks demonstrate that scalable and even decentralized multi-intersection management is achievable with distributed optimization, event-driven adaptation, or modern deep architectures. The integration of real-world data streams, safety guarantees, and equity objectives remains a significant frontier for advancing both fundamental algorithms and practical deployments in urban traffic networks.