
Adaptive Traffic Signal Control (ATSC)

Updated 21 November 2025
  • Adaptive Traffic Signal Control (ATSC) is a dynamic method that adjusts signal phases in real time based on live sensor data and predictive optimization.
  • It employs rule-based strategies, model predictive control, and reinforcement learning to tackle heterogeneous traffic demands and network complexities.
  • ATSC improves urban mobility by reducing delays, emissions, and conflict rates, with simulation studies showing measurable performance gains.

Adaptive Traffic Signal Control (ATSC) refers to real-time adjustment of traffic signal parameters based on prevailing traffic conditions, aiming to optimize efficiency, throughput, safety, and environmental outcomes in urban road networks. Moving beyond traditional fixed-time or pre-timed control schemes, ATSC algorithms dynamically reallocate green phases and durations using advanced sensing, prediction, and optimization methods. Networked ATSC addresses the complexities posed by heterogeneous topologies, stochastic demand, partial observability, spatial dependencies (e.g., spill-back), and operational constraints. This entry surveys the taxonomy, foundational models, algorithmic frameworks, recent advances, and deployment considerations for ATSC as evidenced in contemporary research.

1. Conceptual Evolution and Taxonomy of ATSC

The progression of ATSC is historically divided into generational and functional families. Early systems (Generations 0–1) relied on static cycle-offset-split (COS) plans or selected plans in response to aggregate sensor measurements. Generation 2 introduced pattern adjustment, with systems like SCOOT and SCATS modifying splits, offsets, and cycle lengths in response to upstream detector data. Generation 3 dispensed with fixed cycles, enabling phase-level decisions on short horizons—either continuously (phase extension) or through predictive rolling optimization (dynamic programming, MPC, RL) (Shams et al., 2023).

ATSC solutions are organized along six dimensions:

  • Topographic Structure: Local-only intersection control vs. system-level hierarchical coordination.
  • Time Resolution: Continuous sub-second/second-by-second decision vs. finite-horizon plan optimization.
  • Mechanism: Rule-based logic, dynamic programming, MILP, model predictive control (MPC), metaheuristics, fuzzy logic, or reinforcement learning.
  • Objectives: Minimization of delay, queue, stops, or emissions; maximization of throughput or pressure.
  • Cyclic vs. Acyclic: Operation under fixed/adaptive cycles versus rolling-horizon phase sequencing.
  • Extensions: Multimodal/pedestrian inclusion, multi-objective trade-offs, and robust or stochastic variations.

This taxonomy enables researchers to situate new methods precisely within the established landscape, clarifying methodological appropriateness for given infrastructure, data, and operational contexts (Shams et al., 2023).

2. Mathematical Modeling and Theoretical Foundations

ATSC systems are customarily formulated as Markov Decision Processes (MDPs) or their variants (MOMDPs, POMDPs) (Wang et al., 2022). For a signalized network:

  • State $S$: Encodes intersection-level measurements (queue lengths, current phase, moving/stopped vehicles) and optionally neighbor/region-level features, possibly factored into observable and latent components.
  • Action $A$: Represents green phase selection, split/duration assignment, or hybrid (discrete-continuous) commands.
  • Reward $R(s,a)$: Commonly negative total or maximum queue, negative waiting time, or composite measures for safety/emissions (e.g., $R_{\text{total}} = w_{\text{eff}} R_{\text{eff}} + w_{\text{safety}} R_{\text{safety}} + w_{\text{emis}} R_{\text{emis}}$) (Mirbakhsh et al., 1 Aug 2024).
  • Transition $P(s' \mid s, a)$: Dynamically captures vehicle inflow, outflow, and interactions via simulation or digital twin models.
  • Objective: Maximize the expected discounted or average sum of rewards, $\mathbb{E}_\pi[\sum_t \gamma^t r_t]$.
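The objective above can be made concrete with a minimal sketch: the function below computes the discounted return $\sum_t \gamma^t r_t$ for one episode of negative-queue rewards. The reward values and discount factor are illustrative, not drawn from any cited study.

```python
def discounted_return(rewards, gamma=0.95):
    """Discounted sum of per-step rewards: G = sum_t gamma^t * r_t."""
    g = 0.0
    for t, r in enumerate(rewards):
        g += (gamma ** t) * r
    return g

# Rewards are negative queue lengths observed at each decision step
# (illustrative values, not from any real intersection).
episode_rewards = [-12.0, -9.0, -7.0, -4.0]
print(discounted_return(episode_rewards))
```

An RL controller seeks the policy maximizing the expectation of this quantity over traffic realizations.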

In multi-agent frameworks, the joint process can be cast as a Markov game. Under no spill-back, the global $Q^*$-function decomposes additively across agents, enabling decentralized learning; under spill-back, cross-agent dependencies necessitate centralized (or coordinated) value estimation (Zhang et al., 23 Feb 2025).

Challenges such as partial observability are intrinsic, as agents rarely access global traffic state. The ATSC POMDP structure motivates recurrent and attention-based architectures for history aggregation and belief-state approximation (Wang et al., 16 Sep 2024).

3. Algorithmic Approaches

3.1 Rule-Based, Optimization-Based, and RL-based Control

  • Rule-Based: Marginal delay, phase extension, and max-pressure policies compute control actions by direct calculation over local states (e.g., serve the phase $p$ with maximum pressure $P_p = \sum (q^{\text{up}} - q^{\text{down}})$) (Shams et al., 2023, Wang et al., 2022).
  • Optimization-Based: Approaches such as MPC solve finite-horizon problems subject to constraints (e.g., minimize $\sum_k q_{t+k}^2$ subject to cycle, green, and safety constraints), enforcing global or corridor-level objectives (Wang et al., 2022).
  • Reinforcement Learning: Deep RL parametrizes state-action value ($Q$) or policy ($\pi$) functions with neural networks. Single-agent (centralized), multi-agent (decentralized), and regional (hierarchical) structures are all prevalent.
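The max-pressure rule in the list above reduces to a few lines of code. This is a minimal sketch: the phase-to-movement data structure is an illustrative assumption, not any specific paper's interface.

```python
def max_pressure_phase(phases):
    """Select the phase with maximum pressure.

    `phases` maps a phase id to a list of (upstream_queue, downstream_queue)
    pairs, one per movement served by that phase. Pressure is the sum of
    upstream minus downstream queue lengths over those movements.
    """
    def pressure(movements):
        return sum(q_up - q_down for q_up, q_down in movements)
    return max(phases, key=lambda p: pressure(phases[p]))

# Two-phase toy intersection (illustrative queue counts).
phases = {
    "NS": [(8, 2), (5, 1)],   # pressure = 6 + 4 = 10
    "EW": [(3, 0), (2, 4)],   # pressure = 3 - 2 = 1
}
print(max_pressure_phase(phases))  # "NS" has the higher pressure
```

The appeal of this rule is that it needs only local queue counts, yet is known to stabilize network-wide queues under mild assumptions.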

3.2 Multi-Agent and Hybrid RL Models

Scalability demands distributed MARL, often utilizing advantage actor-critic (A2C), DQN, or PPO variants. Communication, finger-printing of neighbor policies, spatial discounting, or graph-based embeddings address non-stationarity and observability (Chu et al., 2019, Gu et al., 18 Feb 2025, Zhang et al., 14 Mar 2025).
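One of the mechanisms above, spatial discounting, tempers non-stationarity by shrinking the influence of distant intersections on each agent's reward. The sketch below is a hedged illustration: the hop-distance map and discount factor are assumptions, not parameters from the cited works.

```python
def spatially_discounted_reward(own_reward, neighbor_rewards, hop_distance, alpha=0.8):
    """Local reward plus neighbor rewards scaled by alpha^distance.

    `neighbor_rewards` maps agent id -> reward; `hop_distance` maps
    agent id -> hops from this agent. alpha in (0, 1) shrinks the
    influence of far-away intersections.
    """
    total = own_reward
    for agent, r in neighbor_rewards.items():
        total += (alpha ** hop_distance[agent]) * r
    return total

# Illustrative: two neighbors at 1 and 2 hops.
r = spatially_discounted_reward(
    own_reward=-5.0,
    neighbor_rewards={"B": -4.0, "C": -2.0},
    hop_distance={"B": 1, "C": 2},
)
print(r)  # -5.0 + 0.8*(-4.0) + 0.64*(-2.0) ≈ -9.48
```

The effect is that each agent optimizes a mostly local objective while still being nudged toward network-level coordination.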

Novel models include hybrid discrete-continuous action formulations (PH-DDPG), where agents output both phase selection and green duration. A Gaussian noise mask in the critic enforces robust, decoupled parameter learning (Wang et al., 18 Mar 2025).
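The hybrid discrete-continuous action decoding can be sketched as follows. This is not the PH-DDPG network itself: the per-phase scores and raw durations stand in for actor outputs, and the clipping bounds are illustrative assumptions.

```python
def decode_hybrid_action(phase_scores, raw_durations, g_min=5.0, g_max=60.0):
    """Pick the argmax phase and clip its continuous green duration.

    `phase_scores` are per-phase preferences (e.g., critic values) and
    `raw_durations` are unclipped per-phase duration outputs; only the
    chosen phase's duration is used.
    """
    phase = max(range(len(phase_scores)), key=lambda i: phase_scores[i])
    duration = min(max(raw_durations[phase], g_min), g_max)
    return phase, duration

print(decode_hybrid_action([0.2, 1.4, -0.3], [12.0, 75.0, 20.0]))  # (1, 60.0)
```

Only the selected phase's duration parameter affects the environment, which is what the masked-critic training in such models is designed to exploit.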

Frameworks for resource-constrained deployment, such as TinyLight, leverage entropy-minimized super-graph sparsification to extract low-FLOP, intersection-tailored policy networks amenable to microcontroller implementation (Xing et al., 2022).

3.3 Regional and Federated Learning

Hierarchical regional partitioning—clustering intersections by traffic dynamics or topology—enables federated reinforcement learning at group or personalized agent levels (FedClusterLight, FedFomoLight). Intra-group FedAvg yields significant gains under network heterogeneity, matching centralized RL performance at lower communication cost (Fu et al., 7 Apr 2025).
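The intra-group FedAvg step mentioned above amounts to a weighted average of client parameter vectors. This minimal sketch uses plain lists in place of neural-network weights; the 1:3 weighting is an illustrative assumption.

```python
def fed_avg(client_params, client_sizes):
    """Weighted average of per-client parameter vectors (one FedAvg round).

    `client_params` is a list of equal-length parameter lists and
    `client_sizes` weights each client (e.g., by local sample count).
    """
    total = sum(client_sizes)
    n = len(client_params[0])
    return [
        sum(w * params[j] for params, w in zip(client_params, client_sizes)) / total
        for j in range(n)
    ]

# Two intersections in one cluster, weighted 1:3 (illustrative).
print(fed_avg([[1.0, 2.0], [5.0, 6.0]], [1, 3]))  # [4.0, 5.0]
```

Restricting this averaging to intersections within the same traffic-dynamics cluster is what lets the federated scheme cope with network heterogeneity.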

Single-agent regional RL models manage all intersections with an adjacency-matrix state and linearly scaled action space (select intersection and adjustment), facilitating deployment using probe vehicle data for queue estimation (Li et al., 1 Nov 2025).
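A linearly scaled action space of the kind described, jointly selecting an intersection and an adjustment, can be decoded with a single divmod. The adjustment set below is a hypothetical example, not the cited paper's parameterization.

```python
ADJUSTMENTS = [-5, 0, +5]  # illustrative green-time changes, in seconds

def decode_action(action_id, num_adjustments=len(ADJUSTMENTS)):
    """Map a flat action id to (intersection index, green-time adjustment)."""
    intersection, adj_idx = divmod(action_id, num_adjustments)
    return intersection, ADJUSTMENTS[adj_idx]

# With 3 adjustments per intersection, action 8 targets intersection 2
# and extends its green time by 5 seconds.
print(decode_action(8))  # (2, 5)
```

Because the flat action count is the number of intersections times a small constant, the action space grows linearly with network size rather than combinatorially.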

4. Advanced Topics: Constraints, Safety, and Multi-Objective Control

Operational constraints (max green, fairness, skip limits) are addressed by constrained MARL: MAPPO-LCE integrates Lagrangian multipliers with a cost estimator network for stable constraint enforcement across GreenTime, PhaseSkip, and GreenSkip metrics. This approach maintains throughput and efficiency while adapting policies to remain compliant under diverse real-world constraints (Satheesh et al., 30 Mar 2025).
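At its core, the Lagrangian mechanism in such constrained MARL methods is projected dual ascent on each constraint multiplier: the multiplier grows while the constraint is violated and decays (never below zero) when there is slack. The learning rate and cost values in this sketch are illustrative assumptions.

```python
def update_multiplier(lmbda, avg_cost, cost_limit, lr=0.01):
    """Dual ascent step: raise lambda when the constraint is violated,
    lower it (floored at zero) when the constraint has slack."""
    return max(0.0, lmbda + lr * (avg_cost - cost_limit))

# Violated constraint (cost 1.4 > limit 1.0) pushes lambda up;
# a satisfied constraint then pulls it back toward zero.
lam = 0.5
lam = update_multiplier(lam, avg_cost=1.4, cost_limit=1.0)
print(round(lam, 4))  # 0.504
lam = update_multiplier(lam, avg_cost=0.2, cost_limit=1.0)
print(round(lam, 4))  # 0.496
```

The policy is then trained against reward minus lambda-weighted cost, so enforcement pressure adapts automatically to how often each constraint is being violated.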

Multi-objective RL frameworks (e.g., D3QN-based ATSC-SED) explicitly weigh safety (e.g., conflict rate based on TTC), efficiency (waiting time), and emissions (COâ‚‚) via composite reward functions, enabling explicit trade-off control and promoting robust performance under high demand (Mirbakhsh et al., 1 Aug 2024).
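The composite reward is a straight weighted sum of the three components; the weights and component values in this minimal sketch are illustrative, not the tuned values from the cited work.

```python
def composite_reward(r_eff, r_safety, r_emis, w_eff=0.5, w_safety=0.3, w_emis=0.2):
    """R_total = w_eff*R_eff + w_safety*R_safety + w_emis*R_emis."""
    return w_eff * r_eff + w_safety * r_safety + w_emis * r_emis

# Illustrative components: negative waiting time, negative TTC-based
# conflict rate, and negative CO2 estimate, each pre-normalized.
print(composite_reward(r_eff=-0.4, r_safety=-0.1, r_emis=-0.2))
```

Sweeping the weights traces out the efficiency-safety-emissions trade-off surface that multi-objective ATSC studies report.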

5. Digital Twin, Data-Driven, and Security-Aware ATSC

Digital twin (DT) architectures virtualize the entire traffic network—vehicles, signals, and states—enabling parallel simulation, demand forecasting, and rolling-horizon deployment of adaptive algorithms. DT1 minimizes per-approach delay using local data; DT2 incorporates upstream intersection delays, improving delay distribution equity but suffering under oversaturation (Dasgupta et al., 2023, Dasgupta et al., 2021).

Security-aware ATSC tackles vulnerabilities introduced by real-time data ingestion. For example, waiting-time–based ATSC is susceptible to "slow-poisoning" attacks through fake vehicles. LSTM-based anomaly detectors using upstream flow data provide robust early detection against such low-rate stealth attacks (Dasgupta et al., 2021).

6. Empirical Performance and Benchmarking

Large-scale simulation studies consistently report substantial reductions in average travel time, waiting time, conflict rates, and emissions over classical baselines. For instance, PH-DDPG realizes up to 7% travel-time and 10% destination-arrival-rate improvements across multiple cities (Wang et al., 18 Mar 2025). BCT-APLight achieves a 9.60% reduction in average queue length and 15.28% reduction in average waiting time versus the best RL baselines on seven real-world datasets (Duan et al., 18 Dec 2024).

Collaborative MARL models (e.g., Unicorn) with universal state-action mapping, intersection-specific latent encoding, and contrastive refinement outperform both homogeneous and heterogeneous network baselines in queue reduction, delay minimization, and throughput maximization across eight public benchmarks (Zhang et al., 14 Mar 2025).

Transformer-based controllers decisively surpass LSTM-based and non-sequential RL models in mitigating partial observability, producing up to 65.8% reductions in episodic network delay when deployed in corridor environments (Wang et al., 16 Sep 2024).

7. Limitations, Challenges, and Research Directions

ATSC research faces systematic challenges: scalability to very large and heterogeneous networks, simulation-to-reality transfer, non-stationarity due to stochastically varying demand, and the need for unified benchmarking (Wang et al., 2022, Shams et al., 2023). Robust deployment requires accounting for sensor noise, communication latency, actuator faults, and cybersecurity threats.

Hybrid RL-MPC architectures, transfer learning with GNNs, risk-sensitive optimization, and real-time federated/grouped learning offer promising directions. Bridging the gap from simulated to real-world control—while enforcing safety and fairness constraints and integrating multi-modal users—remains an open and active frontier.


References: The above synthesis draws upon empirical and theoretical results in (Shams et al., 2023, Wang et al., 2022, Wang et al., 18 Mar 2025, Mirbakhsh et al., 1 Aug 2024, Xing et al., 2022, Li et al., 1 Nov 2025, Chu et al., 2019, Fu et al., 7 Apr 2025, Zhang et al., 14 Mar 2025, Zhang et al., 23 Feb 2025, Satheesh et al., 30 Mar 2025, Duan et al., 18 Dec 2024, Dasgupta et al., 2021, Dasgupta et al., 2021, Dasgupta et al., 2023), and (Wang et al., 16 Sep 2024), among others.
