CoLight: Multi-Agent Traffic Control

Updated 23 November 2025

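CoLight is a name shared by several distinct frameworks built on cooperative or coordinated processing, spanning urban traffic control, multi-view imaging, and quantum optics; its best-known use is a multi-agent reinforcement learning method for traffic signals based on graph attention networks.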
In traffic signal control, it uses decentralized multi-agent learning with adaptive neighbor communication to reduce travel time by 7–12% over baselines.
Enhanced variants with Advanced Traffic State (ATS) and inequity aversion (IACoLight) improve phase decisions and overall system performance through richer state representation and reward shaping.

CoLight is a term associated with several technically distinct frameworks and theories, all sharing the characteristic of cooperative or coordinated processing of information. In reinforcement learning (RL) and intelligent transportation systems, CoLight refers to a family of multi-agent RL algorithms for network-level traffic signal control using graph attention networks. In computational imaging and vision, MV-CoLight designates a framework for consistent lighting and shadow generation in multi-view object compositing. Additionally, in quantum optics, “CoLight” encompasses the theoretical study of cooperative light scattering across spatial dimensions. This article systematically presents the RL-based CoLight family, related extensions, and briefly situates the term in other technical domains.

1. Fundamental Principles of CoLight in Traffic Signal Control

CoLight formulates traffic signal control as a decentralized, multi-agent RL problem on urban road networks (Wei et al., 2019). Each intersection is modeled as an independent agent observing local states and selecting among discrete signal phases at each decision epoch. The agent’s local observation comprises a one-hot encoding of the current signal phase and the set of queue lengths on its incoming lanes. The global objective is to minimize aggregate vehicle travel time, with local reward defined as the negative sum of queue lengths on the controlled intersection’s inbound lanes.
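The observation and reward just described can be illustrated with a minimal sketch. The function names, the phase count, and the lane counts below are hypothetical choices for illustration and are not taken from the original implementation.

```python
from typing import List, Sequence


def build_observation(current_phase: int, num_phases: int,
                      queue_lengths: Sequence[float]) -> List[float]:
    """Concatenate a one-hot phase encoding with per-lane queue lengths."""
    if not 0 <= current_phase < num_phases:
        raise ValueError("current_phase must be in [0, num_phases)")
    one_hot = [1.0 if p == current_phase else 0.0 for p in range(num_phases)]
    return one_hot + [float(q) for q in queue_lengths]


def local_reward(queue_lengths: Sequence[float]) -> float:
    """Negative sum of queue lengths on the intersection's inbound lanes."""
    return -float(sum(queue_lengths))


# Hypothetical intersection: 4 phases, phase 2 active, 4 inbound lanes.
queues = [3, 0, 5, 1]
obs = build_observation(current_phase=2, num_phases=4, queue_lengths=queues)
reward = local_reward(queues)
print(obs)     # [0.0, 0.0, 1.0, 0.0, 3.0, 0.0, 5.0, 1.0]
print(reward)  # -9.0
```

Because the reward is the negative queue sum, an agent maximizing it is pushed toward shorter local queues, which serves as a proxy for the global travel-time objective.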

A central innovation is the use of a Graph Attention Network (GAT) to facilitate learned, index-free communication among neighboring intersections, meaning that the aggregation does not depend on any fixed ordering of neighbors. Agents are arranged as nodes in a spatial graph, and messages pass between nodes through dynamically weighted, multi-head aggregation. This structure lets each signal controller adaptively weight its neighbors according to their spatial role (e.g., upstream vs. downstream) and to time-varying conditions (e.g., the current peak-flow direction). All nodes share the same network parameters, which keeps the model scalable to city-sized networks.

The RL optimization proceeds via a standard Deep Q-Network (DQN) loss, with task-specific experience replay, target networks, and $\epsilon$-greedy exploration.
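As a rough illustration of these ingredients, the following sketch computes one-step temporal-difference targets from a target network's Q-values, the resulting mean squared DQN loss, and an $\epsilon$-greedy action choice. It uses plain Python lists in place of neural networks; the function names, the discount factor, and the toy numbers are assumptions made for the example only.

```python
import random
from typing import List, Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float,
                   rng: random.Random) -> int:
    """Pick a random phase with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_targets(rewards: Sequence[float],
               next_q_target: Sequence[Sequence[float]],
               dones: Sequence[bool], gamma: float) -> List[float]:
    """r + gamma * max_a' Q_target(s', a'), with no bootstrap at episode end."""
    return [r if done else r + gamma * max(q_next)
            for r, q_next, done in zip(rewards, next_q_target, dones)]


def dqn_loss(q_online: Sequence[Sequence[float]], actions: Sequence[int],
             targets: Sequence[float]) -> float:
    """Mean squared error between targets and Q_online(s, a) for taken actions."""
    errors = [t - q[a] for q, a, t in zip(q_online, actions, targets)]
    return sum(e * e for e in errors) / len(errors)


# Toy batch of two transitions sampled from a replay buffer (made-up values).
q_online = [[1.0, 2.0], [0.5, 0.0]]        # online network, Q(s, .)
next_q_target = [[0.0, 1.0], [2.0, 3.0]]   # target network, Q(s', .)
targets = td_targets([-1.0, -2.0], next_q_target, [False, True], gamma=0.5)
loss = dqn_loss(q_online, [1, 0], targets)
print(targets, loss)  # [-0.5, -2.0] 6.25
```

In a full training loop the loss would be minimized by gradient descent on the online network, with the target network refreshed periodically from the online weights.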

2. Architecture and Communication Mechanism

CoLight’s core is a multi-layer, multi-head GAT. Each layer processes node embeddings as follows:

Input features per intersection: local signal phase (one-hot) and current queue lengths.
Embedding layer: local MLP transforms the observation vector.
Attention layers: Each agent computes compatibility scores (via learned projections) between its own embedding and those of neighbors. Attention weights are normalized by softmax. The neighborhood is usually defined via geodesic proximity or graph hops and typically includes 3–5 nearest neighbors for optimal performance.
Multi-head mechanism: Multiple independent projections capture diverse aspects of spatial and temporal influence. Outputs from each head are aggregated (averaged or concatenated).
The final layer’s embeddings are passed to a Q-value head, outputting one Q-value per feasible signal phase.

This design ensures index-free, dynamically learned neighbor weighting, overcoming limitations of concatenation-based or hand-indexed neighbor aggregation methods. Communication is both spatially and temporally adaptive: e.g., arterial intersections are prioritized over side streets, and directions with transiently high flow receive disproportionate attention.
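The attention step described above can be sketched in a few lines. This is a simplified illustration, assuming a dot-product compatibility score between projected embeddings, a neighborhood that includes the agent itself, and head outputs that are averaged; the names `w_query`, `w_key`, and `w_value` are hypothetical, and the published model may differ in detail (e.g., biases, nonlinearities, scaling).

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]  # each row produces one output dimension


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(h: Dict[int, Vector], neighbors: Dict[int, List[int]],
                   agent: int, w_query: Matrix, w_key: Matrix,
                   w_value: Matrix) -> Tuple[Vector, Vector]:
    """Return (aggregated embedding, attention weights) for one agent."""
    q = matvec(w_query, h[agent])
    scores = [sum(a * b for a, b in zip(q, matvec(w_key, h[j])))
              for j in neighbors[agent]]
    weights = softmax(scores)
    values = [matvec(w_value, h[j]) for j in neighbors[agent]]
    out = [sum(w * v[d] for w, v in zip(weights, values))
           for d in range(len(values[0]))]
    return out, weights


def multi_head(h, neighbors, agent, heads) -> Vector:
    """Average the outputs of several (w_query, w_key, w_value) heads."""
    outputs = [attention_head(h, neighbors, agent, *head)[0] for head in heads]
    return [sum(col) / len(outputs) for col in zip(*outputs)]


# Toy example: three intersections with 2-d embeddings, identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [0, 1, 2]}
out, weights = attention_head(h, neighbors, 0, identity, identity, identity)
averaged = multi_head(h, neighbors, 0, [(identity, identity, identity)] * 2)
print(weights, out)
```

Because the weights depend only on the embeddings and not on the position of a neighbor in the list, reordering `neighbors[0]` permutes the weights but leaves the aggregated output unchanged, which is the index-free property.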

Global cooperation emerges as each intersection’s policy not only reflects its local state but also incorporates complex dependencies encoded by upstream and downstream patterns (Wei et al., 2019, Hassanjani et al., 2023).

3. Traffic State Representation: From Efficient Pressure to Advanced-CoLight

Originally, CoLight utilized the “queue length” as its basic state descriptor. In subsequent work, the “Advanced-CoLight” variant introduced the “Advanced Traffic State” (ATS) (Zhang et al., 2021). For each movement $(l,m)$, ATS comprises:

Efficient (queuing) pressure:

$e(l,m) = \frac{1}{M} \sum_{i=1}^{M} q(l'_i) - \frac{1}{N} \sum_{j=1}^{N} q(m'_j)$

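where $l'_i$ ($i = 1, \dots, M$) are the incoming lanes belonging to $l$, $m'_j$ ($j = 1, \dots, N$) are the outgoing lanes belonging to $m$, and $q(\cdot)$ is the queue length on a lane.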

Effective (running-vehicle) demand:

$r(l,m) = \sum_{l' \in l} r_e(l'), \qquad r_e(l') = |\{\text{vehicles on } l' \text{ within } L\}|$

Here $L = V_{\max} \times t_{\mathrm{duration}}$ is the effective range: the distance a free-flowing vehicle travelling at the maximum speed $V_{\max}$ can cover in a single minimum phase interval $t_{\mathrm{duration}}$.

ATS for each movement is then $(e(l,m), r(l,m))$. This richer state representation enables the RL agent to anticipate not only the queued vehicle pressure but also imminent demand from vehicles approaching the intersection, resulting in smoother, more efficient phase decisions and fewer premature phase changes (which can incur significant clearance time penalties).
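The two ATS components can be computed as in the sketch below. It assumes that vehicle positions are available as distances to the stop line and that the caller passes only running (non-queued) vehicles; the function names, the speed of 11 m/s, and the 10 s phase interval are hypothetical values chosen for the example.

```python
from typing import Sequence, Tuple


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def efficient_pressure(upstream_queues: Sequence[float],
                       downstream_queues: Sequence[float]) -> float:
    """e(l, m): mean upstream queue length minus mean downstream queue length."""
    return mean(upstream_queues) - mean(downstream_queues)


def effective_range(v_max: float, t_duration: float) -> float:
    """L = V_max * t_duration, in metres if v_max is in m/s and t in s."""
    return v_max * t_duration


def effective_running_vehicles(
        distances_by_lane: Sequence[Sequence[float]], l_eff: float) -> int:
    """r(l, m): vehicles on the upstream lanes within distance l_eff.

    Assumption: each inner sequence holds stop-line distances of running
    (non-queued) vehicles on one upstream lane.
    """
    return sum(1 for lane in distances_by_lane for d in lane if d <= l_eff)


def advanced_traffic_state(upstream_queues, downstream_queues,
                           distances_by_lane, v_max,
                           t_duration) -> Tuple[float, int]:
    l_eff = effective_range(v_max, t_duration)
    return (efficient_pressure(upstream_queues, downstream_queues),
            effective_running_vehicles(distances_by_lane, l_eff))


# Hypothetical movement: 3 upstream lanes, 2 downstream lanes.
ats = advanced_traffic_state(
    upstream_queues=[4, 2, 6],
    downstream_queues=[1, 3],
    distances_by_lane=[[30.0, 120.0, 90.0], [200.0], [15.0, 105.0]],
    v_max=11.0, t_duration=10.0)
print(ats)  # (2.0, 4)
```

With $L = 110$ m in this example, the vehicles at 120 m and 200 m are ignored because they could not reach the stop line within one minimum phase interval.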

The only architectural alteration in Advanced-CoLight, compared to CoLight, is the replacement of node-level features with ATS and the current phase. Reward definitions and RL update schemes remain otherwise identical (Zhang et al., 2021).

4. Empirical Performance and Comparative Results

CoLight shows consistent improvements on city-scale benchmark networks. Across synthetic and real-world simulations (e.g., Manhattan, Hangzhou, Jinan), CoLight yields 7–12% reductions in average travel time over previous RL or model-based baselines, and up to ≈20% over classical max-pressure policies (Wei et al., 2019, Hassanjani et al., 2023). Advanced-CoLight reduces average travel time by a further 9–10% or so relative to standard CoLight, for example:

JiNan-1: CoLight: 272.06 s; Advanced-CoLight: 245.73 s (–9.7%)
HangZhou-1: CoLight: 297.02 s; Advanced-CoLight: 270.45 s (–8.9%)
New York: CoLight: 1065.64 s; Advanced-CoLight: 970.05 s (–9.0%)
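The relative changes can be recomputed directly from the travel times listed above. The short sketch below does so; the helper name `relative_change` is introduced here for illustration only.

```python
def relative_change(baseline: float, improved: float) -> float:
    """Percentage change of `improved` relative to `baseline` (negative = faster)."""
    return 100.0 * (improved - baseline) / baseline


# Average travel times in seconds: (CoLight, Advanced-CoLight).
travel_times = {
    "JiNan-1": (272.06, 245.73),
    "HangZhou-1": (297.02, 270.45),
    "New York": (1065.64, 970.05),
}

changes = {name: round(relative_change(base, adv), 1)
           for name, (base, adv) in travel_times.items()}
for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
```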

Advanced-CoLight also outperforms Efficient-CoLight (CoLight with efficient pressure as its traffic state) by up to 6.0% in travel time reduction (Zhang et al., 2021). The performance gain arises from better anticipation of vehicle platoons and avoidance of unnecessary phase changes, especially under fluctuating traffic patterns.

Ablations indicate that a neighborhood of 3–5 intersections works best while keeping the model scalable, and that adding attention heads improves performance up to $H = 5$, after which it plateaus. On real networks, defining neighbors by geographic distance slightly outperforms defining them by hop distance.
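A geo-distance neighborhood of the kind compared in these ablations can be built as in the following sketch. It assumes planar coordinates in metres, Euclidean distance, and a neighborhood that counts the intersection itself; the function name and the coordinates are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def k_nearest_neighbors(positions: Dict[str, Tuple[float, float]],
                        k: int) -> Dict[str, List[str]]:
    """For each intersection, list its k closest intersections (itself first)."""
    result = {}
    for a, (xa, ya) in positions.items():
        ranked = sorted(
            positions,
            # Ties are broken by name so that the ordering is deterministic.
            key=lambda b: (math.hypot(positions[b][0] - xa,
                                      positions[b][1] - ya), b))
        result[a] = ranked[:k]
    return result


# Four hypothetical intersections along one arterial.
positions = {"A": (0.0, 0.0), "B": (100.0, 0.0),
             "C": (250.0, 0.0), "D": (600.0, 0.0)}
neighborhoods = k_nearest_neighbors(positions, k=3)
print(neighborhoods)
# {'A': ['A', 'B', 'C'], 'B': ['B', 'A', 'C'], 'C': ['C', 'B', 'A'], 'D': ['D', 'C', 'B']}
```

A hop-distance variant would instead rank intersections by the number of road links separating them, for instance with a breadth-first search over the road graph.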

5. Extensions and Reward Shaping

The “IACoLight” extension introduces inequity aversion (IA) into the reward structure: each agent compares its running average extrinsic reward with those of other agents and receives an additional shaped reward that penalizes or rewards the relative performance gaps. Formally, the IA term is a linear combination of disadvantageous inequity (envy) and advantageous inequity (pride/guilt), with tunable coefficients. Allowing the coefficient for advantageous inequity to be negative (rewarding “pride”) leads to superior global throughput and up to 11.4% reduction in travel time on real networks (Hassanjani et al., 2023). This suggests that fairness-inspired reward shaping can help agents escape local optima, accelerate convergence, and yield better overall city-level traffic flow.
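A minimal sketch of such a shaped reward is given below. It assumes a Fehr-Schmidt-style form in which the envy and advantage terms are averaged over the other agents, and an exponential moving average as the running average; the coefficient names `alpha` and `beta`, the decay, and all numbers are hypothetical and may not match the exact formulation used in IACoLight.

```python
from typing import Sequence


def update_running_average(prev: float, reward: float,
                           decay: float = 0.9) -> float:
    """Exponential moving average of an agent's extrinsic reward (assumed form)."""
    return decay * prev + (1.0 - decay) * reward


def shaped_reward(i: int, rewards: Sequence[float],
                  averages: Sequence[float],
                  alpha: float, beta: float) -> float:
    """Extrinsic reward minus weighted envy and advantage terms for agent i.

    alpha weights disadvantageous inequity (others doing better than i);
    beta weights advantageous inequity (i doing better than others).
    A negative beta turns the advantage penalty into a "pride" bonus.
    """
    others = [e for j, e in enumerate(averages) if j != i]
    envy = sum(max(e - averages[i], 0.0) for e in others) / len(others)
    advantage = sum(max(averages[i] - e, 0.0) for e in others) / len(others)
    return rewards[i] - alpha * envy - beta * advantage


# Three hypothetical intersections; rewards are negative queue sums.
averages = [-10.0, -4.0, -7.0]   # running average rewards
rewards = [-12.0, -3.0, -6.0]    # current-step extrinsic rewards
shaped = [shaped_reward(i, rewards, averages, alpha=0.5, beta=-0.1)
          for i in range(3)]
print(shaped)  # approximately [-14.25, -2.55, -6.6]
```

In this example the worst-performing agent (index 0) is penalized for lagging behind its peers, while the best-performing agent (index 1) receives a small bonus because `beta` is negative.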

6. Computational Scalability and Practical Considerations

CoLight scales well with network size. Training time per episode grows linearly with the number of intersections and is comparable to that of simpler parameter-sharing or graph convolutional approaches, and the architecture supports deployment to networks with hundreds of signals. The design choices that most affect training stability and learning speed are the neighborhood size, the number of attention heads, and the duration of fixed phases. The framework requires only local queue-length sensor data or, in Advanced-CoLight, short-range vehicle detection for ATS calculation.

7. CoLight in Other Domains

While this article has focused on RL-based traffic signal control, the name “CoLight” is also used in other specialized areas:

MV-CoLight is a feed-forward two-stage pipeline for efficient object compositing with consistent lighting and shadow generation in multi-view computer vision and AR (Ren et al., 2025). It employs Hilbert curve–based mappings to fuse 2D image features and 3D Gaussian scene geometry, outperforming baselines on both 2D/3D harmonization metrics and supporting large-scale dataset training.
CoLight theory in quantum optics refers to a general framework for cooperative light scattering in arbitrary spatial dimension. The theory unifies the treatment of super- and subradiance, collective Lamb shifts, and cooperative extinction, with scaling laws and explicit formulae for Green’s functions, level shifts, and decay rates across 1D, 2D, and 3D regimes (Hill et al., 2016).

Summary Table: Core Variants of CoLight (Traffic RL Domain)

Variant	State Representation	Network Architecture	Reward Shaping	Performance Benefit
CoLight	Queue lengths, phase	Multi-layer GAT + DQN	Negative local queue-length sum	7–12% over RL baselines, ≈20% over Max-Pressure
Advanced-CoLight	ATS (pressure + running demand)	As above	As above	9–10% over CoLight, up to 6% over Efficient-CoLight
IACoLight	As CoLight	As CoLight	Inequity aversion (IA)	Up to 11.4% over CoLight (with pride reward)

The “CoLight” frameworks have established state-of-the-art performance in coordinating large-scale networked control (notably, urban traffic signals), driven by innovations in graph-based communication, richer local state encoding, and principled reward shaping. The term also appears as a technical label in computer vision compositing and theoretical photonics, with coherence and cooperation as a unifying conceptual thread.