- The paper introduces two fine-grained urban traffic datasets (city-traffic-M and city-traffic-L) featuring detailed road attributes and high-resolution temporal data.
- The paper proposes a scalable time-then-graph GNN approach that reduces per-layer memory complexity from O(ntd) to O(nd) while efficiently handling long historical lookback windows.
- The paper demonstrates that the GNN-TrfAttn model achieves lower mean absolute error than the baselines while training fast enough for practical use in dense urban settings.
Fine-Grained Urban Traffic Forecasting on Metropolis-Scale Road Networks
Introduction and Motivation
This paper addresses the limitations of existing traffic forecasting benchmarks and methods by introducing two large-scale, fine-grained urban traffic datasets and proposing a scalable graph neural network (GNN) approach for traffic prediction. The motivation stems from the inadequacy of current datasets, which are small, lack real road connectivity, and focus on sparsely instrumented intercity highways rather than dense urban networks. The new datasets, city-traffic-M and city-traffic-L, provide comprehensive coverage of urban road segments, rich static attributes, and high-resolution temporal data for both traffic speed and volume.
Dataset Construction and Properties
The city-traffic-M and city-traffic-L datasets are constructed from GPS traces collected by a major navigation service, covering all road segments within a 15 km radius of two large cities. Nodes represent individual road segments, and edges encode actual traffic-permitted connectivity, not heuristic proximity. Each node is annotated with 26 static attributes, including speed limits, road surface quality, and accessibility masks. Temporal data is provided at 5-minute intervals, spanning four months, with both traffic speed and volume as targets.
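As a rough illustration of this graph definition, the sketch below builds a directed adjacency list in which nodes are road segments and an edge exists only where traffic is permitted to flow from one segment into the next. The segment IDs, the `allowed_turns` list, and the helper name `build_segment_graph` are hypothetical and do not reflect the datasets' actual file format.

```python
from collections import defaultdict

def build_segment_graph(allowed_turns):
    """Build a directed adjacency list from (from_segment, to_segment) pairs."""
    graph = defaultdict(list)
    for src, dst in allowed_turns:
        graph[src].append(dst)
    return dict(graph)

# Hypothetical toy network: one-way segment "a" feeds "b",
# while "b" and "c" are connected in both directions.
allowed_turns = [("a", "b"), ("b", "c"), ("c", "b")]
graph = build_segment_graph(allowed_turns)

# Five-minute sampling gives 12 observations per hour, so 288 per day.
STEPS_PER_DAY = 24 * 60 // 5
```

Keeping the edges directed preserves one-way streets and turn restrictions, which a distance-based heuristic graph cannot express.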

Figure 1: Visualization of the city-traffic-M road network, showing dense urban connectivity and complex topology.
Figure 2: The distribution of key spatial features (e.g., pavement, crosswalks, truck restrictions) in the proposed datasets.
The datasets are orders of magnitude larger than previous benchmarks (up to 100,000 nodes), and their topological and attribute diversity enables the study of complex urban traffic phenomena. The weekly dynamics of traffic variables reveal pronounced rush-hour patterns and significant heterogeneity across road types and locations.
Figure 3: Weekly dynamics of traffic speed and volume averaged across all road segments, illustrating daily and holiday patterns.
Figure 4: Histograms of traffic volume and speed, highlighting inter-city differences and multimodal distributions.
Limitations of Existing Benchmarks
Prior datasets such as METR-LA and PEMS-BAY are limited to a few hundred nodes, lack real connectivity, and focus on highways with sparse sensor coverage. Edges are constructed heuristically, and static road features are largely absent. These constraints have led to the development of resource-intensive models that do not scale to realistic urban settings and cannot exploit rich spatial or attribute information.
Model Architectures and Scalability Analysis
The paper benchmarks several established spatiotemporal GNN models: DCRNN, GRUGCN, STGCN, and GWN. These models process temporal data via recurrence or convolution, maintaining a separate representation for each node and timestamp. With n nodes, t timestamps in the lookback window, and hidden dimension d, this results in memory and computational complexity of O(ntd) per layer, which is prohibitive for large n and t.
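To make the O(ntd) cost concrete, the following back-of-the-envelope sketch counts the activation values one layer would hold for a single sample when every node and timestamp keeps its own vector. The hidden size, the lookback lengths, and the float32 assumption are illustrative choices, not figures from the paper.

```python
def activation_values(n_nodes, lookback, hidden_dim):
    """Count the values one layer keeps when every (node, timestamp) pair has its own vector."""
    return n_nodes * lookback * hidden_dim

def to_gib(n_values, bytes_per_value=4):
    """Convert a count of float32 values to GiB."""
    return n_values * bytes_per_value / 2**30

n = 100_000  # nodes, the upper end of the dataset sizes quoted above
d = 64       # hidden size: an assumed value, not taken from the paper

for t in (12, 48, 192):  # hypothetical lookback lengths
    values = activation_values(n, t, d)
    print(f"t={t:>3}: {values:,} values, about {to_gib(values):.2f} GiB per layer and sample")
```

The count grows linearly with the lookback length, and during training it is further multiplied by the number of layers and the batch size, since activations are kept for backpropagation.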
To address scalability, the authors propose a time-then-graph approach: each node's historical time series is encoded into a single vector via a linear layer, followed by multilayer GNN aggregation. This reduces per-layer memory to O(nd) and enables efficient training and inference on large graphs and long lookback windows.
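The two stages described above can be sketched in plain Python as follows. This is a minimal illustration under simplifying assumptions (random untrained weights, mean aggregation, a bare skip connection, no nonlinearity or normalization), not the authors' implementation; all function and variable names are hypothetical.

```python
import random

def linear(x, weight, bias):
    """Apply y = W x + b to one vector; weight is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]

def mean_aggregate(h, neighbors):
    """Average the embeddings of each node's neighbors (zeros if it has none)."""
    d = len(h[0])
    out = []
    for node in range(len(h)):
        nbrs = neighbors.get(node, [])
        if nbrs:
            out.append([sum(h[j][k] for j in nbrs) / len(nbrs) for k in range(d)])
        else:
            out.append([0.0] * d)
    return out

def time_then_graph(series, neighbors, enc_w, enc_b, n_layers=2):
    # Stage 1 (time): collapse each node's length-t history into one d-dimensional vector.
    h = [linear(x, enc_w, enc_b) for x in series]
    # Stage 2 (graph): message passing operates on n vectors of size d only.
    for _ in range(n_layers):
        agg = mean_aggregate(h, neighbors)
        # Skip connection: add the aggregated message to the node's own embedding.
        h = [[hi + ai for hi, ai in zip(h_node, a_node)] for h_node, a_node in zip(h, agg)]
    return h

# Toy example: 4 nodes on a path, lookback of 6 steps, hidden size 3.
random.seed(0)
n, t, d = 4, 6, 3
series = [[random.random() for _ in range(t)] for _ in range(n)]
enc_w = [[random.uniform(-1, 1) for _ in range(t)] for _ in range(d)]
enc_b = [0.0] * d
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
embeddings = time_then_graph(series, neighbors, enc_w, enc_b)
```

After the encoder, the lookback length t no longer appears in any tensor shape: a longer history only widens the encoder's weight matrix, while the graph layers are unaffected.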
The GNN component is instantiated with either mean aggregation or local multihead attention (GNN-Mean, GNN-TrfAttn), augmented with skip connections, layer normalization, and MLP blocks.
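For intuition about the attention variant, here is a simplified single-head sketch of attention restricted to graph neighbors, followed by a skip connection and layer normalization. It deliberately omits the learned query/key/value projections, the multiple heads, and the MLP block, so it is an assumption-laden illustration rather than the GNN-TrfAttn layer itself; the function names are hypothetical.

```python
import math

def neighbor_attention(h, neighbors):
    """Single-head scaled dot-product attention over each node's neighbors."""
    d = len(h[0])
    out = []
    for i, query in enumerate(h):
        nbrs = neighbors.get(i, [])
        if not nbrs:
            out.append([0.0] * d)
            continue
        # Similarity between the node and each neighbor, scaled by sqrt(d).
        scores = [sum(q * k for q, k in zip(query, h[j])) / math.sqrt(d) for j in nbrs]
        # Numerically stable softmax over the neighborhood.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of neighbor embeddings.
        out.append([sum(w * h[j][k] for w, j in zip(weights, nbrs)) for k in range(d)])
    return out

def layer_norm(x, eps=1e-5):
    """Normalize one vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def attention_block(h, neighbors):
    """Neighbor attention, then skip connection, then layer normalization."""
    agg = neighbor_attention(h, neighbors)
    return [layer_norm([a + b for a, b in zip(hi, ai)]) for hi, ai in zip(h, agg)]

# Toy usage with three nodes and hypothetical embeddings.
h = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
neighbors = {0: [1, 2], 1: [0], 2: [0, 1]}
updated = attention_block(h, neighbors)
```

Replacing `neighbor_attention` with a plain neighborhood average would give the mean-aggregation counterpart; attention instead lets each segment weight its neighbors by similarity.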
Experimental Results
On both city-traffic-M and city-traffic-L, the proposed GNN-TrfAttn model achieves the lowest mean absolute error (MAE) for both traffic volume and speed, outperforming all baselines and prior spatiotemporal models. Notably, STGCN fails to train on the largest dataset due to out-of-memory errors, and other models exhibit significant slowdowns as the lookback window increases.
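For reference, the MAE metric used in this comparison is the average absolute difference between predictions and observations. The snippet below is a minimal sketch; the speed values are made up for illustration and are not results from the paper.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |y_true - y_pred| over all paired observations."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

# Hypothetical observed and predicted speeds (km/h) for three road segments.
observed = [40.0, 55.0, 30.0]
predicted = [42.0, 50.0, 33.0]
mae = mean_absolute_error(observed, predicted)  # (2 + 5 + 3) / 3
```

Because MAE is expressed in the units of the target, values for speed and volume are not directly comparable with each other.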
Lookback Window Analysis
Longer lookback windows consistently improve forecasting accuracy for the proposed model, with negligible impact on training time. This demonstrates the efficiency of the time-then-graph paradigm and its suitability for real-world deployment where long historical context is beneficial.
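The lookback window can be pictured as a sliding slice over each node's time series. The helper below is a hypothetical sketch of how (history, target) training pairs might be cut from one series; the `horizon` parameter and the function name are assumptions for illustration.

```python
def make_windows(series, lookback, horizon=1):
    """Return (history, target) pairs cut from one node's time series."""
    samples = []
    for end in range(lookback, len(series) - horizon + 1):
        history = series[end - lookback:end]   # the last `lookback` observations
        target = series[end:end + horizon]     # the next `horizon` observations
        samples.append((history, target))
    return samples

# Toy series of six readings with a lookback of three steps.
windows = make_windows([0, 1, 2, 3, 4, 5], lookback=3)
```

In a time-then-graph model, a longer `history` only increases the input size of the temporal encoder, which is consistent with the near-constant training time reported here.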
Scalability
Training time for GNN-Mean and GNN-TrfAttn remains low and nearly constant as the lookback window increases, while sequential models (DCRNN, GWN, STGCN) scale poorly. For example, STGCN fails to complete training within 250 hours on city-traffic-L with a lookback of 48, whereas GNN-TrfAttn completes in under 2 hours.
Implications and Future Directions
The introduction of city-traffic-M and city-traffic-L sets a new standard for urban traffic forecasting benchmarks, enabling rigorous evaluation of models in realistic, large-scale settings. The demonstrated scalability and accuracy of the time-then-graph GNN approach suggest that future research should prioritize efficient temporal encoding and flexible spatial aggregation mechanisms.
The datasets' rich attribute space opens opportunities for multitask learning, attribute-aware modeling, and integration with adaptive traffic control, logistics, and urban planning systems. The observed strong dependence of traffic variables on static road features (e.g., speed limits, pavement, crosswalks) underscores the necessity of attribute exploitation in model design.

Figure 5: Visualization of city-traffic-M, highlighting the dense and heterogeneous urban road network.
Conclusion
This work provides a significant advance in urban traffic forecasting by releasing large-scale, fine-grained datasets and proposing a scalable GNN-based model that outperforms existing baselines in both accuracy and efficiency. The findings highlight the limitations of current spatiotemporal architectures and benchmarks, and the necessity of scalable, attribute-aware models for real-world deployment. Future research should build on these datasets and modeling insights to develop holistic, efficient, and interpretable traffic forecasting systems for smart cities.