KST-GCN for Traffic Forecasting

Updated 12 April 2026

The paper introduces a novel KST-GCN architecture that integrates external knowledge graphs with spatiotemporal traffic features to improve prediction accuracy.
It leverages a dedicated Knowledge Fusion Cell and KR-EAR model to embed heterogeneous data such as weather conditions and POI counts into traffic forecasting.
Empirical results show enhanced robustness and reduced RMSE across multiple prediction horizons, highlighting the framework's practical benefits.

KST-GCN (Knowledge-Driven Spatial-Temporal Graph Convolutional Network) is a knowledge representation-augmented deep learning architecture designed for traffic forecasting. The approach explicitly incorporates heterogeneous external knowledge—such as weather and points of interest (POIs)—via a City Knowledge Graph (CKG) and fuses these knowledge representations with spatiotemporal traffic features through a Knowledge Fusion Cell (KF-Cell) atop a spatial-temporal GCN-RNN backbone. KST-GCN is the first framework reported to integrate knowledge graphs into traffic forecasting, demonstrating improved accuracy and robustness across multiple prediction horizons on real-world traffic datasets (Zhu et al., 2020).

1. Architectural Overview

The KST-GCN prediction process is formulated as

$\hat Y = f(A, X, CKG)$

where $A \in \{0,1\}^{n \times n}$ is the adjacency matrix of the road network with $n$ road sections, $X \in \mathbb{R}^{n \times d_x}$ is the historical traffic-feature matrix (e.g., vehicle speeds), and $CKG$ is the City Knowledge Graph encoding external knowledge. The framework comprises three primary components: (a) knowledge graph construction tailored to the urban traffic domain; (b) learning knowledge representations via the KR-EAR model (Knowledge Representation learning with Entities, Attributes and Relations); and (c) a Knowledge Fusion Cell (KF-Cell) that integrates knowledge embeddings and traffic features before they are passed to a spatial-temporal GCN-RNN backbone.

2. Traffic Knowledge Graph Construction

The CKG encodes heterogeneous entities and relations relevant for urban traffic:

Entities ( $v_i$ ): each road section in the network.
Relations ( $r$ ):
- adj: the adjacency (topology) between road segments.
- $att_\ell$: attribute relations describing external static or dynamic factors (e.g., weather condition, POI counts).
- Attribute–attribute co-occurrence: correlations between different types of contextual attributes.
Knowledge triples:

Road adjacency: $R = \{(v_i, \mathrm{adj}, v_j) \mid a_{ij}=1\}$ .
Road-attribute: $R_{att} = \{(v_i, att_\ell, att_{\ell\text{-}v_i})\}$ , where $att_{\ell\text{-}v_i}$ might indicate, for example, the number of restaurants along a road segment.
Attribute–attribute co-occurrence: $\mathrm{att\_att} = \{(att_{\ell_1}, att_{\ell_2}, p)\}$ , where $p$ is the empirical co-occurrence probability of the two attributes.

External Factors Encoded: static (e.g., POI counts), dynamic (e.g., weather states, time intervals).

The knowledge graph thus captures both the explicit infrastructure topology and multifaceted contextual influences on traffic, facilitating their joint modeling.
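The triple sets above can be illustrated with a minimal Python sketch. The function name `build_ckg_triples`, the entity naming scheme (`v0`, `v1`, ...), and the toy attribute values are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def build_ckg_triples(adj, attributes):
    """Return (adjacency triples, road-attribute triples) for a toy CKG.

    adj: (n, n) 0/1 adjacency matrix of the road network.
    attributes: dict mapping an attribute name to a length-n list of values.
    """
    n = adj.shape[0]
    # (v_i, adj, v_j) for every pair with a_ij = 1
    adj_triples = [(f"v{i}", "adj", f"v{j}")
                   for i in range(n) for j in range(n) if adj[i, j] == 1]
    # (v_i, att_l, value of att_l on v_i) for every attribute and road section
    att_triples = [(f"v{i}", name, values[i])
                   for name, values in attributes.items() for i in range(n)]
    return adj_triples, att_triples

# Toy network of three road sections in a line: v0 - v1 - v2.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
attributes = {
    "poi_restaurant_count": [4, 0, 7],       # static factor (hypothetical values)
    "weather": ["sunny", "sunny", "rain"],   # dynamic factor at one time step
}
adj_triples, att_triples = build_ckg_triples(adj, attributes)
```

Because the adjacency matrix is symmetric, each undirected road connection yields two directed `adj` triples in this sketch.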

3. KR-EAR Knowledge Representation Learning

KR-EAR provides a mechanism to simultaneously embed entities, relations, and attribute values into a unified vector space, distinguishing semantic roles:

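Entity embedding: $\mathbf{e} \in \mathbb{R}^{d}$ for each road section (written $\mathbf{h}$ or $\mathbf{t}$ when the entity is the head or tail of a relation triple).
Relation embedding: $\mathbf{r} \in \mathbb{R}^{d}$.
Attribute-value embedding: $\mathbf{v} \in \mathbb{R}^{d}$.
The objective maximizes the likelihood of the relation triples $\mathcal{T}_{rel}$ and attribute triples $\mathcal{T}_{att}$ given the set of embeddings $\Theta$:

$P(\mathcal{T}_{rel}, \mathcal{T}_{att} \mid \Theta) = P(\mathcal{T}_{rel} \mid \Theta)\, P(\mathcal{T}_{att} \mid \Theta)$

where

$P(\mathcal{T}_{rel} \mid \Theta) = \prod_{(h,r,t) \in \mathcal{T}_{rel}} P((h,r,t) \mid \Theta)$

$P(\mathcal{T}_{att} \mid \Theta) = \prod_{(e,a,v) \in \mathcal{T}_{att}} P((e,a,v) \mid \Theta)$

Relation triple modeling uses a TransR-like projection:

$P(h \mid r, t, \Theta) = \dfrac{\exp(g_R(h,r,t))}{\sum_{\hat h} \exp(g_R(\hat h, r, t))}$

with

$g_R(h,r,t) = -\lVert \mathbf{h} M_r + \mathbf{r} - \mathbf{t} M_r \rVert + b_1$

( $M_r$: relation-specific projection, $b_1$: bias).

Attribute triple modeling:

$P(v \mid e, a, \Theta) = \dfrac{\exp(g_A(e,a,v))}{\sum_{\hat v} \exp(g_A(e,a,\hat v))}$

$g_A(e,a,v) = -\lVert f(\mathbf{e} W_a + \mathbf{b}_a) - \mathbf{v} \rVert + b_2$

(with learned parameters $W_a$, $\mathbf{b}_a$, nonlinearity $f$).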

Stochastic gradient descent over negative log-likelihood is used for learning, with embedding dimension $X \in \mathbb{R}^{n \times d_x}$ 6 in experiments.
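A small NumPy sketch may help make the two scoring functions concrete. It assumes a TransR-style relation score of the form $-\lVert \mathbf{h} M_r + \mathbf{r} - \mathbf{t} M_r \rVert_1 + b_1$ and an attribute score of the form $-\lVert f(\mathbf{e} W_a + \mathbf{b}_a) - \mathbf{v} \rVert_1 + b_2$ with $f = \tanh$, each turned into a probability by a softmax over candidates. The L1 norm, the choice of $\tanh$, the toy sizes, and all variable names are assumptions for illustration, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n_entities, d = 5, 4                    # toy sizes, not the paper's settings
E = rng.normal(size=(n_entities, d))    # entity embeddings, one row per road section
r = rng.normal(size=d)                  # embedding of one relation, e.g. "adj"
M_r = rng.normal(size=(d, d))           # relation-specific projection matrix
b1, b2 = 1.0, 1.0                       # scalar biases

def relation_score(h, r, t, M_r, b1):
    # Higher is better: the projected head plus relation should land near the projected tail.
    return -np.abs(h @ M_r + r - t @ M_r).sum() + b1

def attribute_score(e, W_a, b_a, v, b2):
    # The entity is mapped into attribute space, then compared with the value embedding.
    return -np.abs(np.tanh(e @ W_a + b_a) - v).sum() + b2

def softmax(scores):
    scores = scores - scores.max()      # subtract the max for numerical stability
    p = np.exp(scores)
    return p / p.sum()

# P(h | r, t): probability of each candidate head entity for a fixed relation and tail.
tail = E[2]
p_head = softmax(np.array([relation_score(h, r, tail, M_r, b1) for h in E]))

# P(v | e, a): probability of each candidate value embedding for one entity and attribute.
W_a, b_a = rng.normal(size=(d, d)), np.zeros(d)
V = rng.normal(size=(3, d))             # three candidate attribute-value embeddings
p_value = softmax(np.array([attribute_score(E[0], W_a, b_a, v, b2) for v in V]))
```

Training would then minimize the negative log of these probabilities over the observed triples, typically with sampled negatives rather than the full softmax shown here.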

4. Knowledge Fusion Cell and Spatiotemporal Modeling Backbone

The KF-Cell integrates traffic and knowledge representations per node and time:

Let $X_t$ be the traffic features at time $t$, accompanied for each road section by a static-factor embedding (e.g., POI) and a dynamic-factor embedding (e.g., weather).
The fusion is achieved through gated elementwise products followed by concatenation, with trainable weight and bias parameters.
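The exact fusion equation is not reproduced here, so the following is only one possible reading of "gated elementwise products followed by concatenation". It assumes that each knowledge embedding is passed through a trainable affine map and a sigmoid to form a gate, that the gate multiplies the traffic features elementwise, and that the two gated copies are concatenated. The function name `kf_cell`, the sigmoid gate, and all shapes are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def kf_cell(x_t, e_static, e_dynamic, params):
    """Hypothetical gated fusion.

    x_t:       (n, d_x) traffic features at time t
    e_static:  (n, d_k) static-factor embeddings (e.g. POI)
    e_dynamic: (n, d_k) dynamic-factor embeddings (e.g. weather)
    params:    (W_s, b_s, W_d, b_d), W_* of shape (d_k, d_x), b_* of shape (d_x,)
    """
    W_s, b_s, W_d, b_d = params
    gate_s = sigmoid(e_static @ W_s + b_s)    # gate values in (0, 1), shape (n, d_x)
    gate_d = sigmoid(e_dynamic @ W_d + b_d)
    # Elementwise products modulate the traffic signal; concatenation keeps both views.
    return np.concatenate([x_t * gate_s, x_t * gate_d], axis=1)

n, d_x, d_k = 3, 2, 4
rng = np.random.default_rng(0)
x_t = rng.normal(size=(n, d_x))
e_static = rng.normal(size=(n, d_k))
e_dynamic = rng.normal(size=(n, d_k))
params = (rng.normal(size=(d_k, d_x)), np.zeros(d_x),
          rng.normal(size=(d_k, d_x)), np.zeros(d_x))
fused = kf_cell(x_t, e_static, e_dynamic, params)   # shape (n, 2 * d_x)
```

With this design the fused output has twice the feature width of the traffic input, which the downstream graph convolution would need to accept.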

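The fused representation is then provided, along with the graph structure $A$, to a spatial-temporal GCN-RNN backbone. The GCN employs a first-order spectral approximation:

$f(X, A) = \sigma\left(\tilde D^{-1/2} \tilde A \tilde D^{-1/2} X W\right)$

where $\tilde A = A + I$ is the adjacency matrix with self-loops, $\tilde D$ its degree matrix, $W$ a trainable weight matrix, and $\sigma$ an activation.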
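As a rough illustration of the first-order spectral approximation, the sketch below implements one graph-convolution layer with the widely used symmetric normalization of the self-loop-augmented adjacency matrix. The ReLU activation and the toy inputs are assumptions.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph convolution: relu(D^-1/2 (A + I) D^-1/2 X W)."""
    A_tilde = A + np.eye(A.shape[0])          # add self-loops
    deg = A_tilde.sum(axis=1)                 # node degrees, always >= 1 here
    D_inv_sqrt = np.diag(deg ** -0.5)
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt # symmetric normalization
    return np.maximum(A_hat @ X @ W, 0.0)     # ReLU activation (assumed)

# Two connected road sections with one feature each.
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])
X = np.array([[1.0],
              [3.0]])
W = np.array([[1.0]])
out = gcn_layer(A, X, W)
```

In this toy case every entry of the normalized matrix is 0.5, so each node's output is the average of its own feature and its neighbour's.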

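Temporal dynamics are modeled by a GRU that updates a hidden state from the graph-convolved features at each time step, and the prediction $\hat Y$ is computed from the GRU hidden state.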
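For readers unfamiliar with GRUs, here is a minimal NumPy sketch of one standard GRU update applied per road section. In a GCN-RNN backbone the input `x` would be the graph-convolved features; the parameter layout, the toy sizes, and the linear read-out `W_out` are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h, p):
    """One GRU update. x: (n, d_in) inputs, h: (n, d_h) previous hidden state."""
    xh = np.concatenate([x, h], axis=1)
    u = sigmoid(xh @ p["W_u"] + p["b_u"])     # update gate
    r = sigmoid(xh @ p["W_r"] + p["b_r"])     # reset gate
    c = np.tanh(np.concatenate([x, r * h], axis=1) @ p["W_c"] + p["b_c"])  # candidate state
    return u * h + (1.0 - u) * c              # blend old state and candidate

n, d_in, d_h = 3, 2, 4
rng = np.random.default_rng(0)
params = {k: rng.normal(scale=0.1, size=(d_in + d_h, d_h)) for k in ("W_u", "W_r", "W_c")}
params.update({k: np.zeros(d_h) for k in ("b_u", "b_r", "b_c")})

h = np.zeros((n, d_h))
for x_t in rng.normal(size=(5, n, d_in)):     # five time steps of toy inputs
    h = gru_step(x_t, h, params)

W_out = rng.normal(scale=0.1, size=(d_h, 1))  # hypothetical linear read-out
y_hat = h @ W_out                             # one predicted value per road section
```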

5. Training Protocol and Hyperparameterization

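The full network minimizes an end-to-end objective: $\mathcal{L} = \lVert Y_t - \hat Y_t \rVert + \lambda L_{reg}$, with $Y_t$ being ground-truth speed vectors, $\hat Y_t$ the predictions, and $\lambda$ the weight of the $L_2$ regularization term $L_{reg}$.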
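A loss of this kind, a data-fit term plus a weighted $L_2$ penalty on the trainable weights, can be sketched as follows. The use of the Frobenius norm for the data term, the sum-of-squares penalty, and the value of `lam` are assumptions, not the paper's reported configuration.

```python
import numpy as np

def training_loss(y_true, y_pred, weights, lam):
    """Prediction error plus lam times the sum of squared weights."""
    data_term = np.linalg.norm(y_true - y_pred)        # Frobenius norm of the error
    reg_term = sum(np.sum(w ** 2) for w in weights)    # L2 penalty over all weight arrays
    return data_term + lam * reg_term

y_true = np.array([[3.0, 4.0]])
y_pred = np.zeros((1, 2))
weights = [np.ones((2, 2))]
loss = training_loss(y_true, y_pred, weights, lam=0.1)  # hypothetical lam
```

Here the data term is 5.0 and the penalty is 0.1 × 4, so the toy loss is about 5.4.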

Optimizer: Adam, learning rate $v_i$ 7.
Batch size: 64.
Embedding dimension: $v_i$ 8.
Hidden units: 128 (KF-T-GCN), 64 (KF-DCRNN).
Dataset split: 80% train, 20% test; validation within train.
Epochs: approximately 50, with early stopping based on validation RMSE.

6. Empirical Results and Analysis

Experiments are conducted on Shenzhen taxi data (January 2015, 156 road segments), using metrics including RMSE, MAE, Accuracy, and $R^2$, evaluated across prediction horizons (15/30/45/60 min).
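The evaluation metrics can be computed as in the sketch below. RMSE and MAE follow their standard definitions. The Frobenius-norm form of Accuracy and the inclusion of the coefficient of determination are assumptions based on common practice in traffic forecasting, not definitions confirmed by this summary.

```python
import numpy as np

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))

def accuracy(y, y_hat):
    # Assumed definition: 1 - ||Y - Y_hat||_F / ||Y||_F
    return float(1.0 - np.linalg.norm(y - y_hat) / np.linalg.norm(y))

def r2(y, y_hat):
    # Coefficient of determination: 1 - residual sum of squares / total sum of squares
    return float(1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))

y = np.array([[1.0, 2.0],
              [3.0, 4.0]])        # toy ground-truth speeds
y_hat = y + 1.0                   # every prediction is off by exactly 1
scores = {"rmse": rmse(y, y_hat), "mae": mae(y, y_hat),
          "accuracy": accuracy(y, y_hat), "r2": r2(y, y_hat)}
```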

Backbone	RMSE (15 min)	MAE (15 min)	RMSE Improv.
DCRNN	4.1243	2.7514	baseline
KF-DCRNN	4.0635	2.7206	–1.47%
T-GCN	4.0696	2.7460	baseline
KF-T-GCN	4.0443	2.7090	–0.63%
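The relative RMSE changes in the table can be recomputed from the 15-minute RMSE values, as in this short sketch (the helper name `relative_change_pct` is made up for illustration).

```python
# 15-minute RMSE values copied from the table above.
rmse_15min = {"DCRNN": 4.1243, "KF-DCRNN": 4.0635, "T-GCN": 4.0696, "KF-T-GCN": 4.0443}

def relative_change_pct(baseline, new):
    """Signed percentage change of `new` relative to `baseline` (negative = lower RMSE)."""
    return (new - baseline) / baseline * 100.0

change_dcrnn = relative_change_pct(rmse_15min["DCRNN"], rmse_15min["KF-DCRNN"])
change_tgcn = relative_change_pct(rmse_15min["T-GCN"], rmse_15min["KF-T-GCN"])
print(f"KF-DCRNN vs DCRNN: {change_dcrnn:.2f}%")
print(f"KF-T-GCN vs T-GCN: {change_tgcn:.2f}%")
```

Recomputing from the rounded table entries gives about -1.47% for KF-DCRNN and about -0.62% for KF-T-GCN; the small gap to the listed -0.63% is presumably due to rounding of the underlying RMSE values.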

Improvements increase with prediction horizon (up to 2.85% drop in RMSE at 60 min for KF-DCRNN, 4.36% for KF-T-GCN).
Ablation studies: integrating only weather or only POI yields 0.3–1% RMSE gain each; full KG fusion achieves the best results.
Noise robustness: Injecting Gaussian or Poisson noise into the inputs increases RMSE by less than 5%, indicating resilience to perturbed data.
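A noise-robustness check of this kind can be set up by perturbing the input features before inference. The sketch below only shows the perturbation step; the noise parameters `sigma` and `lam` are hypothetical values, and additive noise is an assumption about how the perturbation is applied.

```python
import numpy as np

def add_gaussian_noise(x, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return x + rng.normal(loc=0.0, scale=sigma, size=x.shape)

def add_poisson_noise(x, lam, rng):
    """Add Poisson-distributed counts with rate lam."""
    return x + rng.poisson(lam=lam, size=x.shape)

rng = np.random.default_rng(0)
speeds = rng.uniform(20.0, 60.0, size=(156, 12))   # toy speeds: 156 segments, 12 time steps
noisy_gauss = add_gaussian_noise(speeds, sigma=0.5, rng=rng)   # hypothetical sigma
noisy_pois = add_poisson_noise(speeds, lam=1.0, rng=rng)       # hypothetical lam
```

The model would then be evaluated on `noisy_gauss` and `noisy_pois`, and the resulting RMSE compared with the RMSE on the clean inputs.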

7. Limitations, Scalability, and Future Directions

The current CKG implementation covers only POI and weather features over a one-month interval. Extending to incorporate richer external sources (e.g., events, holidays, real-time incidents) is expected to further enhance accuracy.

Constructing and embedding large-scale KGs with KR-EAR is computationally expensive; scalable or localized updates (incremental learning) are likely needed for broader deployments. Practical application in intelligent transportation systems (ITS) requires real-time updates to the CKG (e.g., live weather, incidents) and optimization of the KF-Cell for inference efficiency.

Potential avenues for future work include multi-city transfer learning with KG alignment, attention-based fusion mechanisms in the KF-Cell, and development of online continual learning strategies (Zhu et al., 2020).


References

Zhu et al. (2020). KST-GCN: A Knowledge-Driven Spatial-Temporal Graph Convolutional Network for Traffic Forecasting.
