TempoKGAT: Temporal Graph Attention Model
- TempoKGAT is a temporal graph attention architecture that integrates time-decay weighted node features with selective top-k neighbor aggregation.
- The model improves prediction accuracy, reducing MSE and RMSE by 20–40% relative to standard GATs on several spatio-temporal forecasting tasks, and supports interpretability through its attention scores and time-decay factors.
- Its design limits computation by attending only to the most influential neighbors, which suits it to traffic, epidemiology, and energy forecasting applications.
TempoKGAT is a graph attention network architecture designed to model temporal, dynamic graph data, integrating time-decay weighted node features and a selective top-$k$ neighbor attention protocol. Developed to address limitations of conventional graph neural networks (GNNs) in capturing evolving relationships within spatio-temporal datasets, TempoKGAT enables both improved prediction accuracy and enhanced interpretability in temporal forecasting contexts (Sasal et al., 2024).
1. Architectural Components
TempoKGAT consists of a single-layer graph attention framework that interleaves two principal mechanisms: a Temporal Block and a Spatial Block. The Temporal Block applies element-wise exponential decay to node features based on relative timestamps, allowing recent observations to exert stronger influence. Given node features $X$, timestamps $t$, and decay rate $\lambda$, the decayed features are computed as:
$$\tilde{X} = X \odot \exp(-\lambda t) \tag{1}$$
where $\odot$ denotes the Hadamard product and the exponential is applied element-wise.
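The time-decay step can be illustrated with a minimal, dependency-free sketch. It assumes one timestamp per node, expressed as elapsed time (larger means older), so that every feature of a node is scaled by the same factor; the function name `time_decay` and the toy values are hypothetical and not taken from the paper.

```python
import math

def time_decay(features, timestamps, decay_rate):
    """Scale each node's feature vector by exp(-decay_rate * timestamp).

    Assumption: timestamps are elapsed times, so older observations
    (larger t) receive smaller weights.
    """
    decayed = []
    for row, t in zip(features, timestamps):
        weight = math.exp(-decay_rate * t)  # one decay factor per node
        decayed.append([weight * x for x in row])
    return decayed

# Hypothetical toy input: 3 nodes with identical features but different ages.
features = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
timestamps = [0.0, 1.0, 5.0]
decayed = time_decay(features, timestamps, decay_rate=0.5)
```

A node observed at elapsed time 0 keeps its features unchanged, while older observations shrink toward zero at a rate set by `decay_rate`.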
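The Spatial Block restricts attention to the top-$k$ neighbors by edge weights, selecting for each node $i$:
$$\mathcal{N}_k(i) = \underset{j \in \mathcal{N}(i)}{\operatorname{top-}k}\; w_{ij} \tag{2}$$
where $\mathcal{N}(i)$ is the neighborhood of node $i$ and $w_{ij}$ the weight of edge $(i, j)$.
Attention coefficients are computed using a single-head additive mechanism. For projected decayed features $h_i = W\tilde{x}_i$, where $\tilde{x}_i$ is the decayed feature vector of node $i$ and $W$ a learned projection, the attention for node $i$ toward neighbor $j$ is:
$$e_{ij} = \operatorname{LeakyReLU}\left(a^{\top}\left[h_i \,\|\, h_j\right]\right), \qquad \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{l \in \mathcal{N}_k(i)} \exp(e_{il})}$$
Neighbor contributions are further modulated by edge weights:
$$h_i' = \sum_{j \in \mathcal{N}_k(i)} \alpha_{ij}\, w_{ij}\, h_j$$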
This design combines localized temporal weighting and edge-aware attention to represent latent dynamic patterns in temporal graphs.
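The attention and edge-weighted aggregation steps described above can be sketched in plain Python. This is an illustrative reading of the description, not the authors' implementation: it assumes the standard additive GAT score (a LeakyReLU with slope 0.2 applied to an attention vector dotted with the concatenated pair of projected features), a softmax over the selected neighbors only, no output nonlinearity, and a zero vector for nodes without selected neighbors. All names are hypothetical.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def leaky_relu(x, slope=0.2):  # slope 0.2 is an assumed default
    return x if x > 0 else slope * x

def attend_and_aggregate(h, neighbors, edge_weight, a_src, a_dst):
    """Single-head additive attention over a fixed neighbor set per node.

    h           : projected, decayed feature vectors, one per node
    neighbors   : dict node -> list of selected neighbor ids
    edge_weight : dict (i, j) -> edge weight
    a_src, a_dst: the two halves of the attention vector a, so that
                  a . [h_i || h_j] == a_src . h_i + a_dst . h_j
    """
    out = []
    for i, h_i in enumerate(h):
        nbrs = neighbors.get(i, [])
        if not nbrs:
            out.append([0.0] * len(h_i))  # assumption: no neighbors -> zeros
            continue
        scores = [leaky_relu(dot(a_src, h_i) + dot(a_dst, h[j])) for j in nbrs]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        agg = [0.0] * len(h_i)
        for e, j in zip(exps, nbrs):
            coeff = (e / total) * edge_weight[(i, j)]  # attention times edge weight
            for d, x in enumerate(h[j]):
                agg[d] += coeff * x
        out.append(agg)
    return out

# Hypothetical toy graph with three nodes; node 2 has no selected neighbors.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [1, 2], 1: [0], 2: []}
edge_weight = {(0, 1): 2.0, (0, 2): 1.0, (1, 0): 0.5}
out = attend_and_aggregate(h, neighbors, edge_weight,
                           a_src=[0.0, 0.0], a_dst=[0.0, 0.0])
```

With a zero attention vector the softmax is uniform, so node 0 receives 0.5 · 2.0 · h_1 + 0.5 · 1.0 · h_2 = [0.5, 1.5], which makes the separate roles of the attention coefficient and the edge weight easy to check by hand.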
2. Selective Neighbor Aggregation Protocol
TempoKGAT's neighbor selection restricts each node's receptive field to the $k$ neighbors with the largest edge weights. Only this subset contributes to the spatial aggregation, substantially reducing computational cost relative to full adjacency attention.
The protocol can be summarized as:
- Selection: Equation (2) (above) finds the top-$k$ neighbors by descending edge weight.
- Aggregation: Equations (4)–(7) specify the attention and feature aggregation over only these critical neighbors.
This mechanism allows for focused exploitation of salient graph connections corresponding to the strongest temporal and spatial dependencies, effectively filtering out noise from weak or irrelevant relationships.
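The selection step can be sketched with the standard library's `heapq.nlargest`, which returns the `k` largest items without fully sorting the neighbor list. The adjacency format (a dict of dicts), the function name, and the toy weights are assumptions made for illustration; how the original implementation breaks ties between equal weights is not stated in the source.

```python
import heapq

def top_k_neighbors(adjacency, k):
    """Keep, for each node, the k neighbors with the largest edge weight.

    adjacency: dict node -> dict neighbor -> edge weight
    Returns dict node -> list of neighbor ids in descending weight order.
    Nodes with fewer than k neighbors keep all of them.
    """
    selected = {}
    for node, nbrs in adjacency.items():
        best = heapq.nlargest(k, nbrs.items(), key=lambda item: item[1])
        selected[node] = [nbr for nbr, _ in best]
    return selected

# Hypothetical weighted graph.
adjacency = {
    "a": {"b": 0.9, "c": 0.1, "d": 0.5},
    "b": {"a": 0.9},
    "c": {"a": 0.1, "d": 0.3},
    "d": {"a": 0.5, "c": 0.3},
}
selected = top_k_neighbors(adjacency, k=2)
# selected["a"] == ["b", "d"]: the weak edge a-c (0.1) is dropped.
```

Only the neighbor lists in `selected` would then be passed on to the attention and aggregation stage.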
3. Objective Function and Optimization
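TempoKGAT is optimized for point forecasting/regression tasks using Mean Squared Error (MSE) as the sole loss function:
$$\mathcal{L}_{\text{MSE}} = \frac{1}{N}\sum_{n=1}^{N}\left(y_n - \hat{y}_n\right)^2$$
where $y_n$ and $\hat{y}_n$ are the target and predicted values and $N$ is the number of predictions.
Performance metrics, reported during evaluation, include:
- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE): $\text{RMSE} = \sqrt{\text{MSE}}$
- Mean Absolute Error (MAE): $\text{MAE} = \frac{1}{N}\sum_{n=1}^{N}\left|y_n - \hat{y}_n\right|$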
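For reference, the three metrics follow their standard definitions and can be computed together as in the short sketch below. The function name and the toy targets and predictions are hypothetical.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, and MAE for two equally long sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae}

# One prediction is off by 2, the other three are exact.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 5.0, 4.0])
# {'mse': 1.0, 'rmse': 1.0, 'mae': 0.5}
```

Because MSE squares each error, the single large miss dominates it, whereas MAE weights all errors linearly.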
No explicit regularization or temporal smoothing beyond the time-decay factor is employed. Training uses the Adam optimizer (learning rate 0.001) over 200 epochs with an 80/20 temporal split.
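A temporal split keeps the chronological order intact instead of shuffling, so the model is always evaluated on snapshots later than those it was trained on. The sketch below assumes the first 80% of snapshots form the training set; the source gives only the ratio, so the exact cut rule, the function name, and the constant names are illustrative.

```python
LEARNING_RATE = 0.001  # Adam learning rate stated in the text
EPOCHS = 200           # number of training epochs stated in the text

def temporal_split(snapshots, train_ratio=0.8):
    """Split a chronologically ordered sequence without shuffling."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must lie strictly between 0 and 1")
    cut = int(len(snapshots) * train_ratio)
    return snapshots[:cut], snapshots[cut:]

# Hypothetical sequence of 10 time-ordered snapshots, identified by index.
snapshots = list(range(10))
train, test = temporal_split(snapshots)
# train == [0, 1, 2, 3, 4, 5, 6, 7], test == [8, 9]
```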
4. Computational Complexity Analysis
TempoKGAT's selective neighbor aggregation reduces attention and aggregation cost from $O(d_i)$ in standard GATs, where $d_i$ is the degree of node $i$, to $O(k)$ per node, with $k$ typically far less than the average node degree. Top-$k$ selection per node incurs $O(d_i \log d_i)$ via sorting, or linear time via selection algorithms.
Space requirements increase only modestly, due to storage for decayed feature masks and top-$k$ neighbor indices. For graphs where $k \ll \bar{d}$, with $\bar{d}$ the average node degree, overall batch runtime is improved compared to full attention mechanisms.
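A rough way to see the saving is to count how many (node, neighbor) attention scores each scheme evaluates. The sketch below does this for a hypothetical degree sequence; it counts score evaluations only and ignores the cost of the selection step itself.

```python
def attention_pair_counts(degrees, k):
    """Count attention scores for full-neighborhood vs. top-k attention."""
    full = sum(degrees)                      # one score per edge endpoint
    top_k = sum(min(d, k) for d in degrees)  # at most k scores per node
    return full, top_k

# Hypothetical node degrees: three high-degree hubs and one low-degree node.
degrees = [50, 40, 3, 120]
full, top_k = attention_pair_counts(degrees, k=5)
# full == 213, top_k == 18
```

The node with degree 3 is unaffected because it has fewer than `k` neighbors, so the benefit comes entirely from high-degree nodes.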
5. Experimental Protocol and Quantitative Results
TempoKGAT evaluation spans five open-source spatio-temporal benchmarks across traffic, epidemiology, and energy domains:
| Dataset | Optimal $k$ | MAE | MSE | RMSE |
|---|---|---|---|---|
| PedalMe | 1 | 0.7476 | 1.1717 | 1.0825 |
| ChickenPox | 1 | 0.6489 | 1.0017 | 1.0008 |
| England Covid | 5 | 0.4953 | 0.4192 | 0.6474 |
| Small Windmill | 7 | 0.7949 | 0.9821 | 0.9910 |
| Medium Windmill | 17 | 0.7198 | 0.8890 | 0.9429 |
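As a quick consistency check on the table, the reported RMSE values should equal the square root of the reported MSE values up to rounding. The sketch below uses only the numbers from the table; the variable names are illustrative.

```python
import math

# (MSE, RMSE) pairs copied from the results table.
results = {
    "PedalMe": (1.1717, 1.0825),
    "ChickenPox": (1.0017, 1.0008),
    "England Covid": (0.4192, 0.6474),
    "Small Windmill": (0.9821, 0.9910),
    "Medium Windmill": (0.8890, 0.9429),
}

# Largest absolute gap between sqrt(MSE) and the reported RMSE.
max_gap = max(abs(math.sqrt(mse) - rmse) for mse, rmse in results.values())
consistent = max_gap < 1e-4  # tolerance for four-decimal rounding
```

All five rows pass this check, so the MSE and RMSE columns agree with each other.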
Comparative baselines include GRU, LSTM (graph extensions), GCN, GAT, TGCN, DCRNN, EvolveGCNH, and naive edge-weight-injected versions. TempoKGAT consistently outperforms these methods, achieving reductions of 20–40% in MSE and RMSE over standard GATs. The optimal $k$ varies: small values suffice for dense graphs; large values are preferable for sparse or high-variance settings.
6. Interpretative Insights and Limitations
The joint role of learned attention scores and time-decay factors enables granular interpretability, highlighting the nodes and historical lags most influential in predictions and thereby facilitating qualitative causal analysis. Empirically, the sufficiency of $k = 1$ in several datasets (PedalMe and ChickenPox) implies that a single dominant neighbor, modulated by time-weighting, often provides the main predictive signal.
Identified limitations include increased computational overhead with large $k$ and potential under-capture of complex, multi-relational graph dynamics due to single-head attention. Prospective improvements involve faster top-$k$ algorithms, multi-head attention, scaling to orders-of-magnitude larger graphs, and adaptive decay mechanisms.
7. Context, Applications, and Prospects
TempoKGAT is situated at the intersection of temporal GNNs and local attention-based aggregation strategies, addressing challenges in temporal graph analysis across domains such as traffic prediction, epidemiological modeling, and renewable energy forecasting. Its architecture combines improved predictive accuracy with model interpretability without explicit regularization. Future work will likely focus on algorithmic efficiency, richer multi-pattern attention, and generalization to extremely large-scale dynamic graphs (Sasal et al., 2024).