Lite-STGNN: Lightweight Spatial-Temporal GNN
- The paper introduces Lite-STGNN which decomposes input time series into trend and seasonal components for robust baseline forecasting.
- It employs a learnable sparse graph structure with low-rank factorization and top-K sparsification to enable effective inter-variable message passing.
- The model incorporates a conservative horizon-wise gating mechanism, achieving state-of-the-art performance with low parameter count and computational efficiency.
Lite-STGNN is a lightweight spatial-temporal graph neural network (STGNN) that integrates decomposition-based temporal modeling with a learnable sparse graph structure for long-term multivariate time series forecasting. It decomposes each input series into trend and seasonal components, employs a parameter-efficient temporal backbone, introduces a sparsified low-rank learnable adjacency matrix for inter-variable message passing, and incorporates a conservative horizon-wise gating mechanism to modulate spatial corrections. Lite-STGNN achieves state-of-the-art performance across diverse forecasting benchmarks while maintaining low parameter counts and high computational efficiency (Moges et al., 19 Dec 2025).
1. Problem Formulation
Given a multivariate time series $X \in \mathbb{R}^{L \times N}$, where $L$ is the input sequence length and $N$ is the number of variables or nodes, the objective is to predict the ensuing $H$-step sequence $\hat{Y} \in \mathbb{R}^{H \times N}$. The mapping $f_\theta : X \mapsto \hat{Y}$ is trained to minimize mean squared error (MSE):

$$\mathcal{L}_{\text{MSE}} = \frac{1}{HN} \sum_{h=1}^{H} \sum_{n=1}^{N} \left( \hat{Y}_{h,n} - Y_{h,n} \right)^2$$

Each variable is treated as a node in a latent, learned spatial graph determined by a sparse adjacency matrix $A \in \mathbb{R}^{N \times N}$.
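As a concrete shape check, the objective can be sketched in a few lines of numpy; the sizes ($L=96$, $N=8$, $H=24$) and all variable names are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Hypothetical shapes: L input steps, N variables (nodes), H forecast steps.
L, N, H = 96, 8, 24
rng = np.random.default_rng(0)
X = rng.normal(size=(L, N))      # input window
Y = rng.normal(size=(H, N))      # ground-truth future
Y_hat = np.zeros((H, N))         # placeholder model forecast

# MSE objective, averaged over horizon steps and nodes
mse = np.mean((Y_hat - Y) ** 2)
```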
2. Temporal Module: Trend–Seasonal Decomposition
The temporal modeling backbone employs a trend-seasonal decomposition. For each univariate series $x \in \mathbb{R}^{L}$:
- The trend is estimated using a centered moving average of window size $k$:

$$x^{\text{trend}}_t = \frac{1}{k} \sum_{j=-\lfloor k/2 \rfloor}^{\lfloor k/2 \rfloor} x_{t+j}$$

- The full sequence is split into trend and seasonal matrices, $X = X^{\text{trend}} + X^{\text{seas}}$, with $X^{\text{seas}} = X - X^{\text{trend}}$.
- Independent linear projections forecast each component:

$$\hat{Y}^{\text{trend}} = W^{\text{trend}} X^{\text{trend}}, \qquad \hat{Y}^{\text{seas}} = W^{\text{seas}} X^{\text{seas}},$$

with $W^{\text{trend}}, W^{\text{seas}} \in \mathbb{R}^{H \times L}$.
- The temporal baseline for the forecast is then:

$$\hat{Y}^{\text{temp}} = \hat{Y}^{\text{trend}} + \hat{Y}^{\text{seas}}$$

This baseline has $O(LH)$ complexity per variable and demonstrates robust long-horizon stability.
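The decomposition and per-component linear heads can be sketched in numpy as follows. The edge-padding choice for the moving average and all names are illustrative assumptions in the style of DLinear-type baselines, not the paper's exact implementation:

```python
import numpy as np

def moving_average(x, k):
    """Centered moving average over time, with edge padding (assumed)."""
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    xp = np.concatenate([np.repeat(x[:1], pad_left, axis=0),
                         x,
                         np.repeat(x[-1:], pad_right, axis=0)], axis=0)
    kernel = np.ones(k) / k
    # convolve each variable's column independently; "valid" keeps length L
    return np.stack([np.convolve(xp[:, j], kernel, mode="valid")
                     for j in range(x.shape[1])], axis=1)

L, N, H, k = 96, 4, 24, 25
rng = np.random.default_rng(1)
X = rng.normal(size=(L, N))

trend = moving_average(X, k)     # (L, N) smooth component
seasonal = X - trend             # (L, N) residual component

# independent linear heads W in R^{H x L}, shared across nodes
W_tr = rng.normal(size=(H, L)) / L
W_se = rng.normal(size=(H, L)) / L
Y_temp = W_tr @ trend + W_se @ seasonal   # (H, N) temporal baseline
```

A constant input should pass through the moving average unchanged, which makes the padding behavior easy to sanity-check.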
3. Spatial Module: Learnable Sparse Graph Structure
To model cross-variable dependencies, Lite-STGNN learns low-rank source and destination factor matrices $E_s, E_d \in \mathbb{R}^{N \times r}$, with $r \ll N$. The dense score matrix is

$$S = E_s E_d^{\top}$$

Sparse adjacency is enforced by row-wise top-$k$ selection:

$$A_{ij} = \begin{cases} S_{ij} & \text{if } S_{ij} \text{ is among the top-}k \text{ entries of row } i \\ 0 & \text{otherwise} \end{cases}$$

or concisely, $A = \mathrm{TopK}_k(S)$ applied row-wise. Following row-normalization yielding $\tilde{A}$, message passing is conducted as a residual update of the node signals $Z$:

$$Z' = Z + \big(\tilde{A} \odot (\mathbf{1}\mathbf{1}^{\top} - I)\big) Z,$$

where $I$ is the identity; masking the diagonal eliminates self-loops, so the correction term reflects only cross-variable effects.
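A minimal numpy sketch of the low-rank scoring, row-wise top-$k$ sparsification, and cross-variable aggregation. The softmax normalization over retained entries and the diagonal masking are plausible choices, not the paper's confirmed formulas, and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
N, r, k, H = 12, 4, 3, 24

E_s = rng.normal(size=(N, r))    # source factors
E_d = rng.normal(size=(N, r))    # destination factors
S = E_s @ E_d.T                  # dense score matrix (N, N)

# mask the diagonal so only cross-variable edges can be selected
np.fill_diagonal(S, -np.inf)

# row-wise top-k sparsification: keep the k largest scores per row
A = np.zeros((N, N))
for i in range(N):
    top = np.argpartition(S[i], -k)[-k:]
    A[i, top] = S[i, top]

# row normalization via softmax over the retained entries (assumed choice)
M = A != 0
A_norm = np.zeros_like(A)
for i in range(N):
    w = np.exp(A[i, M[i]] - A[i, M[i]].max())
    A_norm[i, M[i]] = w / w.sum()

# aggregate neighbor signals for a temporal baseline Y_temp in R^{H x N};
# column i of delta mixes only the k selected neighbors of node i
Y_temp = rng.normal(size=(H, N))
delta = Y_temp @ A_norm.T
```

Because the diagonal is masked before selection, each node's correction is built exclusively from other variables, matching the no-self-loop design.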
4. Conservative Horizon-Wise Gating Mechanism
Spatial corrections are modulated via a learnable, per-horizon gating vector $g \in \mathbb{R}^{H}$. The gating is defined as

$$\hat{Y}_h = \hat{Y}^{\text{temp}}_h + \alpha \, \sigma(g_h) \odot \Delta_h,$$

where $\Delta_h$ is the spatial correction at horizon step $h$, with:
- $\sigma$ the sigmoid,
- $g_h$ initialized to the same conservative value for all forecast steps $h$,
- $\alpha$ a learned scaling factor,
- $\odot$ element-wise multiplication, broadcast over nodes.
This horizon-indexed gating lets the model suppress spatial corrections at longer horizons, mitigating over-reliance on noisy inter-variable messages for distant forecasts.
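The gating step can be sketched as follows; the initialization value ($-2.0$) and the scale $\alpha$ are illustrative assumptions chosen so that gates start nearly closed, in the spirit of the conservative design:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

H, N = 24, 8
rng = np.random.default_rng(3)

Y_temp = rng.normal(size=(H, N))   # temporal baseline forecast
delta = rng.normal(size=(H, N))    # spatial correction per horizon step

# per-horizon gate g in R^H; negative init (assumed) keeps gates nearly closed
g = np.full(H, -2.0)
alpha = 0.5                        # learned scalar in the model; fixed here

gate = sigmoid(g)[:, None]         # (H, 1), broadcast over the N nodes
Y_hat = Y_temp + alpha * gate * delta
```

With this initialization, each gate passes only about 12% of the spatial correction, so training must actively open a gate before cross-variable messages influence a given horizon step.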
5. Model Integration, Training, and Efficiency
The total forecast is

$$\hat{Y} = \hat{Y}^{\text{temp}} + \alpha \, \sigma(g) \odot \Delta,$$

where the temporal backbone dominates signal capture and the GNN module provides sparse, horizon-adaptive corrections.
- Loss: Mean squared error is minimized across all outputs.
- Optimizer: Adam, with early stopping on the validation set (patience 10) using MSE averaged across the four horizons ($H \in \{96, 192, 336, 720\}$).
- Parameters: temporal module 0.14M and spatial module 0.60M, totaling 0.74M, roughly 174× fewer than ModernTCN (129M).
- Complexity: the per-forward-pass cost scales linearly in $L$ and $H$ for the temporal module, and with the $Nk$ retained edges for the spatial module.
- Empirical runtime on Electricity: 27.3 s/epoch for Lite-STGNN, markedly faster than ModernTCN under the same configuration.
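The headline parameter-efficiency ratio follows directly from the reported counts:

```python
# Arithmetic check of the parameter-efficiency claim from the reported counts
temporal = 0.14e6           # temporal module parameters
spatial = 0.60e6            # spatial module parameters
total = temporal + spatial  # 0.74M total
ratio = 129e6 / total       # vs. ModernTCN's 129M parameters
print(round(ratio))         # -> 174
```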
6. Experimental Validation
Extensive evaluation is performed on four benchmarks:
| Dataset | Nodes ($N$) | Horizons ($H$) | Lite-STGNN MAE/MSE | Best Baseline MAE/MSE |
|---|---|---|---|---|
| Electricity | 321 | 96/192/336/720 | 0.280 / 0.178 | ModernTCN: 0.284 / 0.194 |
| Exchange | 8 | 96/192/336/720 | 0.369 / 0.282 | DLinear ($H=720$): 0.594 / 0.808 |
| Traffic | 862 | 96/192/336/720 | 0.328 / 0.552 | PatchTST (lower MAE/MSE) |
| Weather | 21 | 96/192/336/720 | 0.294 / 0.239 | PatchTST (lower MAE) |
Detailed results show that Lite-STGNN achieves the lowest errors for Electricity, Exchange, Weather (MSE), and is second to PatchTST on Traffic, with errors increasing more gradually as the forecast horizon extends up to 720 steps.
7. Ablation, Sensitivity, and Interpretability
Ablation studies provide insight into architectural choices and their quantitative effects:
- Spatial module addition: including the spatial module on top of a DLinear-style baseline improves MAE by 4.6%.
- Top-$k$ sparsity: enforcing row-wise top-$k$ selection (about 3% edge density) on a rank-16 adjacency yields a further 3.3% gain relative to a dense low-rank adjacency.
- Hyperparameters: moderate rank and top-$k$ settings produce the best accuracy-efficiency trade-off; two-hop message propagation and moderate dropout further improve stability.
- Learned adjacency: Visualizations reveal interpretable patterns: regional clusters in Electricity, major currency linkages for Exchange, corridor-structured edges in Traffic, and proximity-consistent structures for Weather, aligning with physical or domain knowledge.
- Forecast fidelity: Visualizations for 720-step horizons show accurate tracking of both short-term and long-term dynamics.
A plausible implication is that the learned sparse graphs not only improve forecasting but can be used to infer substantive inter-variable dependencies reflecting true system structure.
Lite-STGNN demonstrates that the synergy of trend-seasonal decomposition, sparsified low-rank spatial corrections, and adaptive horizon-wise gating yields strong, stable, interpretable long-range forecasts at a minimal computational and parameter cost (Moges et al., 19 Dec 2025).