
Lite-STGNN: Lightweight Spatial-Temporal GNN

Updated 26 December 2025
  • The paper introduces Lite-STGNN, which decomposes input time series into trend and seasonal components for robust baseline forecasting.
  • It employs a learnable sparse graph structure with low-rank factorization and top-K sparsification to enable effective inter-variable message passing.
  • The model incorporates a conservative horizon-wise gating mechanism, achieving state-of-the-art performance with low parameter count and computational efficiency.

Lite-STGNN is a lightweight spatial-temporal graph neural network (STGNN) that integrates decomposition-based temporal modeling with a learnable sparse graph structure for long-term multivariate time series forecasting. It decomposes each input series into trend and seasonal components, employs a parameter-efficient temporal backbone, introduces a sparsified low-rank learnable adjacency matrix for inter-variable message passing, and incorporates a conservative horizon-wise gating mechanism to modulate spatial corrections. Lite-STGNN achieves state-of-the-art performance across diverse forecasting benchmarks while maintaining low parameter counts and high computational efficiency (Moges et al., 19 Dec 2025).

1. Problem Formulation

Given a multivariate time series $X\in\mathbb{R}^{T\times N}$, where $T$ is the input sequence length and $N$ is the number of variables (nodes), the objective is to predict the ensuing $L$-step sequence $Y\in\mathbb{R}^{L\times N}$. The mapping $f_\Theta:\mathbb{R}^{T\times N}\rightarrow\mathbb{R}^{L\times N}$ is trained to minimize mean squared error (MSE):

$$\hat{Y} = f_\Theta(X), \qquad \ell(\Theta) = \frac{1}{LN} \|Y - \hat{Y}\|_2^2$$

Each variable is treated as a node in a latent, learned spatial graph determined by a sparse adjacency matrix $A\in\mathbb{R}^{N\times N}$.
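The objective is a per-element squared error over the $L\times N$ forecast window. A minimal NumPy sketch of the loss (the function name `mse_loss` is illustrative, not from the paper):

```python
import numpy as np

def mse_loss(Y, Y_hat):
    """l(Theta) = ||Y - Y_hat||_2^2 / (L * N): squared error averaged per element."""
    L, N = Y.shape
    return np.sum((Y - Y_hat) ** 2) / (L * N)

Y = np.array([[1.0, 2.0], [3.0, 4.0]])      # ground truth: L=2 steps, N=2 nodes
Y_hat = np.array([[1.0, 2.0], [3.0, 6.0]])  # forecast; one entry off by 2
print(mse_loss(Y, Y_hat))  # 2**2 / 4 = 1.0
```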

2. Temporal Module: Trend–Seasonal Decomposition

The temporal modeling backbone employs a trend–seasonal decomposition. For each univariate time series $x^{(i)}_{1:T}$:

  • The trend is estimated using a centered moving average:

$$\text{trend}_t^{(i)} = \frac{1}{k} \sum_{j=-\lfloor k/2\rfloor}^{\lfloor k/2\rfloor} x_{t+j}^{(i)}$$

$$\text{season}_t^{(i)} = x_t^{(i)} - \text{trend}_t^{(i)}$$

  • The full sequence is split into trend and seasonal matrices $X^{(\text{trend})}, X^{(\text{season})}\in\mathbb{R}^{T\times N}$.
  • Independent linear projections forecast each component:

$$Y_{\text{trend}} = W_{\text{trend}} X^{(\text{trend})}, \qquad Y_{\text{season}} = W_{\text{season}} X^{(\text{season})}$$

with $W_{\text{trend}}, W_{\text{season}}\in\mathbb{R}^{L\times T}$.

  • The temporal baseline for the forecast is then:

$$Y_{\text{base}} = Y_{\text{trend}} + Y_{\text{season}}$$

This baseline has $O(NL)$ complexity and demonstrates robust long-horizon stability.
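The steps above can be sketched in NumPy; edge padding at the boundaries of the moving average is an assumption here, since the paper does not specify its boundary handling:

```python
import numpy as np

def decompose(x, k=25):
    """Split a (T, N) series into trend (centered moving average of odd window k)
    and seasonal (residual) components. Boundaries are edge-padded (an assumption)."""
    T, N = x.shape
    half = k // 2
    padded = np.pad(x, ((half, half), (0, 0)), mode="edge")
    kernel = np.ones(k) / k
    trend = np.stack(
        [np.convolve(padded[:, i], kernel, mode="valid") for i in range(N)], axis=1
    )
    return trend, x - trend

def baseline_forecast(x, W_trend, W_season, k=25):
    """Y_base = W_trend X_trend + W_season X_season, with W_* in R^{L x T}."""
    trend, season = decompose(x, k)
    return W_trend @ trend + W_season @ season

rng = np.random.default_rng(0)
T, N, L = 96, 4, 24
x = rng.normal(size=(T, N))
W_t = 0.01 * rng.normal(size=(L, T))
W_s = 0.01 * rng.normal(size=(L, T))
Y_base = baseline_forecast(x, W_t, W_s)
print(Y_base.shape)  # (24, 4)
```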

3. Spatial Module: Learnable Sparse Graph Structure

To model cross-variable dependencies, Lite-STGNN learns low-rank source and destination factor matrices $E_{\text{src}}, E_{\text{dst}}\in\mathbb{R}^{N\times r}$, with $r\ll N$. The dense score matrix is

$$S = \text{ReLU}(E_{\text{src}} E_{\text{dst}}^\top)$$

Sparse adjacency is enforced by row-wise Top-$k$ selection:

$$A_{ij} = \begin{cases} S_{ij} & \text{if } S_{ij} \text{ is among the } k \text{ largest entries in row } i \\ 0 & \text{otherwise} \end{cases}$$

or concisely,

$$A = \text{TopK}(\text{ReLU}(E_{\text{src}} E_{\text{dst}}^\top), k)$$

Following row-normalization $\bar{A} = D^{-1}A$ with $D_{ii} = \sum_j A_{ij}$, message passing is conducted as a residual:

$$\Delta Y = (\bar{A} - I)\, Y_{\text{base}}$$

where $I$ is the identity; subtracting it eliminates self-loops, so that $\Delta Y$ reflects only cross-variable effects.
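The graph construction and residual message passing can be sketched as follows (NumPy; since $Y_{\text{base}}$ is $L\times N$, the graph acts along the node axis, hence the right-multiplication by the transposed operator):

```python
import numpy as np

def sparse_adjacency(E_src, E_dst, k):
    """A = TopK(ReLU(E_src E_dst^T), k): keep the k largest scores in each row."""
    S = np.maximum(E_src @ E_dst.T, 0.0)
    A = np.zeros_like(S)
    rows = np.arange(S.shape[0])[:, None]
    idx = np.argsort(S, axis=1)[:, -k:]   # indices of the k largest entries per row
    A[rows, idx] = S[rows, idx]
    return A

def spatial_correction(A, Y_base, eps=1e-8):
    """Delta Y = (A_bar - I) applied along the node axis, with A_bar = D^{-1} A."""
    A_bar = A / (A.sum(axis=1, keepdims=True) + eps)  # row-normalize
    I = np.eye(A.shape[0])
    return Y_base @ (A_bar - I).T

rng = np.random.default_rng(1)
N, r, L, k = 8, 3, 24, 2
A = sparse_adjacency(rng.normal(size=(N, r)), rng.normal(size=(N, r)), k)
dY = spatial_correction(A, rng.normal(size=(L, N)))
print(A.shape, dY.shape)  # (8, 8) (24, 8)
```

Each row of `A` keeps at most `k` nonzero entries (fewer if ReLU zeroes some scores), so the message-passing cost scales with the sparsity budget rather than $N^2$.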

4. Conservative Horizon-Wise Gating Mechanism

Spatial corrections $\Delta Y$ are modulated via a learnable, per-horizon gating vector $w_{\text{gate}}\in\mathbb{R}^L$. The gating is defined as

$$g(\Delta Y) = \beta\, \sigma(w_{\text{gate}}) \odot \Delta Y$$

with:

  • $\sigma(\cdot)$ the sigmoid,
  • $w_{\text{gate},\ell}$ initialized to $-4.0$ for all forecast steps $\ell=1\ldots L$ (so $\sigma(-4)\approx 0.018$),
  • $\beta=\text{softplus}(\theta_{\text{scale}})>0$ a learned scaling factor,
  • $\odot$ element-wise multiplication, broadcast over the $N$ nodes.

This horizon-indexed gating lets the model suppress spatial corrections at longer horizons, mitigating over-reliance on noisy inter-variable messages for distant forecasts.
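A NumPy sketch of the gate, showing how the conservative initialization keeps early corrections near zero (parameter values here are illustrative):

```python
import numpy as np

def gate(delta_Y, w_gate, theta_scale):
    """g(Delta Y) = softplus(theta_scale) * sigmoid(w_gate) (x) Delta Y,
    with the per-horizon gate broadcast over the N nodes."""
    beta = np.log1p(np.exp(theta_scale))   # softplus: guarantees beta > 0
    sig = 1.0 / (1.0 + np.exp(-w_gate))    # sigmoid: one gate value per horizon step
    return beta * sig[:, None] * delta_Y

L, N = 4, 3
w_gate = np.full(L, -4.0)                  # conservative init: sigmoid(-4) ~ 0.018
dY = np.ones((L, N))
g = gate(dY, w_gate, theta_scale=0.0)
print(g.shape)  # (4, 3)
```

With this initialization the gated correction starts at roughly 1–2% of $\Delta Y$, so training begins close to the pure temporal baseline and must learn to let spatial information in.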

5. Model Integration, Training, and Efficiency

The total forecast is

$$\hat{Y} = Y_{\text{base}} + g(\Delta Y)$$

where the temporal backbone dominates signal capture and the GNN module provides sparse, horizon-adaptive corrections.

  • Loss: mean squared error, minimized across all outputs.
  • Optimizer: Adam, with early stopping on the validation set (patience 10) using MSE averaged across the horizons $L\in\{96,192,336,720\}$.
  • Parameters: temporal (0.14M) plus spatial ($E_{\text{src}}, E_{\text{dst}}, w_{\text{gate}}$; 0.60M), for a total of 0.74M, 174× fewer than ModernTCN (129M).
  • Complexity: $O(Nr + NL)$ per forward pass.
  • Empirical runtime on Electricity ($N=321$, $T=96$, $L=720$): 27.3 s/epoch for Lite-STGNN vs. ≈545 s/epoch for ModernTCN.
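Putting the pieces together, an end-to-end forward pass on toy data; this is a sketch of the architecture as described above, with random weights and edge-padded moving-average decomposition as assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
T, N, L, r, k = 96, 8, 24, 3, 2

# Toy parameters (shapes as in the paper; values random, for illustration only).
W_t, W_s = 0.01 * rng.normal(size=(L, T)), 0.01 * rng.normal(size=(L, T))
E_src, E_dst = rng.normal(size=(N, r)), rng.normal(size=(N, r))
w_gate, theta_scale = np.full(L, -4.0), 0.0
x = rng.normal(size=(T, N))

# 1) Temporal baseline: trend/season split (edge-padded moving average) + linear heads.
pad = np.pad(x, ((12, 12), (0, 0)), mode="edge")
trend = np.stack(
    [np.convolve(pad[:, i], np.ones(25) / 25, mode="valid") for i in range(N)], axis=1
)
Y_base = W_t @ trend + W_s @ (x - trend)

# 2) Spatial correction: row-wise Top-k sparse low-rank graph, residual message passing.
S = np.maximum(E_src @ E_dst.T, 0.0)
A = np.zeros_like(S)
rows, idx = np.arange(N)[:, None], np.argsort(S, axis=1)[:, -k:]
A[rows, idx] = S[rows, idx]
A_bar = A / (A.sum(axis=1, keepdims=True) + 1e-8)
dY = Y_base @ (A_bar - np.eye(N)).T

# 3) Conservative horizon-wise gating, then the final forecast.
gated = np.log1p(np.exp(theta_scale)) * (1.0 / (1.0 + np.exp(-w_gate)))[:, None] * dY
Y_hat = Y_base + gated
print(Y_hat.shape)  # (24, 8)
```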

6. Experimental Validation

Extensive evaluation is performed on four benchmarks:

| Dataset | Nodes ($N$) | Horizons ($L$) | Lite-STGNN MAE / MSE | Best baseline MAE / MSE |
|---|---|---|---|---|
| Electricity | 321 | 96, 192, 336, 720 | 0.280 / 0.178 | ModernTCN: 0.284 / 0.194 |
| Exchange | 8 | 96, 192, 336, 720 | 0.369 / 0.282 | DLinear (H=720): 0.594 / 0.808 |
| Traffic | 862 | 96, 192, 336, 720 | 0.328 / 0.552 | PatchTST (lower MAE/MSE) |
| Weather | 21 | 96, 192, 336, 720 | 0.294 / 0.239 | PatchTST (lower MAE) |

Detailed results show that Lite-STGNN achieves the lowest errors on Electricity, Exchange, and Weather (MSE), and is second to PatchTST on Traffic, with errors growing more gradually as the forecast horizon extends to 720 steps.

7. Ablation, Sensitivity, and Interpretability

Ablation studies provide insight into architectural choices and their quantitative effects:

  • Spatial module addition: including the spatial module on top of a DLinear baseline improves MAE by 4.6% ($0.2939\to0.2800$).
  • Top-$k$ sparsity: enforcing $k=10$ (≈3% density for $N=321$) on a rank-16 adjacency yields a further 3.3% gain relative to a dense low-rank adjacency.
  • Hyperparameters: $r=16$, $k=10$ produce the best accuracy–efficiency trade-off; two-hop message propagation and moderate dropout further improve stability.
  • Learned adjacency: Visualizations reveal interpretable patterns: regional clusters in Electricity, major currency linkages for Exchange, corridor-structured edges in Traffic, and proximity-consistent structures for Weather, aligning with physical or domain knowledge.
  • Forecast fidelity: Visualizations for 720-step horizons show accurate tracking of both short-term and long-term dynamics.

A plausible implication is that the learned sparse graphs not only improve forecasting but can be used to infer substantive inter-variable dependencies reflecting true system structure.


Lite-STGNN demonstrates that the synergy of trend-seasonal decomposition, sparsified low-rank spatial corrections, and adaptive horizon-wise gating yields strong, stable, interpretable long-range forecasts at a minimal computational and parameter cost (Moges et al., 19 Dec 2025).

