
Lite-STGNN: Lightweight Spatial-Temporal GNN

Updated 26 December 2025
  • The paper introduces Lite-STGNN, which decomposes input time series into trend and seasonal components for robust baseline forecasting.
  • It employs a learnable sparse graph structure with low-rank factorization and top-K sparsification to enable effective inter-variable message passing.
  • The model incorporates a conservative horizon-wise gating mechanism, achieving state-of-the-art performance with low parameter count and computational efficiency.

Lite-STGNN is a lightweight spatial-temporal graph neural network (STGNN) that integrates decomposition-based temporal modeling with a learnable sparse graph structure for long-term multivariate time series forecasting. It decomposes each input series into trend and seasonal components, employs a parameter-efficient temporal backbone, introduces a sparsified low-rank learnable adjacency matrix for inter-variable message passing, and incorporates a conservative horizon-wise gating mechanism to modulate spatial corrections. Lite-STGNN achieves state-of-the-art performance across diverse forecasting benchmarks while maintaining low parameter counts and high computational efficiency (Moges et al., 19 Dec 2025).

1. Problem Formulation

Given a multivariate time series $X\in\mathbb{R}^{T\times N}$, where $T$ is the input sequence length and $N$ is the number of variables (nodes), the objective is to predict the ensuing $L$-step sequence $Y\in\mathbb{R}^{L\times N}$. The mapping $f_\Theta:\mathbb{R}^{T\times N}\rightarrow\mathbb{R}^{L\times N}$ is trained to minimize mean squared error (MSE):

$$\hat{Y} = f_\Theta(X), \qquad \ell(\Theta) = \frac{1}{LN} \|Y - \hat{Y}\|_2^2$$

Each variable is treated as a node in a latent, learned spatial graph determined by a sparse adjacency matrix $A\in\mathbb{R}^{N\times N}$.
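The objective is a per-element squared error over the $L\times N$ forecast window. A minimal NumPy sketch of the loss (the function name `mse_loss` is illustrative, not from the paper):

```python
import numpy as np

def mse_loss(Y, Y_hat):
    """l(Theta) = ||Y - Y_hat||_2^2 / (L * N): squared error averaged per element."""
    L, N = Y.shape
    return np.sum((Y - Y_hat) ** 2) / (L * N)

Y = np.array([[1.0, 2.0], [3.0, 4.0]])      # ground truth: L=2 steps, N=2 nodes
Y_hat = np.array([[1.0, 2.0], [3.0, 6.0]])  # forecast; one entry off by 2
print(mse_loss(Y, Y_hat))  # 2**2 / 4 = 1.0
```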

2. Temporal Module: Trend–Seasonal Decomposition

The temporal modeling backbone employs a trend–seasonal decomposition. For each univariate time series $x^{(i)}_{1:T}$:

  • The trend is estimated using a centered moving average:

$$\text{trend}_t^{(i)} = \frac{1}{k} \sum_{j=-\lfloor k/2\rfloor}^{\lfloor k/2\rfloor} x_{t+j}^{(i)}$$

$$\text{season}_t^{(i)} = x_t^{(i)} - \text{trend}_t^{(i)}$$

  • The full sequence is split into trend and seasonal matrices $X^{(\text{trend})}, X^{(\text{season})}\in\mathbb{R}^{T\times N}$.
  • Independent linear projections forecast each component:

$$Y_{\text{trend}} = W_{\text{trend}} X^{(\text{trend})}, \qquad Y_{\text{season}} = W_{\text{season}} X^{(\text{season})}$$

with $W_{\text{trend}}, W_{\text{season}}\in\mathbb{R}^{L\times T}$.

  • The temporal baseline for the forecast is then:

$$Y_{\text{base}} = Y_{\text{trend}} + Y_{\text{season}}$$

This baseline has $O(NL)$ complexity and demonstrates robust long-horizon stability.
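The steps above can be sketched in NumPy; edge padding at the boundaries of the moving average is an assumption here, since the paper does not specify its boundary handling:

```python
import numpy as np

def decompose(x, k=25):
    """Split a (T, N) series into trend (centered moving average of odd window k)
    and seasonal (residual) components. Boundaries are edge-padded (an assumption)."""
    T, N = x.shape
    half = k // 2
    padded = np.pad(x, ((half, half), (0, 0)), mode="edge")
    kernel = np.ones(k) / k
    trend = np.stack(
        [np.convolve(padded[:, i], kernel, mode="valid") for i in range(N)], axis=1
    )
    return trend, x - trend

def baseline_forecast(x, W_trend, W_season, k=25):
    """Y_base = W_trend X_trend + W_season X_season, with W_* in R^{L x T}."""
    trend, season = decompose(x, k)
    return W_trend @ trend + W_season @ season

rng = np.random.default_rng(0)
T, N, L = 96, 4, 24
x = rng.normal(size=(T, N))
W_t = 0.01 * rng.normal(size=(L, T))
W_s = 0.01 * rng.normal(size=(L, T))
Y_base = baseline_forecast(x, W_t, W_s)
print(Y_base.shape)  # (24, 4)
```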

3. Spatial Module: Learnable Sparse Graph Structure

To model cross-variable dependencies, Lite-STGNN learns low-rank source and destination factor matrices $E_{\text{src}}, E_{\text{dst}}\in\mathbb{R}^{N\times r}$, with $r\ll N$. The dense score matrix is

$$S = \text{ReLU}(E_{\text{src}} E_{\text{dst}}^\top)$$

Sparse adjacency is enforced by row-wise Top-$k$ selection:

$$A_{ij} = \begin{cases} S_{ij} & \text{if } S_{ij} \text{ is among the } k \text{ largest entries in row } i \\ 0 & \text{otherwise} \end{cases}$$

or concisely,

$$A = \text{TopK}(\text{ReLU}(E_{\text{src}} E_{\text{dst}}^\top), k)$$

Following row-normalization $\bar{A} = D^{-1}A$ with $D_{ii} = \sum_j A_{ij}$, message passing is conducted as a residual:

$$\Delta Y = (\bar{A} - I)\, Y_{\text{base}}$$

where $I$ is the identity; subtracting it eliminates self-loops, so that $\Delta Y$ reflects only cross-variable effects.
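The graph construction and residual message passing can be sketched as follows (NumPy; since $Y_{\text{base}}$ is $L\times N$, the graph acts along the node axis, hence the right-multiplication by the transposed operator):

```python
import numpy as np

def sparse_adjacency(E_src, E_dst, k):
    """A = TopK(ReLU(E_src E_dst^T), k): keep the k largest scores in each row."""
    S = np.maximum(E_src @ E_dst.T, 0.0)
    A = np.zeros_like(S)
    rows = np.arange(S.shape[0])[:, None]
    idx = np.argsort(S, axis=1)[:, -k:]   # indices of the k largest entries per row
    A[rows, idx] = S[rows, idx]
    return A

def spatial_correction(A, Y_base, eps=1e-8):
    """Delta Y = (A_bar - I) applied along the node axis, with A_bar = D^{-1} A."""
    A_bar = A / (A.sum(axis=1, keepdims=True) + eps)  # row-normalize
    I = np.eye(A.shape[0])
    return Y_base @ (A_bar - I).T

rng = np.random.default_rng(1)
N, r, L, k = 8, 3, 24, 2
A = sparse_adjacency(rng.normal(size=(N, r)), rng.normal(size=(N, r)), k)
dY = spatial_correction(A, rng.normal(size=(L, N)))
print(A.shape, dY.shape)  # (8, 8) (24, 8)
```

Each row of `A` keeps at most `k` nonzero entries (fewer if ReLU zeroes some scores), so the message-passing cost scales with the sparsity budget rather than $N^2$.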

4. Conservative Horizon-Wise Gating Mechanism

Spatial corrections $\Delta Y$ are modulated via a learnable, per-horizon gating vector $w_{\text{gate}}\in\mathbb{R}^L$. The gating is defined as

$$g(\Delta Y) = \beta\, \sigma(w_{\text{gate}}) \odot \Delta Y$$

with:

  • $\sigma(\cdot)$ the sigmoid,
  • $w_{\text{gate},\ell}$ initialized to $-4.0$ for all forecast steps $\ell=1\ldots L$ (so $\sigma(-4)\approx 0.018$),
  • $\beta=\text{softplus}(\theta_{\text{scale}})>0$ a learned scaling factor,
  • $\odot$ element-wise multiplication, broadcast over the $N$ nodes.

This horizon-indexed gating lets the model suppress spatial corrections at longer horizons, mitigating over-reliance on noisy inter-variable messages for distant forecasts.
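A NumPy sketch of the gate, showing how the conservative initialization keeps early corrections near zero (parameter values here are illustrative):

```python
import numpy as np

def gate(delta_Y, w_gate, theta_scale):
    """g(Delta Y) = softplus(theta_scale) * sigmoid(w_gate) (x) Delta Y,
    with the per-horizon gate broadcast over the N nodes."""
    beta = np.log1p(np.exp(theta_scale))   # softplus: guarantees beta > 0
    sig = 1.0 / (1.0 + np.exp(-w_gate))    # sigmoid: one gate value per horizon step
    return beta * sig[:, None] * delta_Y

L, N = 4, 3
w_gate = np.full(L, -4.0)                  # conservative init: sigmoid(-4) ~ 0.018
dY = np.ones((L, N))
g = gate(dY, w_gate, theta_scale=0.0)
print(g.shape)  # (4, 3)
```

With this initialization the gated correction starts at roughly 1–2% of $\Delta Y$, so training begins close to the pure temporal baseline and must learn to let spatial information in.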

5. Model Integration, Training, and Efficiency

The total forecast is

$$\hat{Y} = Y_{\text{base}} + g(\Delta Y)$$

where the temporal backbone dominates signal capture and the GNN module provides sparse, horizon-adaptive corrections.

  • Loss: mean squared error, minimized across all outputs.
  • Optimizer: Adam, with early stopping on the validation set (patience 10) using MSE averaged across the horizons $L\in\{96,192,336,720\}$.
  • Parameters: temporal (0.14M) plus spatial ($E_{\text{src}}, E_{\text{dst}}, w_{\text{gate}}$; 0.60M), for a total of 0.74M, 174× fewer than ModernTCN (129M).
  • Complexity: $O(Nr + NL)$ per forward pass.
  • Empirical runtime on Electricity ($N=321$, $T=96$, $L=720$): 27.3 s/epoch for Lite-STGNN vs. ≈545 s/epoch for ModernTCN.
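Putting the pieces together, an end-to-end forward pass on toy data; this is a sketch of the architecture as described above, with random weights and edge-padded moving-average decomposition as assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
T, N, L, r, k = 96, 8, 24, 3, 2

# Toy parameters (shapes as in the paper; values random, for illustration only).
W_t, W_s = 0.01 * rng.normal(size=(L, T)), 0.01 * rng.normal(size=(L, T))
E_src, E_dst = rng.normal(size=(N, r)), rng.normal(size=(N, r))
w_gate, theta_scale = np.full(L, -4.0), 0.0
x = rng.normal(size=(T, N))

# 1) Temporal baseline: trend/season split (edge-padded moving average) + linear heads.
pad = np.pad(x, ((12, 12), (0, 0)), mode="edge")
trend = np.stack(
    [np.convolve(pad[:, i], np.ones(25) / 25, mode="valid") for i in range(N)], axis=1
)
Y_base = W_t @ trend + W_s @ (x - trend)

# 2) Spatial correction: row-wise Top-k sparse low-rank graph, residual message passing.
S = np.maximum(E_src @ E_dst.T, 0.0)
A = np.zeros_like(S)
rows, idx = np.arange(N)[:, None], np.argsort(S, axis=1)[:, -k:]
A[rows, idx] = S[rows, idx]
A_bar = A / (A.sum(axis=1, keepdims=True) + 1e-8)
dY = Y_base @ (A_bar - np.eye(N)).T

# 3) Conservative horizon-wise gating, then the final forecast.
gated = np.log1p(np.exp(theta_scale)) * (1.0 / (1.0 + np.exp(-w_gate)))[:, None] * dY
Y_hat = Y_base + gated
print(Y_hat.shape)  # (24, 8)
```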

6. Experimental Validation

Extensive evaluation is performed on four benchmarks:

| Dataset | Nodes ($N$) | Horizons ($L$) | Lite-STGNN MAE / MSE | Best baseline MAE / MSE |
|---|---|---|---|---|
| Electricity | 321 | 96, 192, 336, 720 | 0.280 / 0.178 | ModernTCN: 0.284 / 0.194 |
| Exchange | 8 | 96, 192, 336, 720 | 0.369 / 0.282 | DLinear (H=720): 0.594 / 0.808 |
| Traffic | 862 | 96, 192, 336, 720 | 0.328 / 0.552 | PatchTST (lower MAE/MSE) |
| Weather | 21 | 96, 192, 336, 720 | 0.294 / 0.239 | PatchTST (lower MAE) |

Detailed results show that Lite-STGNN achieves the lowest errors on Electricity, Exchange, and Weather (MSE), and is second to PatchTST on Traffic, with errors growing more gradually as the forecast horizon extends to 720 steps.

7. Ablation, Sensitivity, and Interpretability

Ablation studies provide insight into architectural choices and their quantitative effects:

  • Spatial module addition: including the spatial module on top of a DLinear baseline improves MAE by 4.6% ($0.2939\to0.2800$).
  • Top-$k$ sparsity: enforcing $k=10$ (≈3% density for $N=321$) on a rank-16 adjacency yields a further 3.3% gain relative to a dense low-rank adjacency.
  • Hyperparameters: $r=16$, $k=10$ produce the best accuracy–efficiency trade-off; two-hop message propagation and moderate dropout further improve stability.
  • Learned adjacency: Visualizations reveal interpretable patterns: regional clusters in Electricity, major currency linkages for Exchange, corridor-structured edges in Traffic, and proximity-consistent structures for Weather, aligning with physical or domain knowledge.
  • Forecast fidelity: Visualizations for 720-step horizons show accurate tracking of both short-term and long-term dynamics.

A plausible implication is that the learned sparse graphs not only improve forecasting but can be used to infer substantive inter-variable dependencies reflecting true system structure.


Lite-STGNN demonstrates that the synergy of trend-seasonal decomposition, sparsified low-rank spatial corrections, and adaptive horizon-wise gating yields strong, stable, interpretable long-range forecasts at a minimal computational and parameter cost (Moges et al., 19 Dec 2025).

