Sheaf Diffusion with Adaptive Local Structure for Spatio-Temporal Forecasting

Published 13 Apr 2026 in cs.LG | (2604.11275v1)

Abstract: Spatio-temporal systems often exhibit highly heterogeneous and non-intuitive responses to localized disruptions, limiting the effectiveness of conventional message passing approaches in modeling higher-order interactions under local heterogeneity. This paper reformulates spatio-temporal forecasting as the problem of learning information flow over locally structured spaces, rather than propagating globally aligned node representations. We introduce a spatio-temporal sheaf diffusion graph neural network (ST-Sheaf GNN) that embeds graph topology into sheaf-theoretic vector spaces connected by learned linear restriction maps. Unlike prior work that relies on static or globally shared transformations, our model learns dynamic restriction maps that evolve over time and adapt to local spatio-temporal patterns to enable substantially more expressive interactions. By explicitly modeling latent local structure, the proposed framework efficiently mitigates the oversmoothing phenomenon in deep GNN architectures. We evaluate our framework on a diverse set of real-world spatio-temporal forecasting benchmarks spanning multiple domains. Experimental results demonstrate state-of-the-art performance, highlighting the effectiveness of sheaf-theoretic topological representations as a powerful foundation for spatio-temporal graph learning. The code is available at: https://anonymous.4open.science/r/ST-SheafGNN-6523/.


Authors (3)

Summary

The paper introduces ST-Sheaf GNN, which employs dynamic sheaf diffusion and adaptive restriction maps to overcome oversmoothing in deep GNNs.
It leverages cellular sheaf theory to create locally heterogeneous, time-evolving representations for improved spatio-temporal forecasting across diverse benchmarks.
The model achieves state-of-the-art accuracy with far fewer parameters and enhanced robustness on datasets like NAVER-Seoul and PEMS08.

Sheaf Diffusion with Adaptive Local Structure for Spatio-Temporal Forecasting: Technical Summary

Overview and Motivation

The paper "Sheaf Diffusion with Adaptive Local Structure for Spatio-Temporal Forecasting" (2604.11275) proposes a novel spatio-temporal graph learning framework addressing fundamental limitations of prior GNN-based forecasting methods. Canonical spatio-temporal GNNs generally assume globally consistent latent representations and static, uniform aggregation schemes, which are inadequate under heterogeneous, context-dependent dynamics observed in real-world systems. This work leverages cellular sheaf theory, embedding graph nodes and edges in learnable, locally heterogeneous vector spaces, and replaces fixed propagation with dynamic, adaptive restriction maps. The resulting architecture—ST-Sheaf GNN—substantially enhances model expressivity and mitigates oversmoothing in deep GNNs.

Theoretical Framework

Sheaf-Theoretic Graph Representation

A sheaf provides a principled way to generalize message passing on graphs by attaching vector spaces (“stalks”) to both nodes and edges, along with learnable restriction maps linking them. Instead of enforcing that neighbors maintain similar representations in a globally shared space, information is projected into locally constructed, edge-specific coordinate systems. The restriction maps are parameterized as diagonal, learnable linear operators and further adapted per edge through signal conditioning—enabling localized, time-evolving aggregation rules.

The graph Dirichlet energy $\sum_{(u,v)\in \mathcal{E}} \| h_u - h_v \|^2$, which classical GNN diffusion implicitly minimizes and whose minimization drives oversmoothing, is replaced by a sheaf-based alignment term:

$E_\mathcal{F}(h) = \sum_{e=(u,v)\in \mathcal{E}} \| \rho_{e,u} h_u - \rho_{e,v} h_v \|^2$

where $\rho_{e,u}$ and $\rho_{e,v}$ are the restriction maps from nodes $u$ and $v$ into the stalk of edge $e$. Because agreement is enforced only after projection into the edge stalk, neighboring node features can remain distinct even after many diffusion steps.
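To make the alignment term concrete, the following minimal sketch evaluates $E_\mathcal{F}(h)$ for diagonal restriction maps, so each map is stored as a vector and applied elementwise. The function name `sheaf_energy`, the toy graph, and the array layout are illustrative assumptions, not the paper's code.

```python
import numpy as np

def sheaf_energy(h, edges, rho_src, rho_dst):
    """Sum over edges of ||rho_{e,u} h_u - rho_{e,v} h_v||^2 for diagonal maps.

    h:        (num_nodes, d) node features
    edges:    list of (u, v) pairs
    rho_src:  (num_edges, d) diagonal of rho_{e,u}   (hypothetical layout)
    rho_dst:  (num_edges, d) diagonal of rho_{e,v}   (hypothetical layout)
    """
    u, v = np.asarray(edges).T
    diff = rho_src * h[u] - rho_dst * h[v]  # disagreement inside each edge stalk
    return float(np.sum(diff ** 2))

# Toy example: 3 nodes on a path, 2-dimensional stalks.
h = np.array([[1.0, 2.0], [2.0, 1.0], [4.0, 0.5]])
edges = [(0, 1), (1, 2)]

# Identity maps recover the ordinary graph Dirichlet energy.
ones = np.ones((2, 2))
print(sheaf_energy(h, edges, ones, ones))

# Non-identity maps under which the same non-constant h is perfectly aligned.
rho_src = np.array([[2.0, 1.0], [2.0, 1.0]])
rho_dst = np.array([[1.0, 2.0], [1.0, 2.0]])
print(sheaf_energy(h, edges, rho_src, rho_dst))
```

In this toy case the second energy is zero although the three node vectors differ, which illustrates why a minimizer of the sheaf energy need not be a constant signal.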

Sheaf Laplacian and Diffusion

The model defines a generalized sheaf Laplacian, extending classical graph diffusion to heterogeneous, higher-order signal propagation:

$(\mathcal{L}_\mathcal{F} h)_u = \sum_{v \,:\, e=(u,v) \in \mathcal{E}} \rho_{e,u}^\top ( \rho_{e,u} h_u - \rho_{e,v} h_v )$

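where the sum runs over all edges $e$ incident to node $u$, with $v$ the other endpoint of $e$. For the diagonal restriction maps used here, this expression is half the gradient of $E_\mathcal{F}$ with respect to $h_u$, so a diffusion step $h \leftarrow h - \alpha \mathcal{L}_\mathcal{F} h$ amounts to gradient descent on the alignment energy. The sheaf diffusion process occurs in this latent space, supporting multi-dimensional, context-aware dynamics inaccessible to standard GNNs.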
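The operator above can be sketched in a few lines for diagonal maps. The helper names (`sheaf_laplacian_apply`, `diffusion_step`), the explicit Euler update, and the step size are assumptions made for illustration; the paper's layers add gating and normalization on top of this.

```python
import numpy as np

def sheaf_laplacian_apply(h, edges, rho_src, rho_dst):
    """Apply the sheaf Laplacian for diagonal restriction maps stored as vectors."""
    u, v = np.asarray(edges).T
    diff = rho_src * h[u] - rho_dst * h[v]   # disagreement measured in each edge stalk
    out = np.zeros_like(h)
    np.add.at(out, u, rho_src * diff)        # contribution of edge e to endpoint u
    np.add.at(out, v, -rho_dst * diff)       # contribution of edge e to endpoint v
    return out

def sheaf_energy(h, edges, rho_src, rho_dst):
    u, v = np.asarray(edges).T
    return float(np.sum((rho_src * h[u] - rho_dst * h[v]) ** 2))

def diffusion_step(h, edges, rho_src, rho_dst, step=0.1):
    """One explicit Euler step h <- h - step * L_F h (step size is an assumption)."""
    return h - step * sheaf_laplacian_apply(h, edges, rho_src, rho_dst)

edges = [(0, 1), (1, 2)]
rho_src = np.array([[2.0, 1.0], [2.0, 1.0]])
rho_dst = np.array([[1.0, 2.0], [1.0, 2.0]])
h0 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

h1 = diffusion_step(h0, edges, rho_src, rho_dst)
print(sheaf_energy(h0, edges, rho_src, rho_dst), "->",
      sheaf_energy(h1, edges, rho_src, rho_dst))
```

`np.add.at` is used instead of fancy-index assignment so that nodes appearing in several edges accumulate every contribution. For a sufficiently small step the printed energy decreases, consistent with the gradient-descent reading of diffusion.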

Dynamic Restriction Maps

Crucially, the restriction maps are modulated at each timestep by MLPs conditioned on the representations of each edge's two incident nodes, yielding temporally evolving local structures. This design aligns information propagation patterns with input data characteristics, accommodating time-varying and asymmetric dependencies inherent to real-world scenarios (e.g., abrupt traffic changes).
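One way such signal conditioning could look is sketched below. The summary does not specify the exact parameterization, so everything here is an assumption: a static learnable diagonal per edge endpoint, a small two-layer MLP shared across edges, and a multiplicative modulation `base * (1 + tanh(mlp(...)))`. The names `init_mlp`, `mlp`, and `dynamic_restriction_maps` are hypothetical.

```python
import numpy as np

def init_mlp(rng, in_dim, hidden_dim, out_dim):
    """Randomly initialised two-layer MLP parameters (illustrative only)."""
    return {
        "W1": rng.normal(scale=0.1, size=(in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(scale=0.1, size=(hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }

def mlp(params, x):
    hidden = np.maximum(x @ params["W1"] + params["b1"], 0.0)  # ReLU
    return hidden @ params["W2"] + params["b2"]

def dynamic_restriction_maps(h_t, edges, base_src, base_dst, params):
    """Per-edge diagonal maps for one timestep, conditioned on both endpoints.

    h_t:                (num_nodes, d) node representations at time t
    base_src, base_dst: (num_edges, d) static diagonals (assumed learnable)
    """
    u, v = np.asarray(edges).T
    # The ordered concatenation makes the modulation asymmetric in (u, v).
    mod_src = np.tanh(mlp(params, np.concatenate([h_t[u], h_t[v]], axis=-1)))
    mod_dst = np.tanh(mlp(params, np.concatenate([h_t[v], h_t[u]], axis=-1)))
    return base_src * (1.0 + mod_src), base_dst * (1.0 + mod_dst)

rng = np.random.default_rng(0)
d, edges = 4, [(0, 1), (1, 2)]
params = init_mlp(rng, in_dim=2 * d, hidden_dim=16, out_dim=d)
base = np.ones((len(edges), d))
h_seq = rng.normal(size=(2, 3, d))  # (timesteps, nodes, d)

maps_t0 = dynamic_restriction_maps(h_seq[0], edges, base, base, params)
maps_t1 = dynamic_restriction_maps(h_seq[1], edges, base, base, params)
print(maps_t0[0].shape, np.abs(maps_t0[0] - maps_t1[0]).max())
```

Because the maps are recomputed from the current node representations, the same edge carries different restriction maps at different timesteps, which is the property the paragraph above describes.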

Architecture Description

The ST-Sheaf GNN consists of:

Temporal Encoder: Multi-head self-attention per node over the input time series followed by position-wise feed-forward networks and projection into sheaf stalk latent spaces.
Sheaf Diffusion Layers: Multiple layers where node features are projected into dynamic edge-local coordinate systems via restriction maps, diffused using the sheaf Laplacian, and aggregated via learnable gating mechanisms for stable, selective propagation.
Decoder: Linear projection of the final representations to forecast the desired horizons.
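The temporal encoder's core operation, self-attention over each node's own time axis, can be sketched as follows. This is a simplified single-head version without the feed-forward block or the projection into stalk spaces; the function names, weight shapes, and dimensions are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - np.max(x, axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def temporal_self_attention(x, Wq, Wk, Wv):
    """Single-head attention over time, applied independently to every node.

    x: (num_nodes, T, F) input window; returns (num_nodes, T, d) and the weights.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv                              # each (N, T, d)
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])   # (N, T, T)
    weights = softmax(scores, axis=-1)                            # rows sum to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
num_nodes, T, F, d = 3, 12, 2, 8
x = rng.normal(size=(num_nodes, T, F))
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(F, d)) for _ in range(3))

out, weights = temporal_self_attention(x, Wq, Wk, Wv)
print(out.shape, weights.shape)
```

No information is exchanged between nodes at this stage; spatial mixing is left entirely to the sheaf diffusion layers.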

Degree normalization is included in aggregation to avoid high-degree node dominance.
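A degree-normalized, gated diffusion update might look like the sketch below. The summary does not give the gate's functional form, so the sigmoid gate computed from the node's own features, the update `h - gate * (L_F h) / deg`, and the name `sheaf_layer` are all assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sheaf_layer(h, edges, rho_src, rho_dst, W_gate, b_gate):
    """One gated, degree-normalised sheaf diffusion update (illustrative form)."""
    num_nodes = h.shape[0]
    u, v = np.asarray(edges).T
    diff = rho_src * h[u] - rho_dst * h[v]
    lap = np.zeros_like(h)
    np.add.at(lap, u, rho_src * diff)
    np.add.at(lap, v, -rho_dst * diff)
    # Divide by node degree so high-degree nodes do not dominate the update;
    # isolated nodes get degree clamped to 1 to avoid division by zero.
    deg = np.bincount(np.concatenate([u, v]), minlength=num_nodes)
    lap = lap / np.maximum(deg, 1)[:, None]
    gate = sigmoid(h @ W_gate + b_gate)  # per-node, per-feature gate in (0, 1)
    return h - gate * lap

rng = np.random.default_rng(0)
d = 4
edges = [(0, 1), (0, 2)]            # node 3 is isolated on purpose
h = rng.normal(size=(4, d))
rho_src = rng.uniform(0.5, 1.5, size=(len(edges), d))
rho_dst = rng.uniform(0.5, 1.5, size=(len(edges), d))
W_gate = rng.normal(scale=0.1, size=(d, d))

h_next = sheaf_layer(h, edges, rho_src, rho_dst, W_gate, np.zeros(d))
print(h_next.shape)
```

With the gate close to zero the layer reduces to the identity, which is one way a gating mechanism can keep deep stacks of diffusion layers stable.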

Empirical Evaluation

Benchmarks and Experimental Protocol

The framework is evaluated on six real-world benchmarks (METR-LA, PEMS04, PEMS08, NAVER-Seoul, Molene, AirQuality), covering traffic, environmental, and weather domains. Baselines include the classical statistical model ARIMA alongside spatio-temporal deep learning methods: STGCN, DCRNN, GW-Net, GMAN, ASTGCN, SGP, CITRUS, and STDN.

Performance is assessed by mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE). The protocol follows established dataset splits and multi-horizon prediction settings.
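For reference, the three metrics can be computed as in the sketch below. Masking near-zero targets in MAPE is a common convention in traffic benchmarks and is an assumption here; the summary does not state how the paper handles zeros.

```python
import numpy as np

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred, eps=1e-8):
    """MAPE in percent, skipping targets with magnitude <= eps (assumed convention)."""
    mask = np.abs(y_true) > eps
    return float(100.0 * np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])))

y_true = np.array([10.0, 20.0, 0.0, 40.0])
y_pred = np.array([12.0, 18.0, 1.0, 44.0])
print(mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
```

In a multi-horizon setting these functions would be applied separately to the slice of predictions for each horizon.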

Key Results

State-of-the-art accuracy: ST-Sheaf achieves the best or near-best MAE across all datasets and horizons. For example, on NAVER-Seoul, MAE for the 60-min horizon is 5.11 (vs. 5.94 for the best baseline).
Robustness in long-range forecasting: Error accumulation for longer horizons is significantly reduced relative to all baselines, indicating superior modeling of long-term cascade effects.
Oversmoothing mitigation: Empirical analysis shows that, unlike classical GCNs where node representations quickly collapse, ST-Sheaf maintains substantial inter-node discriminability across many diffusion layers, supporting deeper architectures.
Parameter efficiency: The model achieves these improvements with more than $100\times$ fewer parameters than STDN (roughly 40–59k vs. over 6M parameters), and with training epochs that are 3–5$\times$ shorter.
Ablation: Dynamic, signal-conditioned restriction maps and sheaf-based diffusion account for the majority of performance gains; replacing the dynamic maps with static ones, removing the sheaf structure, or removing the temporal encoding each leads to observable degradation.
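The oversmoothing claim in the results above can be illustrated on a toy problem. The sketch below runs the same explicit diffusion with identity maps (ordinary graph diffusion) and with fixed non-identity diagonal maps on a 4-node path, then measures how different the node features remain. The graph, the map values, the step size, and the `node_spread` measure are assumptions chosen for illustration and do not reproduce the paper's analysis.

```python
import numpy as np

def diffuse(h, edges, rho_src, rho_dst, step=0.2, num_steps=200):
    """Repeated explicit sheaf diffusion steps with diagonal restriction maps."""
    u, v = np.asarray(edges).T
    for _ in range(num_steps):
        diff = rho_src * h[u] - rho_dst * h[v]
        lap = np.zeros_like(h)
        np.add.at(lap, u, rho_src * diff)
        np.add.at(lap, v, -rho_dst * diff)
        h = h - step * lap
    return h

def node_spread(h):
    """Mean per-feature standard deviation across nodes (0 means full collapse)."""
    return float(np.mean(np.std(h, axis=0)))

edges = [(0, 1), (1, 2), (2, 3)]
h0 = np.array([[1.0, 4.0], [2.0, 3.0], [3.0, 2.0], [4.0, 1.0]])

# Identity maps: ordinary graph diffusion, all nodes drift to the same vector.
identity = np.ones((3, 2))
spread_classical = node_spread(diffuse(h0, edges, identity, identity))

# Non-identity maps: the limit satisfies rho_src * h_u == rho_dst * h_v per edge,
# which does not force h_u == h_v.
rho_src = np.ones((3, 2))
rho_dst = np.full((3, 2), 0.5)
spread_sheaf = node_spread(diffuse(h0, edges, rho_src, rho_dst))

print(spread_classical, spread_sheaf)
```

The path graph is a deliberately favorable case: on graphs with cycles, generic fixed maps can admit only the zero signal as a fully aligned state, so whether learned maps preserve discriminability in practice is an empirical question.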

Practical and Theoretical Implications

Practical Impact

ST-Sheaf provides a scalable, parameter-efficient, and accurate approach for spatio-temporal forecasting in settings characterized by complex, heterogeneous local couplings—such as urban traffic management, air quality monitoring, and weather prediction. The architecture’s ability to capture nontrivial, time-varying interactions positions it as a strong candidate for real-time decision-support in large infrastructure and environmental systems.

Theoretical Contributions

By operationalizing cellular sheaf theory in deep learning, this framework establishes a robust connection between algebraic topology and neural modeling of complex systems. The learnable, dynamic restriction map design extends sheaf-based GNNs to practical, real-world forecasting, bridging the gap between theoretical exploration and high-impact application. This work demonstrates that sheaf theory addresses oversmoothing and heterogeneity trade-offs endemic to message passing GNNs, suggesting new directions for designing deep architectures on graphs, hypergraphs, and higher-order relational data.

Future Directions

Potential research avenues include extension to multimodal graph structures, integration of multirelational sheaves for richly attributed and edge-typed graphs, and further reduction of computational complexity for extreme-scale spatio-temporal learning. The sheaf-theoretic perspective may also inform principled approaches to uncertainty quantification and causal reasoning in GNNs.

Conclusion

ST-Sheaf GNN redefines spatio-temporal forecasting on graphs by utilizing dynamic, sheaf-based local structures and adaptive restriction maps for information diffusion. The empirical results validate its expressive power, efficiency, robustness against oversmoothing, and suitability for demanding, heterogeneous spatio-temporal settings. This work positions sheaf theory as a foundational tool for advancing deep learning architectures tailored to complex, real-world systems, and catalyzes further investigation at the intersection of topology and neural computation.
