Graph State Space Models
- Graph state space models are advanced frameworks that extend traditional state-space models to arbitrary graph domains, capturing spatial, temporal, and relational dependencies.
- They employ techniques like factor graph optimization, EM with sparsity, and neural state-space blocks to enhance inference accuracy, scalability, and interpretability.
- These models achieve permutation equivariance, efficient long-range propagation, and dynamic topology adaptation, proving effective in areas from signal processing to computational neuroscience.
Graph state space models constitute a broad class of probabilistic and neural network models that unify state-space principles with graph-structured data, allowing relational, spatial, or temporal dependencies among system variables to be directly encoded or inferred via a graph. These models extend classic state-space methods—traditionally defined on sequences—to arbitrary graph domains, providing frameworks for inference, estimation, learning, and prediction in systems where dependencies are captured by edges in a graph rather than (or in addition to) the standard time-ordered chain.
1. Principles and Formulations of Graph State Space Models
Graph state space models (GSSMs) generalize the classic state-space paradigm by replacing the simple chain-structured sequence of states with a more expressive graphical structure. In the classical (possibly nonlinear) state-space model, latent states evolve according to $\mathbf{x}_t = f(\mathbf{x}_{t-1}, \mathbf{u}_t) + \mathbf{q}_t$ with observations $\mathbf{y}_t = h(\mathbf{x}_t) + \mathbf{r}_t$, where $\mathbf{x}_t$ is the latent state, $\mathbf{u}_t$ is the input, $\mathbf{q}_t$ and $\mathbf{r}_t$ are process and measurement noise, and the dependency structure is a simple chain over time.
In GSSMs, the chain topology is replaced or complemented by a functional/relational graph that captures interactions between state components or signals. This is achieved in different ways:
- Linear models: The transition matrix $\mathbf{A}$ in a linear Gaussian SSM, $\mathbf{x}_t = \mathbf{A}\mathbf{x}_{t-1} + \mathbf{q}_t$, is interpreted as the adjacency matrix of a weighted directed graph, encoding causal (Granger) relationships among state components (Elvira et al., 2022). Sparsity or prior structure is enforced via regularization (a minimal sketch of this interpretation appears at the end of this section).
- Nonlinear models: The transition function may be represented as a multivariate polynomial whose sparsity pattern defines a graph over the states (Cox et al., 23 Nov 2024).
- Probabilistic graphical models: Factor graphs are constructed where each node represents a (possibly distributed/non-temporal) variable and edges encode dependencies, enabling inference in multi-connected topologies beyond simple chains (Lü, 2021).
- Neural architectures: State space blocks are used to aggregate or propagate representations over tokens, sequences, or node neighborhoods, with the graph topology parameterizing the recurrence or structured convolution (Behrouz et al., 13 Feb 2024, Huang et al., 9 Jun 2024).
The graph—whether data-inferred, domain-defined, or dynamically constructed—serves as the core inductive bias, allowing the state-space formalism to capture non-local, spatial, or cross-modal dependencies and enabling new algorithmic strategies for inference, learning, and dynamical modeling.
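As a concrete illustration of the linear-model interpretation above, the following minimal NumPy sketch simulates a linear Gaussian SSM whose sparse transition matrix doubles as the weighted adjacency matrix of a directed graph over state components. It is a toy example under simplifying assumptions (identity observation model, isotropic noise), not code from the cited works; the function names and the 3-node chain graph are illustrative.

```python
import numpy as np

def simulate_linear_gssm(A, T, q_std=0.1, r_std=0.1, seed=0):
    """Simulate x_t = A x_{t-1} + q_t, y_t = x_t + r_t.

    The sparsity pattern of A is read as a weighted directed graph:
    A[i, j] != 0 means component j (Granger-)influences component i.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x = np.zeros(n)
    xs, ys = [], []
    for _ in range(T):
        x = A @ x + q_std * rng.standard_normal(n)   # state transition over the graph
        y = x + r_std * rng.standard_normal(n)        # noisy observation
        xs.append(x.copy())
        ys.append(y)
    return np.array(xs), np.array(ys)

def transition_graph(A, tol=1e-8):
    """Return directed edges (j, i, weight) implied by the nonzeros of A."""
    return [(j, i, A[i, j]) for i in range(A.shape[0])
            for j in range(A.shape[1]) if abs(A[i, j]) > tol]

# Example: a stable 3-component chain 0 -> 1 -> 2 encoded in a sparse A.
A = np.array([[0.9, 0.0, 0.0],
              [0.5, 0.8, 0.0],
              [0.0, 0.4, 0.7]])
states, obs = simulate_linear_gssm(A, T=200)
print(transition_graph(A))   # [(0, 0, 0.9), (0, 1, 0.5), (1, 1, 0.8), ...]
```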
2. Key Methodologies and Algorithmic Strategies
Graph state space models have spawned multiple algorithmic families, unified by their treatment of the graph structure in the dynamics or inference:
- Factor graph optimization (Lü, 2021): Rather than solving a purely sequential filtering problem (as in the Extended Kalman Filter), the system is discretized and represented as a factor graph, with fast-changing (dynamic) components separated from constant or distributed ones and encoded as distinct nodes/factors. Optimization proceeds via message-passing or junction tree algorithms for joint inference and smoothing.
- EM/Majorization-Minimization with Sparsity (Elvira et al., 2022, Chouzenoux et al., 2023, Cox et al., 23 Nov 2024): For parameter estimation (e.g., learning the transition graph), expectation-maximization (EM) or majorization-minimization (MM) frameworks are combined with modern convex optimization or iteratively reweighted schemes to promote properties such as sparsity or stability, enabling structure discovery and interpretability (a simplified sketch appears at the end of this section).
- Neural GSSMs via SSM Kernels and Graph Convolutions (Behrouz et al., 13 Feb 2024, Huang et al., 9 Jun 2024, Zhao et al., 16 Aug 2024, Zubić et al., 17 Dec 2024, Lahoti et al., 14 Oct 2025): Here, state space recurrences (linear or selective, as in Mamba) are adapted to operate over graph neighborhoods (via node or subgraph tokenization, graph kernel factorization, or MST-based aggregation). These approaches overcome locality bottlenecks and remove the need to impose a sequence-like ordering on the nodes.
- Probabilistic and Bayesian GSSMs (Zambon et al., 2023, Lippert et al., 2023, Tenorio et al., 12 Sep 2024): Probabilistic variants use explicit graphical models, variational inference, or belief propagation to maintain uncertainty quantification and conduct Bayesian estimation, supporting finer-grained beliefs about both latent states and graph parameters.
- Dynamic and Temporal Graph Modeling (Li et al., 3 Jun 2024, Yuan et al., 11 Dec 2024): For evolving graphs, the SSM formalism is adapted to sequences of graph snapshots, incorporating structural regularization terms (e.g., Laplacian smoothing) or learning cross-snapshot dependencies with SSMs that can handle edge insertions, deletions, and dynamic membership.
These methodologies enable the design of state-space models that are not only capable of processing relational data but also scalable, interpretable, and more expressive than traditional chain-based approaches.
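To make the sparsity-promoting estimation idea concrete, the sketch below strips a GraphEM-style procedure down to its penalized least-squares core: it assumes the latent states are directly observed (so no Kalman-smoother E-step is shown) and fits a sparse transition matrix with an L1-penalized proximal-gradient (ISTA) update. It is a deliberate simplification for illustration, not the algorithm of (Elvira et al., 2022) or (Chouzenoux et al., 2023); all names are illustrative.

```python
import numpy as np

def soft_threshold(M, tau):
    """Elementwise soft-thresholding (the proximal operator of the L1 norm)."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def sparse_transition_fit(X, lam=0.1, n_iter=500):
    """Estimate a sparse transition matrix A from a state trajectory X (T x n):

        min_A  0.5 * ||X[1:] - X[:-1] @ A.T||_F^2 + lam * ||A||_1   (solved by ISTA)

    In a full GraphEM-style method the data terms would be smoothed-state
    statistics from an E-step; assuming observed states reduces the M-step
    to a sparse VAR(1) regression.
    """
    X_prev, X_next = X[:-1], X[1:]
    n = X.shape[1]
    A = np.zeros((n, n))
    step = 1.0 / (np.linalg.norm(X_prev, 2) ** 2 + 1e-12)   # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = (X_prev @ A.T - X_next).T @ X_prev           # gradient of the quadratic term
        A = soft_threshold(A - step * grad, step * lam)     # proximal (ISTA) update
    return A

# Usage: recover the chain graph from the trajectory simulated in the earlier sketch.
# A_hat = sparse_transition_fit(states, lam=0.5)
# print(np.round(A_hat, 2))   # entries off the true support shrink towards zero
```

The soft-thresholding step is what zeroes out weak edges; replacing the L1 penalty with an iteratively reweighted or non-convex prior recovers the flavor of the MM variants cited above.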
3. Expressivity, Inductive Biases, and Theoretical Aspects
A central advantage of graph state space models is the principled encoding of relational inductive bias. By parameterizing the recurrence or transition function with the adjacency, Laplacian, or dynamically learned connectivity of a graph, GSSMs achieve:
- Permutation equivariance: via global or local aggregation schemes and kernel factorizations that do not depend on node ordering (Huang et al., 9 Jun 2024).
- Selective and adaptive aggregation: with mechanisms for adaptive graph structure learning (e.g., via Sparsemax, adaptive node embedding) (Gazzar et al., 2022).
- Universality and expressivity beyond the 1-WL test: Certain GSSMs, especially those combining state space recurrences with appropriately rich positional encodings, provably exceed the expressivity of message-passing neural networks (MPNNs) and even the 1-WL test, distinguishing non-isomorphic graphs in cases where traditional methods fail (Behrouz et al., 13 Feb 2024, Huang et al., 9 Jun 2024).
- Efficient long-range propagation: The combination of kernelized convolutions or global SSM recurrences allows information to propagate over arbitrarily long graph distances without oversquashing (cf. "minimum local sensitivity" in (Ceni et al., 24 May 2025); explicit gradient lower bounds for walks in (Huang et al., 9 Jun 2024)).
These properties address two major limitations of earlier GNNs and sequence models: the bias towards local, short-range interactions, and the imposed need for artificial token ordering or position embeddings.
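The toy NumPy sketch below illustrates two of the properties above under simplifying assumptions: a linear state-space recurrence whose transition mixes node states over (normalized) graph edges, together with a numerical check that relabelling the nodes permutes the output consistently. The layer, the normalization, and all names are illustrative and are not taken from any cited architecture.

```python
import numpy as np

def graph_ssm_layer(X_seq, A_hat, alpha=0.9):
    """Toy linear graph-SSM layer: H_t = alpha * A_hat @ H_{t-1} + X_t.

    The recurrence mixes node states along edges, so information reaches
    k-hop neighbours after k steps; because A_hat acts on the node axis
    only via matrix multiplication, the layer is permutation equivariant.
    """
    T, n, d = X_seq.shape
    H = np.zeros((n, d))
    out = np.empty_like(X_seq)
    for t in range(T):
        H = alpha * (A_hat @ H) + X_seq[t]   # one recurrence step
        out[t] = H
    return out

def normalize(A):
    """Symmetrically normalize an adjacency matrix (self-loops added)."""
    A = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    return d_inv_sqrt @ A @ d_inv_sqrt

# Equivariance check on a random undirected graph and signal.
rng = np.random.default_rng(0)
A = np.triu((rng.random((5, 5)) < 0.4).astype(float), 1)
A = A + A.T
X = rng.standard_normal((8, 5, 3))            # T=8 steps, 5 nodes, 3 features
P = np.eye(5)[rng.permutation(5)]             # random permutation matrix
Y = graph_ssm_layer(X, normalize(A))
Y_perm = graph_ssm_layer(np.einsum('ij,tjd->tid', P, X), normalize(P @ A @ P.T))
print(np.allclose(np.einsum('ij,tjd->tid', P, Y), Y_perm))   # True
```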
4. Practical Applications and Empirical Results
Graph state space models have demonstrated impact in a wide range of scientific and engineering domains:
- Signal processing and causal inference: Sparse and interpretable SSMs enable Granger-causal structure discovery and forecasting in multivariate time series, remote sensing, and climate science (Elvira et al., 2022, Elvira et al., 2023, Cox et al., 23 Nov 2024).
- High-dimensional spatiotemporal systems: Applications include environmental monitoring, sensor networks, and traffic dynamics, where the state vector's dimensions correspond to spatially distributed phenomena (Lippert et al., 2023, Li et al., 3 Jun 2024).
- Computational neuroscience and clinical prediction: GSSM variants leveraging adaptive graph learning, self-supervised tasks, and spatio-temporal state-space modeling have shown state-of-the-art predictive accuracy in psychiatric disorder diagnosis from fMRI (Gazzar et al., 2022) and ICU length-of-stay prediction from clinical time series combined with patient similarity graphs (Zi et al., 24 Aug 2025).
- Large-scale and multimodal data: Methods like Chimera (Lahoti et al., 14 Oct 2025) and GG-SSMs (Zubić et al., 17 Dec 2024) unify the treatment of graphs, images (seen as regular grids), sequences, and even cross-modal data (text/image/video), achieving strong performance on ImageNet, GLUE, the Long Range Graph Benchmark, and event-based vision datasets.
- Temporal and dynamic graph learning: GSSM models specifically handle time-evolving graphs, with innovation in online approximation, Laplacian regularization, and efficient discretization for mixed-observable events (Li et al., 3 Jun 2024, Yuan et al., 11 Dec 2024).
- Uncertainty quantification and structure learning: Full Bayesian treatments yield not just point predictions but uncertainty-aware beliefs about the network, applicable in settings with partial observability and abrupt topological changes (Tenorio et al., 12 Sep 2024).
Empirical results typically demonstrate improvements in accuracy, robustness (especially under adversarial perturbations), and interpretability over both classical message-passing and transformer-based architectures, often with reduced computational cost due to linear or near-linear scaling (Behrouz et al., 13 Feb 2024, Yuan et al., 11 Dec 2024, Lahoti et al., 14 Oct 2025).
5. Algorithmic Efficiency, Scalability, and Optimization
Several algorithmic developments underpin the scalability and efficiency of modern GSSMs:
- Factorization and sparsity: Both inference and learning are accelerated by exploiting graph-structured sparsity in state transitions or noise covariances, yielding linear scaling with node count in conjugate gradient-based solvers for high-dimensional state estimation (Lippert et al., 2023, Li et al., 3 Jun 2024).
- Structured mask and resolvent representations: Chimera's use of the Neumann series expansion and efficient computation strategies (e.g., linear-time recurrent algorithms for DAGs, squaring tricks for dense graphs) generalizes convolutional recurrences to arbitrary graphs while matching or improving on the complexity of transformers (Lahoti et al., 14 Oct 2025); a toy illustration of the resolvent idea appears at the end of this section.
- Parallelization and closed-form unrolling: Linear recurrence architectures permit explicit parallel calculation and well-behaved gradient flow, avoiding vanishing gradients associated with deep stacking of nonlinearities (Ceni et al., 24 May 2025).
- Efficient graph construction and dynamic topology: MST-based dynamic graph formation (Zubić et al., 17 Dec 2024), adaptive adjacency learning (Gazzar et al., 2022, Zambon et al., 2023), and efficient cross-snapshot structure learning (Yuan et al., 11 Dec 2024) allow the architecture to adapt to domain structure without incurring prohibitive cost.
These innovations enable GSSMs to be deployed on large static and temporal graphs, high-resolution images/grids, and complex multivariate time series with both computational and statistical efficiency.
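As a rough illustration of the resolvent/Neumann-series idea mentioned above (not Chimera's actual implementation), the sketch below approximates the global propagation operator $(I - \alpha A)^{-1}$ by a truncated series of repeated matrix products and checks it against an exact solve on a small random graph; the spectral-radius rescaling, constants, and names are illustrative assumptions.

```python
import numpy as np

def neumann_propagate(A, X, alpha=0.5, K=40):
    """Approximate Y = (I - alpha*A)^{-1} @ X by the truncated Neumann series

        Y ~ X + (alpha*A) @ X + (alpha*A)^2 @ X + ... + (alpha*A)^K @ X,

    which converges when the spectral radius of alpha*A is below 1. When A is
    stored sparsely, each term costs O(nnz(A) * d) instead of a dense inverse.
    """
    Y = X.copy()
    term = X.copy()
    for _ in range(K):
        term = alpha * (A @ term)   # next power of (alpha*A) applied to X
        Y = Y + term
    return Y

# Sanity check against the exact resolvent on a small random directed graph.
rng = np.random.default_rng(0)
A = (rng.random((6, 6)) < 0.3).astype(float)
A = A / (np.abs(np.linalg.eigvals(A)).max() + 1.0)   # shrink so alpha*A is a contraction
X = rng.standard_normal((6, 4))
exact = np.linalg.solve(np.eye(6) - 0.5 * A, X)
approx = neumann_propagate(A, X, alpha=0.5, K=40)
print(np.max(np.abs(exact - approx)))   # negligible truncation error
```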
6. Extensions, Limitations, and Future Directions
Open directions and recognized limitations in the current landscape of graph state space models include:
- Nonlinearity and generalization: Although many high-performing GSSMs are based on linear or polynomial models, accurate characterization of nonlinear, chaotic, or partially observed systems remains challenging. Extensions involving deep layers, expressive nonlinear parameterizations, or hybrid attention-state-space architectures are under exploration (Zambon et al., 2023, Cox et al., 23 Nov 2024).
- Dynamic and unknown topology: Fully dynamic graph learning, particularly in settings with unknown or entirely data-derived structure, remains an active area, requiring efficient parameterization and avoidance of overfitting (Zambon et al., 2023, Yuan et al., 11 Dec 2024, Zubić et al., 17 Dec 2024).
- Uncertainty quantification: While Bayesian inference with GSSMs is tractable in some cases, scalable and accurate uncertainty estimation for neural (deep) GSSMs is an ongoing challenge (Zambon et al., 2023, Tenorio et al., 12 Sep 2024).
- Interpretability: Although model structures and transition matrices can be mapped to causal graphs, understanding learned representations and their relation to underlying domain mechanisms—especially in deep neural variants—is an open problem.
- Hardware and large-scale deployment: Future work includes the development of custom hardware kernels for graph-structured SSMs, low-memory spectral decompositions, and further optimization of SSM computation for irregular and large-scale graphs (Lahoti et al., 14 Oct 2025).
- Broader applications: Promising areas include cross-modal graph/state-space modeling for structured summarization (Kim et al., 26 Mar 2025), multi-modal clinical or social data (Zi et al., 24 Aug 2025), and directed graph learning with rigorous causal semantics (She et al., 17 Sep 2025).
A plausible implication is that as graph state space models mature, they may serve as a universal formalism bridging sequential, spatial, and relational modeling across domains, obviating the need for manual construction of ad hoc inductive biases.
7. Representative Models at a Glance
| Model / Paper | Core Idea | Efficiency/Strengths |
|---|---|---|
| GraphEM (Elvira et al., 2022, Elvira et al., 2023) | EM framework to infer transition matrix as a graph; supports sparsity and stability priors | Scalable with convex optimization; interpretable sparse graphs |
| GraphIT (Chouzenoux et al., 2023) | Majorization-minimization with non-convex sparse prior for LG-SSMs | Accurately recovers sparse structures, tractable |
| Graph-S4 (Gazzar et al., 2022) | SSM-based temporal modeling + adaptive graph learning for fMRI | Improves psychiatric disorder prediction |
| GraphSSM (Li et al., 3 Jun 2024) | SSMs for temporal/dynamic graphs, Laplacian regularization | Efficient for large-scale, evolving structures |
| MP-SSM (Ceni et al., 24 May 2025) | Linear SSM block in MPNN framework, theoretical sensitivity analysis | Efficient, permutation-equivariant, gradient-stable |
| Chimera (Lahoti et al., 14 Oct 2025) | Unified SSM via structured mask/Neumann series on graph topology | Generalizes SSMs to arbitrary DAGs/graphs |
| GG-SSMs (Zubić et al., 17 Dec 2024) | SSM state propagation over dynamically generated MST graphs | Strong for images/sequences/graphs, few parameters |
| DirGraphSSM (She et al., 17 Sep 2025) | k-hop ego-graph sequentialization for SSMs on directed graphs | Parallel scanning, preserves causal structure |
| GrassNet (Zhao et al., 16 Aug 2024) | SSM spectral filtering; sequence modeling over graph spectrum | Arbitrary spectral filters, robust on perturbed graphs |
In summary, graph state space models provide a general, expressive, and efficient class of frameworks that combine dynamical systems theory, graphical modeling, and modern deep learning to process and reason about structured data where relationships are defined by graphs, going beyond traditional time series and sequence models.