Spatiotemporal Causal Graphical Models

Updated 21 November 2025

Spatiotemporal causal graphical models are rigorous formalisms that represent and infer dynamic causal relationships across spatial locations and time lags.
Methodologies such as PCMCI-type algorithms, latent factor models, and penalized likelihood approaches are tailored to address high-dimensional data, spatial confounding, and nonstationarity.
Empirical evaluations in climatology, neuroscience, and epidemiology demonstrate improved interpretability, scalability, and forecasting accuracy when using these models.

Spatiotemporal causal graphical models provide a rigorous formalism for representing, inferring, and interpreting the structure of causality in systems exhibiting both spatial and temporal dependencies. These models generalize classical causal graphical models by explicitly encoding directed dependencies not only across variables and time lags, but also across spatially referenced locations. This makes it possible to disentangle direct, indirect, and confounded pathways in high-dimensional spatiotemporal data from domains such as climatology, neuroscience, epidemiology, and environmental monitoring.

1. Formal Definition and Key Assumptions

Spatiotemporal causal graphical models (ST-CGMs) consist of sets of random variables indexed by space and time, with directed edges representing putative direct causal influences. The typical object of interest is a discrete or continuous process $\{ X_i(s,t) \}$, where $i$ indexes variables, $s$ denotes spatial location (from a set $S$, e.g., a grid or point locations in $\mathbb{R}^2$), and $t$ denotes time. Vertices of the directed graph correspond to random variables $X_i(s,t)$; directed edges $X_j(s',t-\tau) \to X_i(s,t)$ indicate that the past value of $X_j$ at location $s'$ and lag $\tau$ causally influences $X_i$ at $(s,t)$ (Supple et al., 30 Oct 2025).

Causal semantics are encoded using structural equation models (SEMs):

$X_i(s,t) = f_i\left(\{X_j(s', t-\tau) : (j, s', t-\tau) \in \mathrm{pa}_G[i,s,t]\}, U(s), \epsilon_i(s,t)\right)$

where $\mathrm{pa}_G[i,s,t]$ is the parent set of $X_i(s,t)$ in the DAG $G$, $U(s)$ is a latent spatial confounder (assumed smooth), and $\epsilon_i(s,t)$ is stochastic innovation noise, independent across $i$, $s$, and $t$ and independent of $U$. The full DAG $G$ is assumed to satisfy the causal Markov and faithfulness conditions (Supple et al., 30 Oct 2025). Instantaneous edges ($\tau=0$) typically connect variables within the same spatial neighborhood.
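To make the indexing concrete, the sketch below simulates a tiny instance of this SEM with NumPy. The two variables, the ring of sites, the linear-Gaussian form of $f_i$, the sinusoidal $U(s)$, and all coefficients are illustrative assumptions. Every edge here is lagged ($\tau = 1$): $X_1$ at the two neighboring sites is a spatially lagged parent of $X_2(s,t)$. Because $U(s)$ enters both equations, it induces the kind of spatial confounding that the structure-learning methods below must adjust for.

```python
import numpy as np


def simulate_st_sem(n_sites=30, n_steps=200, a=0.5, b=0.3, c=0.4,
                    noise_sd=1.0, seed=0):
    """Toy two-variable spatiotemporal SEM on a 1-D ring of sites.

    Hypothetical structure, chosen only for illustration:
      X1(s,t) = a*X1(s,t-1) + U(s) + eps1(s,t)
      X2(s,t) = b*X2(s,t-1) + c*[X1(s-1,t-1) + X1(s+1,t-1)]/2 + U(s) + eps2(s,t)
    U(s) is a smooth latent spatial confounder shared by both variables.
    """
    rng = np.random.default_rng(seed)
    sites = np.arange(n_sites)
    # Smooth confounder: one low-frequency sinusoid around the ring (assumption).
    u = np.sin(2 * np.pi * sites / n_sites)
    x1 = np.zeros((n_steps, n_sites))
    x2 = np.zeros((n_steps, n_sites))
    for t in range(1, n_steps):
        # np.roll(v, 1)[s] == v[s-1] and np.roll(v, -1)[s] == v[s+1] on the ring.
        neighbors = 0.5 * (np.roll(x1[t - 1], 1) + np.roll(x1[t - 1], -1))
        # Innovations are drawn independently of each other and of U.
        x1[t] = a * x1[t - 1] + u + noise_sd * rng.standard_normal(n_sites)
        x2[t] = (b * x2[t - 1] + c * neighbors + u
                 + noise_sd * rng.standard_normal(n_sites))
    return x1, x2, u


x1, x2, u = simulate_st_sem()
print(x1.shape, x2.shape)  # (200, 30) (200, 30)
```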

Additional assumptions vary by framework but may include:

Causal sufficiency (all non-spatial confounders measured or negligible)
Smoothness or coarse spatial scale of spatial confounders
Exogeneity of spatial coordinates (sampling location independent of noise processes)
Model class assumptions (e.g., additive noise, invertibility of mappings, independence structure among latent factors) (Wang et al., 8 Nov 2024, Supple et al., 30 Oct 2025, Mameche et al., 17 Jan 2025).

2. Methodologies for Structure Learning

Multiple methodological paradigms exist for estimating spatiotemporal causal graphs from data, each addressing high dimensionality, spatial autocorrelation, temporal dependencies, and confounding.

2.1 Conditional Independence Testing & PCMCI-type Algorithms

Generalizations of the PC and PCMCI time-series algorithms to the spatiotemporal setting construct a large DAG over all variables $X_i(s,t)$ and perform conditional independence (CI) tests to determine which temporally and spatially lagged edges are necessary (Supple et al., 30 Oct 2025). Each CI test is adjusted for spatial confounding, typically via regression-based methods that include spatial coordinates as predictors (e.g., hierarchical generalized additive models with spatial splines). The main algorithmic stages are:

Lagged-edge search: test for independence between $X_j(s',t-\tau)$ and $X_i(s,t)$ controlling for other variables and spatial coordinates.
Instantaneous-edge search: resolve contemporaneous interactions between spatially neighboring nodes.
Edge orientation: orient remaining undirected edges without introducing new colliders (Supple et al., 30 Oct 2025).

Identifiability of the resulting graph structure relies on faithfulness of the underlying distribution, on the accuracy of the CI tests, and on correct specification of the regression models used for confounder adjustment.
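Below is a minimal sketch of the lagged-edge stage. It assumes linear-Gaussian relations, so a partial-correlation (Fisher z) test stands in for the regression-based CI tests cited above. Spatial adjustment uses one indicator column per site. This is a hypothetical substitute for hierarchical GAM splines that absorbs any confounder depending only on $s$. Unlike PCMCI, which first prunes parent sets, each candidate here is tested once against all other candidates. Function names such as `lagged_edge_search` are illustrative.

```python
import math

import numpy as np


def site_indicators(site, n_sites):
    """One-hot site dummies (site 0 dropped to avoid collinearity with the intercept).
    Spatial fixed effects stand in here for GAM spatial splines (assumption)."""
    D = np.zeros((len(site), n_sites - 1))
    rows = np.nonzero(site > 0)[0]
    D[rows, site[rows] - 1] = 1.0
    return D


def residualize(y, Z):
    """Residuals of y after OLS regression on an intercept plus the columns of Z."""
    Z1 = np.column_stack([np.ones(len(y)), Z])
    beta, *_ = np.linalg.lstsq(Z1, y, rcond=None)
    return y - Z1 @ beta


def partial_corr_test(x, y, Z):
    """Fisher-z test of x independent of y given Z, assuming linear-Gaussian relations.
    Returns (partial correlation, two-sided p-value)."""
    r = float(np.corrcoef(residualize(x, Z), residualize(y, Z))[0, 1])
    r = min(max(r, -0.999999), 0.999999)
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(len(x) - Z.shape[1] - 3)
    return r, math.erfc(abs(z) / math.sqrt(2))


def lagged_edge_search(target, candidates, site, n_sites, alpha=0.01):
    """Keep each candidate parent that stays dependent on the target given all
    other candidates and the site dummies (a simplified, one-shot test)."""
    D = site_indicators(site, n_sites)
    kept = {}
    for name, x in candidates.items():
        others = [v for m, v in candidates.items() if m != name]
        r, p = partial_corr_test(target, x, np.column_stack(others + [D]))
        if p < alpha:
            kept[name] = (r, p)
    return kept


# Hypothetical data: X1(s,t-1) -> X2(s,t); X3 shares only the confounder U(s) with X2.
rng = np.random.default_rng(1)
T, S = 300, 20
u = np.sin(2 * np.pi * np.arange(S) / S)
x1 = rng.standard_normal((T, S)) + u
x3 = rng.standard_normal((T, S)) + u
x2 = np.zeros((T, S))
for t in range(1, T):
    x2[t] = 0.6 * x1[t - 1] + u + rng.standard_normal(S)
target = x2[1:].ravel()                 # row-major order: index = t * S + s
cands = {"X1(s,t-1)": x1[:-1].ravel(), "X3(s,t-1)": x3[:-1].ravel()}
site = np.tile(np.arange(S), T - 1)     # site label matching the ravel order above
print(lagged_edge_search(target, cands, site, S))
```

Without the site dummies, $X_3$ would appear dependent on $X_2$ purely through $U(s)$; with them, only the true lagged parent should survive.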

2.2 Latent Factor Models and Variational Inference

SPACY (SPAtiotemporal Causal discoverY) introduces a latent variable approach, positing that high-dimensional observational data $X(s,t)$ arise from a small number of time-varying latent series $z_k(t)$ with corresponding spatial factor functions $f_k(s)$ (Wang et al., 8 Nov 2024). The observations are modeled as:

$X(s, t) = g \left( \sum_{k=1}^K z_k(t) f_k(s) \right) + \epsilon(s, t)$

where $g$ is an invertible nonlinearity parametrized by an MLP, $f_k(s)$ are spatial factors built from RBF kernels, and $\epsilon$ is Gaussian noise.
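The following is a forward-sampling sketch of this observation model. It assumes NumPy, a fixed monotone function in place of the learned MLP $g$, Gaussian RBF factors with hand-picked centers and length-scales, and a hypothetical two-factor latent process with a single lag-1 edge $z_1 \to z_2$. It only generates data; SPACY's variational inference works in the opposite direction, recovering $z_k$, $f_k$, and the latent graph from $X$.

```python
import numpy as np


def rbf_factors(coords, centers, lengthscales):
    """f_k(s) = exp(-||s - c_k||^2 / (2 l_k^2)) for every location s and factor k."""
    d2 = ((coords[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # (S, K)
    return np.exp(-d2 / (2.0 * lengthscales[None, :] ** 2))


def g(v):
    """Stand-in for the learned MLP (assumption): strictly increasing, hence invertible."""
    return v + 0.5 * np.tanh(v)


def sample_observations(z, coords, centers, lengthscales, noise_sd=0.1, seed=0):
    """X(s,t) = g(sum_k z_k(t) f_k(s)) + eps(s,t), with Gaussian eps."""
    rng = np.random.default_rng(seed)
    F = rbf_factors(coords, centers, lengthscales)   # (S, K)
    mixed = z @ F.T                                  # (T, S): sum_k z_k(t) f_k(s)
    return g(mixed) + noise_sd * rng.standard_normal(mixed.shape), F


# Hypothetical latent dynamics: a lag-1 linear SCM with the single edge z_1 -> z_2.
rng = np.random.default_rng(0)
T, K = 100, 2
A = np.array([[0.5, 0.0],
              [0.4, 0.5]])        # row i holds the lag-1 coefficients of z_i's parents
z = np.zeros((T, K))
for t in range(1, T):
    z[t] = A @ z[t - 1] + rng.standard_normal(K)

gx, gy = np.meshgrid(np.linspace(0, 1, 10), np.linspace(0, 1, 10))
coords = np.column_stack([gx.ravel(), gy.ravel()])  # 100 grid points in [0, 1]^2
centers = np.array([[0.25, 0.25], [0.75, 0.75]])
X, F = sample_observations(z, coords, centers, np.array([0.2, 0.2]))
print(X.shape, F.shape)  # (100, 100) (100, 2)
```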

Causal relationships among the latent $z_k$ are modeled by a directed latent causal graph parameterized as an SCM, which can be linear (SPACY-L) or nonlinear (SPACY-NL). Inference is performed jointly by maximizing a variational evidence lower bound (ELBO).

The variational posterior is factorized over latent series, spatial kernels, and graph structure; acyclicity of the DAG is enforced using an augmented Lagrangian penalty.
Latent causal graphs are discovered directly by learning an adjacency tensor indexed by lag (Wang et al., 8 Nov 2024).
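The cited work states only that acyclicity is enforced with an augmented Lagrangian penalty. A common choice of constraint, assumed here rather than taken from SPACY, is the NOTEARS function $h(W) = \operatorname{tr}(e^{W \circ W}) - d$. It vanishes exactly when the weighted adjacency $W$ contains no directed cycle. In a time-lagged model only the instantaneous block needs such a constraint, because lagged edges always point forward in time. The helper names and the simplified dual-update schedule below are hypothetical.

```python
import numpy as np


def acyclicity(W):
    """h(W) = tr(exp(W * W)) - d; zero iff the weighted graph W has no directed cycle.
    Uses tr(exp(A)) = sum_i exp(lambda_i(A)), which holds for any square matrix."""
    eigvals = np.linalg.eigvals(W * W)
    return float(np.sum(np.exp(eigvals)).real) - W.shape[0]


def augmented_lagrangian_penalty(W, alpha, rho):
    """Term added to the training loss: alpha * h(W) + (rho / 2) * h(W)^2."""
    h = acyclicity(W)
    return alpha * h + 0.5 * rho * h ** 2


def dual_update(alpha, rho, h, growth=10.0):
    """Simplified outer-loop update: raise the multiplier, then tighten the penalty."""
    return alpha + rho * h, rho * growth


dag = np.triu(np.ones((3, 3)), k=1)          # 0->1, 0->2, 1->2: acyclic
cycle = np.array([[0.0, 1.0], [1.0, 0.0]])   # 0->1->0: a 2-cycle
print(acyclicity(dag), acyclicity(cycle))
```

The penalty is differentiable in $W$, which is why this family of constraints fits gradient-based variational training.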

SPACY provides theoretical identifiability guarantees under invertibility and independence assumptions, and achieves substantial computational scalability through vectorized kernel computations.

2.3 Penalized Likelihood and Model Selection

In discrete-state models such as SIR-type epidemic networks, the graph topology can be estimated via $\ell_1$-penalized maximum likelihood. The specific process dynamics are embedded in the likelihood function, and sparsity-promoting penalties recover the adjacency (Jr. et al., 2010). This enables tractable recovery of directed, time-lagged influence networks in settings with known Markovian dynamics.
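The sketch below illustrates the idea using a hypothetical discrete-time SI model (no recovery, with a fixed spontaneous infection rate `base`) rather than the exact SIR likelihood of Jr. et al. (2010). Incoming rates for each node are fit separately by $\ell_1$-penalized maximum likelihood. Because the rates are constrained to be nonnegative, the $\ell_1$ term is linear, so projected gradient descent with backtracking suffices. Estimates near zero indicate absent edges.

```python
import numpy as np


def simulate_si(theta, base, n_runs, n_steps, seed=0):
    """Discrete-time SI cascades (hypothetical model). theta[j, i] >= 0 is the rate
    at which infected node j infects node i; base is a spontaneous infection rate:
    P(i infected at t+1 | susceptible at t) = 1 - exp(-(base + sum_j theta[j, i] I_j(t)))."""
    rng = np.random.default_rng(seed)
    d = theta.shape[0]
    runs = []
    for _ in range(n_runs):
        I = np.zeros((n_steps, d), dtype=bool)
        for t in range(n_steps - 1):
            eta = base + I[t].astype(float) @ theta
            I[t + 1] = I[t] | (rng.random(d) < -np.expm1(-eta))
        runs.append(I)
    return runs


def fit_node(runs, i, base, lam=1.0, n_iter=300):
    """l1-penalized MLE of the incoming rates theta[:, i]. With theta >= 0 the l1
    norm equals sum(theta), so projected gradient with backtracking suffices."""
    # Rows: network state at time t whenever node i is still susceptible.
    X = np.vstack([I[:-1][~I[:-1, i]] for I in runs]).astype(float)
    y = np.concatenate([I[1:, i][~I[:-1, i]] for I in runs])  # infected at t+1?
    # Column i of X is all zeros, so theta[i] stays at 0 (no self-excitation).

    def objective(th):
        eta = base + X @ th
        return -np.where(y, np.log(-np.expm1(-eta)), -eta).sum() + lam * th.sum()

    def gradient(th):
        eta = base + X @ th
        return X.T @ np.where(y, -1.0 / np.expm1(eta), 1.0) + lam

    th, step = np.zeros(X.shape[1]), 1.0
    f = objective(th)
    for _ in range(n_iter):
        grad = gradient(th)
        while True:  # backtrack until the quadratic upper bound holds
            new = np.maximum(th - step * grad, 0.0)
            diff = new - th
            f_new = objective(new)
            if f_new <= f + grad @ diff + diff @ diff / (2 * step):
                break
            step *= 0.5
        th, f, step = new, f_new, step * 2.0  # accept, then try a larger step
    return th


theta_true = np.zeros((4, 4))
theta_true[0, 1] = theta_true[1, 2] = 1.0     # hypothetical chain 0 -> 1 -> 2
runs = simulate_si(theta_true, base=0.05, n_runs=300, n_steps=15)
print(np.round(fit_node(runs, 1, base=0.05), 2))  # estimated rates into node 1
```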

2.4 Causal Adjacency Learning with Conditional Independence Filtering

Causal Adjacency Learning (CAL) uses conditional independence testing (e.g., kernel-based CI tests with SyPI filtering) to estimate a binary graph adjacency $A^{\mathrm{causal}}$ that encodes invariant, testable causal influences between nodes (Mo et al., 25 Nov 2024). Because it is built from causal rather than merely predictive relations, the resulting adjacency is intended to remain valid under distribution shift, and it can be plugged into downstream GCN-based predictors for more robust out-of-distribution performance.
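To illustrate the plug-in step, the sketch below runs one graph-convolution layer over a fixed binary adjacency. It assumes the convention $A_{ji} = 1$ for an edge $j \to i$, and each node takes a simple mean over itself and its causal parents. This propagation rule and the function names are assumptions, not CAL's published architecture. The point is that the learned adjacency enters the predictor only through a fixed, sparse propagation matrix.

```python
import numpy as np


def parent_mean_operator(A_causal):
    """Row-normalized (A^T + I): row i averages node i and its parents j (A[j, i] = 1)."""
    P = A_causal.T.astype(float) + np.eye(A_causal.shape[0])
    return P / P.sum(axis=1, keepdims=True)


def graph_conv(A_causal, H, W):
    """One layer ReLU(P H W), with node features H (n, f_in) and weights W (f_in, f_out)."""
    return np.maximum(parent_mean_operator(A_causal) @ H @ W, 0.0)


A = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]])       # hypothetical causal chain 0 -> 1 -> 2
H = np.eye(3)                   # one-hot node features
out = graph_conv(A, H, np.eye(3))
print(out)  # rows: [1, 0, 0], [0.5, 0.5, 0], [0, 0.5, 0.5]
```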

2.5 Minimum Description Length (MDL) and Context/Regime Partitioning

SpaceTime performs joint causal discovery and change point detection across time and space. It searches for a partition into contexts and regimes within which mechanisms remain stationary, using an MDL score based on Gaussian process regression fits of the conditional structural equations (Mameche et al., 17 Jan 2025). Nonparametric HSIC tests are used to detect shifts in conditional distributions, and the graph and segmentations are optimized to minimize the total coding length.
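HSIC itself is easy to sketch. The block below computes the biased empirical HSIC with Gaussian kernels (bandwidth set by a median heuristic) and a permutation p-value, assuming NumPy. It does not reproduce how SpaceTime combines such tests with its Gaussian-process MDL score. In a shift-detection setting, one could for instance test regression residuals against a context or regime label, but that usage is an illustrative assumption.

```python
import numpy as np


def rbf_gram(x):
    """Gaussian Gram matrix with a median-heuristic bandwidth."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    sigma2 = np.median(d2[d2 > 0])          # roughly the squared median pairwise distance
    return np.exp(-d2 / (2.0 * sigma2))


def hsic_stat(K, L):
    """Biased empirical HSIC: tr(K H L H) / n^2 with centering matrix H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - 1.0 / n
    return float(np.trace(K @ H @ L @ H)) / n ** 2


def hsic_test(x, y, n_perm=200, seed=0):
    """Permutation test of independence between paired samples x and y."""
    rng = np.random.default_rng(seed)
    K, L = rbf_gram(x), rbf_gram(y)
    stat = hsic_stat(K, L)
    null = []
    for _ in range(n_perm):
        idx = rng.permutation(len(y))       # permuting y breaks any x-y dependence
        null.append(hsic_stat(K, L[np.ix_(idx, idx)]))
    p_value = (1 + sum(s >= stat for s in null)) / (1 + n_perm)
    return stat, p_value


rng = np.random.default_rng(0)
x = rng.standard_normal(150)
print(hsic_test(x, x ** 2 + 0.1 * rng.standard_normal(150)))  # nonlinear dependence
print(hsic_test(x, rng.standard_normal(150)))                  # independent
```

Unlike a correlation test, HSIC detects the purely nonlinear dependence $y \approx x^2$, which is why kernel tests suit nonparametric mechanisms.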

3. Model Classes and Representational Strategies

ST-CGMs are instantiated in several broad classes depending on the underlying data, mechanisms, and computational objectives:

Explicit node-time DAGs: Each $X_i(s,t)$ forms a graph node; edges encode direct lagged or instantaneous influences (Supple et al., 30 Oct 2025, Jr. et al., 2010).
Latent factor models: A small set of latent time series and spatial basis functions maps to observed data; causality is inferred in the low-dimensional latent space before projecting to observations (Wang et al., 8 Nov 2024).
Hierarchical and mixed-graph models: In health applications, structured latent Markov models combine with multi-graph GCNs to jointly capture spatial and temporal dependencies among high-dimensional biomarker time series (Lee et al., 11 Jul 2025).
Symbolic dynamics and feature extraction: In cyber-physical systems (CPS), discretized state sequences are mined for frequent relational motifs, which are then modeled with generative graphical models such as Restricted Boltzmann Machines for anomaly detection (Liu et al., 2015).
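The symbolization step in the last model class can be sketched as follows. This assumes equal-frequency (maximum-entropy) partitioning and a simple lag-1 cross-transition count between two symbol sequences. It is one illustrative way to turn continuous sensor series into relational features. The partitioning scheme, function names, and lag are assumptions and not necessarily those of Liu et al. (2015).

```python
import numpy as np


def symbolize(x, n_symbols=3):
    """Equal-frequency partitioning (assumption): map each value to its quantile bin,
    so that the n_symbols symbols are roughly equiprobable."""
    edges = np.quantile(x, np.linspace(0, 1, n_symbols + 1)[1:-1])
    return np.searchsorted(edges, x, side="right")


def cross_transition_matrix(sym_a, sym_b, n_symbols, lag=1):
    """Estimate P(b_{t+lag} = j | a_t = i) by counting symbol pairs."""
    counts = np.zeros((n_symbols, n_symbols))
    np.add.at(counts, (sym_a[:-lag], sym_b[lag:]), 1.0)
    row = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row, out=np.zeros_like(counts), where=row > 0)


rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
a = symbolize(x)
b = symbolize(np.roll(x, 1))          # b(t) = x(t-1): b copies a with a one-step delay
P = cross_transition_matrix(a, b, n_symbols=3)
print(np.round(P, 2))                 # identity: b(t+1) always equals a(t)
```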

4. Empirical Performance and Real-World Applications

Reported experimental evaluations indicate that:

Variational latent-causal frameworks (SPACY) consistently outperform state-of-the-art baselines (Varimax-PCA+PCMCI+, Linear-Response, LEAP, TDRL) in both graph recovery (orientation-aware F1-score) and latent series recovery (mean correlation coefficient, MCC), and remain robust as the latent dimension $K$ increases (Wang et al., 8 Nov 2024).
In ecological and climate contexts, ST-CGMs recover interpretable, domain-consistent teleconnection patterns (e.g., El Niño, the North Atlantic Oscillation (NAO), and the Antarctic Oscillation (AAO) in global surface temperature grids) and reconstruct dynamical causal mechanisms known from the scientific literature (Wang et al., 8 Nov 2024, Supple et al., 30 Oct 2025).
Causal adjacency discovery frameworks (CAL) yield not only improved forecasting performance—especially for out-of-distribution data (up to 50.3% RMSE reduction versus attention-based graphs)—but also sparser, more computationally efficient graph structures (Mo et al., 25 Nov 2024).
Minimum description length frameworks (SpaceTime) can uncover seasonal or year-specific regime changes in hydrology and atmospheric biosphere datasets, revealing both global and localized changes in spatiotemporal causal structures (Mameche et al., 17 Jan 2025).
Model-based structure learning in epidemic dynamics allows for accurate network recovery, outperforming standard spatial Markov random field approaches, and scaling to hundreds of spatial nodes (Jr. et al., 2010).
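For reference, a directed-edge F1 score of the kind used in such comparisons can be computed as below. The sketch assumes binary adjacency arrays (for lagged graphs, one matrix per lag stacked into a tensor) and counts an edge as correct only when it is recovered with the right orientation. The exact scoring protocol of the cited benchmarks may differ.

```python
import numpy as np


def directed_edge_f1(A_true, A_est):
    """F1 over directed edges; A[..., j, i] = 1 means j -> i. A reversed edge counts as
    both a false positive and a false negative. Returns 0.0 when nothing is recovered."""
    A_true = np.asarray(A_true, dtype=bool)
    A_est = np.asarray(A_est, dtype=bool)
    tp = int(np.sum(A_true & A_est))
    fp = int(np.sum(~A_true & A_est))
    fn = int(np.sum(A_true & ~A_est))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


A_true = np.array([[0, 1, 1],
                   [0, 0, 0],
                   [0, 0, 0]])   # 0 -> 1, 0 -> 2
A_est = np.array([[0, 1, 0],
                  [0, 0, 0],
                  [1, 0, 0]])    # 0 -> 1 correct; 2 -> 0 has the wrong orientation
print(directed_edge_f1(A_true, A_est))  # 0.5
```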

5. Identifiability, Scalability, and Limitations

Identifiability of graph structure in ST-CGMs is established under invertibility (for factor-based models), independence and faithfulness (for CI-based models), and correct model specification (Wang et al., 8 Nov 2024, Supple et al., 30 Oct 2025). Because temporal order fixes the direction of lagged links, modeling them explicitly reduces combinatorial complexity and allows inference in time polynomial, rather than exponential, in the number of nodes (Supple et al., 30 Oct 2025). Scalability is further achieved via vectorized kernel computations (Wang et al., 8 Nov 2024), hierarchical modeling (Lee et al., 11 Jul 2025), and residual re-use in PCMCI-based methods (Supple et al., 30 Oct 2025).
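A small calculation shows why fixing orientations by time order helps. The number of labeled DAGs on $n$ nodes, given by Robinson's recurrence, grows superexponentially. If a known order such as time fixes every edge's direction, each of the $n(n-1)/2$ ordered pairs is simply present or absent. This counts candidate structures only; it is not a statement about the runtime of any particular algorithm.

```python
from math import comb


def count_dags(n):
    """Number of labeled DAGs on n nodes via Robinson's recurrence."""
    a = [1]  # a[0] = 1: the empty graph
    for m in range(1, n + 1):
        a.append(sum((-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
                     for k in range(1, m + 1)))
    return a[n]


def count_order_compatible_graphs(n):
    """Graphs whose edge directions are all fixed by one known total order."""
    return 2 ** (n * (n - 1) // 2)


for n in range(1, 6):
    print(n, count_dags(n), count_order_compatible_graphs(n))
```

For $n = 4$ this prints 543 DAGs versus 64 order-compatible graphs, and the gap widens rapidly with $n$.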

Noted limitations include:

The need to pre-specify the latent dimensionality $K$, and the current restriction to single-variable fields in some frameworks (Wang et al., 8 Nov 2024).
Assumptions of causal sufficiency, stationarity in mechanisms, or absence of unmeasured confounding, which may be violated in practical scenarios (Mo et al., 25 Nov 2024, Supple et al., 30 Oct 2025).
Computational overheads of fully nonparametric or kernel-based CI tests, and possible challenges in transferability across domains with differing spatial or temporal dynamics (Mo et al., 25 Nov 2024).

Potential extensions include automatic selection of $K$, generalization to multivariate fields, handling of interventional and missing data, and real-time or online adaptation to streaming data.

6. Comparative Analysis and Model Selection

ST-CGMs differ qualitatively from purely temporal causal discovery, Markov random fields, dynamic Bayesian networks (DBNs), and "black-box" sequence models (e.g., RNNs, LSTMs) (Liu et al., 2015, Jr. et al., 2010). Their key distinguishing features are the explicit treatment of spatiotemporal confounding and high-dimensional dependencies, and the ability to output interpretable causal graphs that can remain robust under spatiotemporal distribution shift. Exact structure learning for classical models such as DBNs is NP-hard in general, and classical models such as DBNs or Ising models are less effective in scenarios with ubiquitous spatial autocorrelation and nonstationary mechanisms.

Comparative strengths and weaknesses of representative approaches are outlined below:

Model/Approach	Strengths	Limitations
PCMCI-type methods	Explicit control for spatial confounders; scalable	Sensitive to CI-test specification
Variational latent models (SPACY)	Joint dimension reduction and causal discovery; provable identifiability; scalable	Pre-specification of K required; single-variable focus
Penalized likelihood (SIR)	Model-specific, consistent support recovery	Only for discrete-state, Markovian dynamics
CAL	Invariant graphs, OOD robustness; plug-in for GCNs	Assumes no hidden confounders; stationarity
MDL/SpaceTime	Data-driven changepoint and regime detection	Computational cost; GP model assumptions
STPN+RBM	Unsupervised, anomaly detection, explicit graphs	Linear partitioning, fixed lag, shallow RBM

7. Outlook and Open Problems

Research on spatiotemporal causal graphical models remains active, with open directions including adaptive model selection for latent dimensions and regime numbers, theoretical guarantees under more realistic forms of confounding or nonstationarity, scalable and distributed inference algorithms, and tighter integration with interventional data and experimental design. Current advances enable direct inference of interpretable, mechanistic influence diagrams from spatiotemporally autocorrelated data, providing actionable insights in domains ranging from environmental policy to clinical outcome prediction (Wang et al., 8 Nov 2024, Supple et al., 30 Oct 2025, Mameche et al., 17 Jan 2025, Mo et al., 25 Nov 2024, Jr. et al., 2010, Lee et al., 11 Jul 2025, Liu et al., 2015).