Spatiotemporal Causal Graphical Models

Updated 21 November 2025

Spatiotemporal causal graphical models are mathematical and algorithmic frameworks for representing, inferring, and exploiting the directed dependencies among variables indexed by both space and time. These models extend classical graphical causal models to accommodate systems in which interactions exhibit both temporal precedence and spatial structure, including but not limited to climate, neuroscience, epidemiology, biomedicine, and distributed cyber-physical systems. The core objective is to discover or model the directed acyclic graph (DAG), or in some cases the dynamic Bayesian network or structural causal model (SCM), that encodes cause-effect relationships among multivariate time series observed over gridded or networked spatial domains.

1. Conceptual Framework and Model Classes

Spatiotemporal causal graphical models instantiate a space–time–indexed set of random variables, typically denoted $X_i(s, t)$ for variable $i$ at spatial location $s$ and time $t$. The essential building block is a set of nodes, corresponding to variables or subsystems, organized over a spatial domain $S$ and temporal index set $T$. Directed edges $\left(X_j(s', t-\tau) \to X_i(s, t)\right)$ indicate that the past value of $X_j$ at location $s'$ and temporal lag $\tau$ exerts a direct causal effect on $X_i$ at $(s, t)$ (Supple et al., 30 Oct 2025).

Critical distinctions arise between three classes:

Observed-variable graphical models: Each node is an observable physical quantity, and the goal is to directly recover the graph $G$ from observed $X_i(s, t)$ (Supple et al., 30 Oct 2025, Jr. et al., 2010).
Latent-variable or factor models: Observed spatiotemporal fields are explained via a set of latent processes $Z_k(t)$ with spatial “footprints,” reducing effective dimensionality and aggregating correlated spatial nodes (Wang et al., 8 Nov 2024).
Mechanistic/structural models: Causal relations are encoded in stochastic difference equations, e.g., state-transition rules in SIR epidemics or patient-level latent Markov processes (Lee et al., 11 Jul 2025, Jr. et al., 2010).

The graphical model typically enforces acyclicity within each time slice and forward-time “causal order” for lagged links, sometimes distinguishing between contemporaneous (instantaneous) and lagged causal mechanisms (Supple et al., 30 Oct 2025, Wang et al., 8 Nov 2024).
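
As a minimal sketch of these structural constraints, the following Python snippet stores each link as a tuple and checks that lagged links point forward in time and that contemporaneous (lag-0) links form no cycle. The tuple layout `(src_var, src_loc, dst_var, dst_loc, lag)` and the function name `is_valid_graph` are illustrative assumptions, not taken from any of the cited frameworks.

```python
def is_valid_graph(edges):
    """Return True if no link points backward in time and lag-0 links are acyclic.

    Each edge is a hypothetical tuple (src_var, src_loc, dst_var, dst_loc, lag),
    meaning X_src(src_loc, t - lag) -> X_dst(dst_loc, t).
    """
    adj = {}
    for src_var, src_loc, dst_var, dst_loc, lag in edges:
        if lag < 0:
            return False  # an effect may not precede its cause
        if lag == 0:
            # only contemporaneous links can create a within-slice cycle
            adj.setdefault((src_var, src_loc), []).append((dst_var, dst_loc))

    visiting, done = set(), set()

    def has_cycle(node):
        # depth-first search; meeting a node still on the stack means a cycle
        visiting.add(node)
        for nxt in adj.get(node, ()):
            if nxt in visiting:
                return True
            if nxt not in done and has_cycle(nxt):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return not any(n not in done and has_cycle(n) for n in adj)


edges = [
    ("temp", "A", "rain", "B", 1),  # lagged link across locations
    ("rain", "B", "temp", "A", 2),  # lagged feedback is allowed
    ("temp", "A", "rain", "A", 0),  # contemporaneous link
]
print(is_valid_graph(edges))  # True
```

Lagged feedback loops pass the check because unrolling them in time never revisits the same space-time node; only lag-0 cycles and negative lags are rejected.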

2. Mathematical Formalisms

The underlying model is often formalized as a structural causal model (SCM) or structural equation model (SEM) over space–time–indexed variables:

$X_i(s, t) = f_i\left(\left\{X_j(s', t-\tau): (j, s', t-\tau) \in \text{pa}_G[i, s, t]\right\}, U(s), \varepsilon_i(s, t)\right)$

where $\text{pa}_G[i, s, t]$ denotes the parent set in $G$, $U(s)$ represents latent spatial confounders (e.g., soil quality), and $\varepsilon_i(s, t)$ are i.i.d. noise innovations (Supple et al., 30 Oct 2025). In latent decomposition models, observations are generated as

$X(s, t) = g\left(\sum_{k=1}^K z_k(t) \cdot f_k(s)\right) + \varepsilon(s, t)$

where the $z_k(t)$ are latent time series, $f_k(s)$ are spatial factor functions (e.g., RBF kernels), and $g(\cdot)$ is an invertible nonlinearity (Wang et al., 8 Nov 2024).
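
The latent decomposition above can be illustrated with a short NumPy simulation. This is a sketch under stated assumptions rather than the SPACY implementation: the grid is one-dimensional, the RBF centers and bandwidth are fixed by hand, the latent dynamics are a linear lag-1 process with a single link $z_0 \to z_1$, and $g$ is taken to be `tanh`. All sizes and coefficients are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
L, T, K = 50, 200, 3  # grid points, time steps, latent modes (illustrative)

s = np.linspace(0.0, 1.0, L)          # 1-D spatial grid (assumption)
centers = np.array([0.2, 0.5, 0.8])   # assumed RBF centers
width = 0.1                           # assumed RBF bandwidth

# Spatial factors f_k(s): one RBF bump per latent mode, shape (K, L)
F = np.exp(-0.5 * ((s[None, :] - centers[:, None]) / width) ** 2)

# Latent time series z_k(t): linear lag-1 dynamics with one causal link z_0 -> z_1
A = np.array([[0.5, 0.0, 0.0],
              [0.4, 0.5, 0.0],
              [0.0, 0.0, 0.5]])
Z = np.zeros((T, K))
for t in range(1, T):
    Z[t] = A @ Z[t - 1] + rng.normal(scale=0.5, size=K)

# Observations X(s, t) = g(sum_k z_k(t) f_k(s)) + noise, shape (T, L)
g = np.tanh
X = g(Z @ F) + rng.normal(scale=0.05, size=(T, L))
```

The causal graph here lives on the $K$ latent series rather than on the $L$ grid points, which is the dimensionality reduction that latent-factor approaches rely on.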

For discrete-state models (e.g., SIR epidemics), the dependency structure is embedded in the transition probabilities of networked Markov chains (Jr. et al., 2010). In multivariate settings, plate notation may be used to clarify replicated structures over individuals, locations, time, or measurement channels (Lee et al., 11 Jul 2025).
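
To make the networked Markov chain concrete, the sketch below implements one synchronous update of a discrete-time SIR process on a graph. The specific transition rule is an assumption chosen for illustration (a susceptible node escapes infection independently from each infected neighbor, and infected nodes recover with probability `gamma`); the cited work may parameterize the transitions differently. The names `sir_step`, `W`, and `gamma` are hypothetical.

```python
import numpy as np

def sir_step(state, W, gamma, rng):
    """One synchronous SIR update on a network.

    state : int array with 0 = susceptible, 1 = infected, 2 = recovered
    W     : W[i, j] is the assumed per-step probability that an infected
            node j infects a susceptible node i (0 where there is no edge)
    gamma : assumed per-step recovery probability
    """
    infected = state == 1
    # probability that node i escapes infection from every infected neighbor
    escape = np.prod(1.0 - W * infected[None, :], axis=1)
    u = rng.random(state.size)
    new_state = state.copy()
    new_state[(state == 0) & (u < 1.0 - escape)] = 1  # S -> I
    new_state[infected & (u < gamma)] = 2             # I -> R
    return new_state


rng = np.random.default_rng(0)
W = np.array([[0.0, 0.3, 0.0],
              [0.3, 0.0, 0.3],
              [0.0, 0.3, 0.0]])  # a three-node chain
state = np.array([1, 0, 0])
for _ in range(5):
    state = sir_step(state, W, gamma=0.2, rng=rng)
```

The sparsity pattern of `W` is exactly the graph that topology-recovery methods try to estimate from observed state sequences.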

3. Causal Discovery and Inference Methodologies

A principal challenge in spatiotemporal causal discovery is distinguishing direct cause–effect relationships from confounding due to spatial and temporal autocorrelation. Several classes of algorithms have been developed:

Conditional Independence (CI)-based algorithms:
- Extended PC and PCMCI+ algorithms conduct CI testing over time-lagged and spatially indexed variables, adjusting for spatial confounders by incorporating spatial coordinates or smoothers (e.g., GAMs) as controls in regression-based tests (Supple et al., 30 Oct 2025).
- Kernel-based CI tests (KCIT) for adjacency learning in spatiotemporal graphs rely on temporal embedding windows, coupled with statistical controls (e.g., SyPI filtering) to select direct causal parents (Mo et al., 25 Nov 2024).
- These methods typically enforce a maximum lag $T_\text{lag}$, local spatial neighborhoods, and acyclicity constraints.
Variational and latent-factor approaches:
- Variational autoencoding frameworks (e.g., SPACY) infer causal structure in a low-dimensional latent space, learning spatial factors as kernelized “modes” and a DAG over their time series. The evidence lower bound (ELBO) incorporates KL divergences for Bayesian regularization on latent dynamics, spatial kernels, and the graph itself (Wang et al., 8 Nov 2024).
Penalized likelihood methods:
- $\ell_1$-regularized maximum likelihood convex programs recover the support of the spatiotemporal dependence matrix directly from discrete event sequences (e.g., SIR transitions), allowing statistically consistent topology selection under suitable sparsity-inducing penalties (Jr. et al., 2010).
Nonparametric and MDL-based regime detection:
- SpaceTime employs minimum description length (MDL) principles with nonparametric Gaussian process regression to jointly infer causal graph structure, regime (changepoint) locations in time, and context-specific partitions in space. The kernel-based Hilbert-Schmidt Independence Criterion (HSIC) is used to segment contexts and regimes (Mameche et al., 17 Jan 2025).
Hybrid and feature-learning methods:
- Symbolic-dynamics-based pattern networks extract discrete-valued features capturing directed temporal dependencies, summarizing them as feature vectors for unsupervised learning (e.g., Restricted Boltzmann Machines) to encode joint nominal modes and anomaly structure (Liu et al., 2015).
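
The role of spatial adjustment in regression-based CI testing can be sketched with a toy example. Two fields share a smooth spatial confounder $U(s)$ but have no causal link; a naive correlation test reports strong dependence, while a partial-correlation test that also regresses out a spatial basis does not. The snippet is an illustration under assumptions: a degree-5 polynomial in the coordinate stands in for a GAM smoother, the test is a linear partial correlation with a Fisher z approximation, and `partial_corr_test` is a hypothetical helper rather than part of any cited package.

```python
import math
import numpy as np

def partial_corr_test(x, y, controls):
    """Correlation of x and y after linearly regressing out `controls`.

    Returns (r, two-sided p-value) using the Fisher z approximation.
    `controls` has shape (n, k); k may be 0 for an unconditional test.
    """
    n, k = controls.shape
    C = np.column_stack([np.ones(n), controls])
    rx = x - C @ np.linalg.lstsq(C, x, rcond=None)[0]
    ry = y - C @ np.linalg.lstsq(C, y, rcond=None)[0]
    r = float(np.corrcoef(rx, ry)[0, 1])
    z = math.atanh(r) * math.sqrt(n - k - 3)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return r, p


rng = np.random.default_rng(0)
n_loc, T = 40, 50
s = np.linspace(0.0, 1.0, n_loc)
U = 2.0 * np.sin(2.0 * np.pi * s)                 # smooth spatial confounder
X = U[None, :] + rng.normal(size=(T, n_loc))      # no causal link X -> Y
Y = U[None, :] + rng.normal(size=(T, n_loc))

x_lag = X[:-1].ravel()                            # X(s, t-1)
y_now = Y[1:].ravel()                             # Y(s, t)
coords = np.tile(s, T - 1)

no_controls = np.empty((x_lag.size, 0))
spatial_basis = np.column_stack([coords ** d for d in range(1, 6)])

r_raw, p_raw = partial_corr_test(x_lag, y_now, no_controls)
r_adj, p_adj = partial_corr_test(x_lag, y_now, spatial_basis)
print(f"unadjusted r = {r_raw:.2f}, spatially adjusted r = {r_adj:.2f}")
```

In this toy setup the unadjusted correlation is driven entirely by $U(s)$, so a CI-based algorithm without the spatial controls would retain a spurious lagged edge.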

4. Identifiability, Assumptions, and Theoretical Guarantees

Statistical identifiability of spatiotemporal causal graphs relies on several layers of assumptions:

Causal Markov and faithfulness on the joint process $X(s, t)$ (Supple et al., 30 Oct 2025).
Causal sufficiency up to spatial confounders, i.e., no unmeasured confounders outside smooth spatial latent fields $U(s)$ or latent regime/context variables (Supple et al., 30 Oct 2025, Mameche et al., 17 Jan 2025).
Invertibility and independence in latent factor models, enabling recovery of spatial kernels and time series up to permutation and invertible transform, given linear independence of spatial factors (Wang et al., 8 Nov 2024).
Exogeneity of spatial coordinates for spatial adjustment in regression-based CI testing (Supple et al., 30 Oct 2025).
Piecewise stationarity, or at least discrete nonstationarity via changepoints or context blocks, with mechanisms persisting within each regime/context (Mameche et al., 17 Jan 2025).

Under these assumptions, consistency results assert exact recovery of the true structured DAG (and, where appropriate, spatial and temporal segmentation) as sample size grows, provided kernel, basis, or functional approximations are adequate and minimal regime length is enforced (Wang et al., 8 Nov 2024, Mameche et al., 17 Jan 2025).

5. Computational Considerations and Scalability

Spatiotemporal causal graph learning is inherently high-dimensional, given the combinatorial growth in the number of nodes over locations, measurement channels, and temporal lags. Key algorithmic strategies include:

Exploiting time’s arrow to limit the directionality and reduce effective search space, with $O(N^2 k_\text{max})$ scaling for edge searches, where $N$ is the number of nodes and $k_\text{max}$ the maximum conditioning-set size (Supple et al., 30 Oct 2025).
Vectorized and parallel computation for kernel evaluations (e.g., RBF-based spatial factors) in variational frameworks, permitting practical grid sizes up to $L=10^4$ (Wang et al., 8 Nov 2024).
Edge screening and pre-selection by simple correlation or spatial proximity priors to sparsify candidate parent sets and reduce the number of conditional independence tests (Mo et al., 25 Nov 2024).
Blockwise or coordinate-descent optimization in penalized likelihood frameworks, with closed-form soft-thresholding or blockwise Newton updates (Jr. et al., 2010).
Nonparametric regression accelerations such as hierarchical GAMs and thin-plate splines for faster CI regressions (Supple et al., 30 Oct 2025).
MDL-based edge addition/removal heuristics for efficient structure selection, re-using fitted conditional models across greedily proposed modifications (Mameche et al., 17 Jan 2025).
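
The closed-form soft-thresholding update mentioned above can be sketched as a proximal gradient (ISTA) loop. As a simplifying assumption, the example uses an $\ell_1$-penalized least-squares objective as a stand-in for the penalized likelihood of the cited work; the columns of the design matrix play the role of candidate lagged parents, and the function names are hypothetical.

```python
import numpy as np

def soft_threshold(v, thresh):
    """Closed-form proximal operator of thresh * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)

def ista_lasso(X, y, lam, n_iter=500):
    """Minimise (1 / 2n) * ||y - X w||^2 + lam * ||w||_1 by proximal gradient."""
    n, p = X.shape
    step = n / (np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the smooth part
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w


rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))          # columns = candidate lagged parents
w_true = np.zeros(p)
w_true[2], w_true[7] = 1.5, -1.0     # two true parents
y = X @ w_true + 0.1 * rng.normal(size=n)

w_hat = ista_lasso(X, y, lam=0.1)
print(np.flatnonzero(w_hat))         # indices of the selected parents
```

Because the thresholding step sets small coefficients exactly to zero, the nonzero pattern of `w_hat` can be read directly as an estimated parent set.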

6. Empirical Applications and Benchmarks

Spatiotemporal causal graphical models are empirically validated in several domains:

Climate science: SPACY discovers latent spatial teleconnection modes (e.g., North Atlantic Oscillation) and infers lagged/instantaneous causal links consistent with literature on El Niño, NAO, and AAO, using global surface temperature fields (Wang et al., 8 Nov 2024). SpaceTime reveals regime and context structure in precipitation-runoff coupling and biosphere-atmosphere fluxes (Mameche et al., 17 Jan 2025).
Distributed CPS and anomaly detection: Symbolic-dynamics-based methods coupled to RBMs distinguish nominal from anomalous system-wide patterns in building-integrated systems and synthetic subsector networks (Liu et al., 2015).
Epidemiology: $\ell_1$-penalized SIR network recovery achieves high sensitivity and specificity in reconstructing transmission networks from synthetic infection trajectories (Jr. et al., 2010).
Biomedical temporal and spatial modeling: Latent Markov and multi-graph approaches encode disease state trajectories and spatially correlated physiological markers, enabling doubly robust treatment effect estimation in clinical data (Lee et al., 11 Jul 2025).
Urban mobility and graph-based forecasting: Causal Adjacency Learning reduces out-of-distribution RMSE by up to 50% relative to static adjacency schemes for graph convolutional predictions of human mobility under COVID-induced shifts (Mo et al., 25 Nov 2024).

A tabular summary of representative frameworks and their applications:

Framework/Method	Domain	Methodological Components
SPACY (Wang et al., 8 Nov 2024)	Climate, any gridded	Variational inference, spatial kernels, latent SCM, ELBO
SpaceTime (Mameche et al., 17 Jan 2025)	Hydroclimate, flux	MDL, nonparametric GPs, HSIC, regime/context segmentation
spatial-PCMCI+ (Supple et al., 30 Oct 2025)	Ecology, environment	GAM/CI regression, spatiotemporal PC algorithm
Causal Adjacency Learning (Mo et al., 25 Nov 2024)	Mobility	Kernel CI tests (KCIT), pre-selection, SyPI filtering
$\ell_1$-SIR (Jr. et al., 2010)	Epidemics	Convex penalized likelihood, blockwise optimization
STPN+RBM (Liu et al., 2015)	CPS, signals	Symbolic dynamics, PFSA, RBM free energy detection

7. Limitations and Ongoing Challenges

Several limitations and ongoing research fronts are apparent:

Assumptions of causal sufficiency and stationarity may not hold in complex real-world systems with unobserved or evolving confounders (Mo et al., 25 Nov 2024, Mameche et al., 17 Jan 2025).
High memory and computation cost for very dense graphs or long lags, motivating further algorithmic improvements or richer prior exploitation (Mo et al., 25 Nov 2024, Wang et al., 8 Nov 2024).
Specification of model dimensionality (e.g., number of latent modes $K$) is often manual; adaptively selecting the dimensionality and accounting for multivariate fields remain open questions (Wang et al., 8 Nov 2024).
Transferability and robustness across contexts and urban systems require further empirical validation (Mo et al., 25 Nov 2024).
Handling missing data, interventions, or variable regime duration is not yet fully integrated into most frameworks (Wang et al., 8 Nov 2024, Mameche et al., 17 Jan 2025).

A plausible implication is that future advancements will need to address scalable automated model selection, richer spatial priors, multivariate spatial–temporal correlation structures, and flexible treatment of missing or irregularly sampled data.

Comprehensive references:

"Discovering Latent Causal Graphs from Spatiotemporal Data" (Wang et al., 8 Nov 2024)
"Discovering Causal Relationships Between Time Series With Spatial Structure" (Supple et al., 30 Oct 2025)
"SpaceTime: Causal Discovery from Non-Stationary Time Series" (Mameche et al., 17 Jan 2025)
"Causal Adjacency Learning for Spatiotemporal Prediction Over Graphs" (Mo et al., 25 Nov 2024)
"Predictive Causal Inference via Spatio-Temporal Modeling and Penalized Empirical Likelihood" (Lee et al., 11 Jul 2025)
"Spatio-Temporal Graphical Model Selection" (Jr. et al., 2010)
"An unsupervised spatiotemporal graphical modeling approach to anomaly detection in distributed CPS" (Liu et al., 2015)