State-Space Model (SSM) Principles
- State-space models are mathematical frameworks that describe dynamic systems through latent state decomposition and noisy observations.
- They support recursive inference algorithms such as Kalman and particle filtering, together with variational methods, to separate system dynamics from measurement noise.
- Recent developments integrate deep learning and hybrid architectures to enhance expressivity and scalability in complex real-world applications.
State-space models (SSMs) provide a mathematically rigorous framework for modeling time-evolving phenomena in which observed data arise as noisy, possibly partial, functions of the evolution of unobserved latent states. The SSM formalism unifies and generalizes diverse approaches to sequential data analysis, facilitating explicit separation between system dynamics and data measurement, and is foundational in statistics, machine learning, control theory, and many applied sciences. Recent developments encompass deep learning–based SSMs, continuous- and discrete-time variants, selective mechanisms, and hybrid architectures, demonstrating the adaptability and expressive power of this model class (Lin et al., 15 Dec 2024, Wang et al., 5 Aug 2025).
1. Mathematical Foundations and Core Structure
The canonical SSM describes a sequence of latent “state” variables $x_t$ (or $z_t$), with observed data $y_t$ arising as functions of these states, often perturbed by noise. The general form in discrete time is

$$x_t = f(x_{t-1}, u_t) + w_t, \qquad y_t = g(x_t) + v_t,$$

where $x_t$ denotes the state, $u_t$ is an (optional) exogenous input or “shock”, $f$ and $g$ are (potentially nonlinear) state transition and observation maps, and $w_t$, $v_t$ are process and observation noise, typically modeled as i.i.d. Gaussian, $w_t \sim \mathcal{N}(0, Q)$ and $v_t \sim \mathcal{N}(0, R)$, in classical SSMs. In the linear-Gaussian specialization, $f(x_{t-1}, u_t) = A x_{t-1} + B u_t$ and $g(x_t) = C x_t$, yielding the Kalman filter model (Lin et al., 15 Dec 2024, Elvira et al., 2022).
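As a concrete illustration of the linear-Gaussian specialization, the following minimal sketch implements one predict/update cycle of the Kalman filter in NumPy; the matrix names $A$, $C$, $Q$, $R$ follow the notation above, and the exogenous-input-free form is an illustrative simplification.

```python
import numpy as np

def kalman_step(x, P, y, A, C, Q, R):
    """One predict/update cycle of the Kalman filter.

    x, P : posterior mean and covariance of the state at time t-1
    y    : observation at time t
    A, C : state-transition and observation matrices
    Q, R : process- and observation-noise covariances
    """
    # Predict: propagate the state estimate through the linear dynamics.
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q

    # Update: correct the prediction with the new observation.
    S = C @ P_pred @ C.T + R              # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)   # Kalman gain
    innovation = y - C @ x_pred
    x_new = x_pred + K @ innovation
    P_new = (np.eye(len(x)) - K @ C) @ P_pred
    return x_new, P_new
```

In practice one would use a Cholesky solve rather than an explicit matrix inverse; the explicit form is kept here for readability.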
Continuous-time SSMs are formulated as (stochastic) differential equations,

$$\mathrm{d}x(t) = f_\theta\big(x(t), t\big)\,\mathrm{d}t + \sigma_\theta\big(x(t), t\big)\,\mathrm{d}W(t),$$

with neural network parameterizations $f_\theta$, $\sigma_\theta$ and $W(t)$ a Wiener process (Lin et al., 15 Dec 2024).
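To make the continuous-time formulation concrete, the sketch below simulates such an SDE with an Euler–Maruyama scheme; `drift` and `sigma` are hypothetical stand-ins for the neural parameterizations $f_\theta$ and $\sigma_\theta$, and the example dynamics are illustrative only.

```python
import numpy as np

def simulate_sde(x0, drift, sigma, dt, n_steps, rng=None):
    """Euler-Maruyama simulation of dx = drift(x, t) dt + sigma(x, t) dW.

    `drift` and `sigma` are placeholders for the neural-network
    parameterizations f_theta and sigma_theta described in the text.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for k in range(n_steps):
        t = k * dt
        dW = rng.normal(scale=np.sqrt(dt), size=x.shape)  # Wiener increment
        x = x + drift(x, t) * dt + sigma(x, t) * dW
        path.append(x.copy())
    return np.stack(path)

# Illustrative run with a linear drift and constant diffusion.
path = simulate_sde(
    x0=[1.0, 0.0],
    drift=lambda x, t: np.array([[-0.5, 1.0], [-1.0, -0.5]]) @ x,
    sigma=lambda x, t: 0.1 * np.ones_like(x),
    dt=0.01,
    n_steps=1000,
)
```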
2. State Decomposition, Markovian Dynamics, and Inference
A defining principle of SSMs is the latent-state decomposition: the observed process is interpreted as a noisy output from the evolution of a hidden Markov process. The Markov property manifests as

$$p(x_t \mid x_{0:t-1}, y_{1:t-1}) = p(x_t \mid x_{t-1}), \qquad p(y_t \mid x_{0:t}, y_{1:t-1}) = p(y_t \mid x_t),$$
and enables recursive, scalable inference. Filtering (causal), smoothing (acausal), and prediction are accomplished using Chapman–Kolmogorov recursions and Bayes’ rule, producing tractable algorithms (Kalman filter, Rauch–Tung–Striebel smoother) in the linear-Gaussian case, and approximate or Monte Carlo (e.g., particle filtering) methods otherwise (Lin et al., 15 Dec 2024, Elvira et al., 2022, Chen et al., 13 Sep 2024). Variational inference, notably the VAE framework, allows end-to-end learning in nonlinear, deep SSMs by optimizing the evidence lower bound (ELBO) via encoder–decoder architectures (Lin et al., 15 Dec 2024).
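Outside the linear-Gaussian regime, the filtering recursion is commonly approximated by sequential Monte Carlo; the following is a minimal bootstrap particle filter sketch, where `init_sample`, `transition_sample`, and `obs_loglik` are hypothetical callables supplying the model-specific initial distribution, transition sampler, and observation log-likelihood.

```python
import numpy as np

def bootstrap_particle_filter(ys, n_particles, init_sample,
                              transition_sample, obs_loglik, rng=None):
    """Bootstrap particle filter approximating the filtering means E[x_t | y_{1:t}].

    init_sample(n)        -> (n, d) initial particles
    transition_sample(x)  -> (n, d) particles propagated one step
    obs_loglik(y, x)      -> (n,) log p(y | x) for each particle
    """
    rng = np.random.default_rng() if rng is None else rng
    particles = init_sample(n_particles)
    means = []
    for y in ys:
        # Propagate particles through the (possibly nonlinear) dynamics.
        particles = transition_sample(particles)
        # Weight by the observation likelihood (Bayes' rule).
        logw = obs_loglik(y, particles)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # Resample to combat weight degeneracy.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx]
        means.append(particles.mean(axis=0))
    return np.stack(means)
```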
3. Learning Algorithms: Maximum Likelihood, EM, and Variational Approaches
Parameter learning in SSMs hinges on likelihood-based objectives. For linear-Gaussian SSMs, direct maximization of the marginal likelihood is tractable, and the expectation-maximization (EM) algorithm provides a closed-form iterative scheme: the E-step executes Kalman smoothing, and the M-step solves for the model parameters $\{A, C, Q, R\}$ in closed form (Elvira et al., 2022, Lin et al., 15 Dec 2024). For general SSMs, marginalization over latent states necessitates approximate inference (Laplace approximation, particle filtering, variational inference). Deep SSMs utilize VAE-style variational bounds, with neural encoders and decoders parameterizing approximate posteriors and generative dynamics (Lin et al., 15 Dec 2024). Self-organizing SSMs embed static parameters into the state and introduce artificial dynamics to facilitate particle-based online inference and iterated filtering for maximum likelihood estimation, with controlled vanishing artificial variance to retain consistency (Chen et al., 13 Sep 2024).
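For concreteness, the closed-form M-step in the linear-Gaussian case (ignoring exogenous inputs and initial-state parameters) can be written in terms of the smoothed moments from the E-step, $\hat{x}_t = \mathbb{E}[x_t \mid y_{1:T}]$, $P_t = \mathbb{E}[x_t x_t^\top \mid y_{1:T}]$, and $P_{t,t-1} = \mathbb{E}[x_t x_{t-1}^\top \mid y_{1:T}]$; the updates below are the standard ones in this notation, given as a sketch rather than quoted from the cited works:

$$A^{\mathrm{new}} = \Big(\sum_{t=2}^{T} P_{t,t-1}\Big)\Big(\sum_{t=2}^{T} P_{t-1}\Big)^{-1}, \qquad C^{\mathrm{new}} = \Big(\sum_{t=1}^{T} y_t \hat{x}_t^\top\Big)\Big(\sum_{t=1}^{T} P_t\Big)^{-1},$$

$$Q^{\mathrm{new}} = \frac{1}{T-1}\sum_{t=2}^{T}\Big(P_t - A^{\mathrm{new}} P_{t,t-1}^{\top}\Big), \qquad R^{\mathrm{new}} = \frac{1}{T}\sum_{t=1}^{T}\Big(y_t y_t^{\top} - C^{\mathrm{new}} \hat{x}_t\, y_t^{\top}\Big).$$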
4. Model Classes: Linear, Nonlinear, Deep, and Selective SSMs
Linear-Gaussian SSMs
Classical SSMs with linear $f$, $g$ and Gaussian noise are completely characterized by their mean and covariance dynamics. The Kalman filter establishes a computational paradigm, and the state-transition matrix $A$ can be interpreted as the adjacency structure of a Granger-causal graph (Elvira et al., 2022).
Nonlinear/Deep SSMs
When $f$ and $g$ are nonlinear (e.g., neural networks), nonlinear filtering and smoothing (EKF, UKF, particle methods) are required. Deep SSMs subsume architectures such as RNN-based deep Kalman filters and neural ODE/SDE models, with inference and learning performed via VAEs and stochastic gradient methods (Lin et al., 15 Dec 2024). State-space layers parameterized as structured convolutional or companion matrices enable AR($p$) process representation and efficient long-horizon forecasting (Zhang et al., 2023).
Selective SSMs
Selective SSMs, including Mamba and the minimal predictive sufficiency SSM (MPS-SSM), incorporate learned gating mechanisms (input-dependent $\Delta_t$, $B_t$, $C_t$) governed by information-theoretic objectives. The MPS framework enforces that latent states form minimal sufficient statistics of the past for predicting the future, optimizing

$$\min_\theta \; \mathcal{L}_{\mathrm{pred}} + \lambda\, I(\mathbf{x}_{1:t}; \mathbf{h}_t),$$

where $\mathcal{L}_{\mathrm{pred}}$ is the prediction loss and the mutual-information term $I(\mathbf{x}_{1:t}; \mathbf{h}_t)$, weighted by $\lambda$, regularizes the dependence between inputs and hidden states, training the gating networks to discard non-predictive information (Wang et al., 5 Aug 2025).
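The sketch below illustrates, under stated assumptions, how such an information-regularized objective might be assembled: the prediction term is a squared error, and the mutual information is replaced by a variational upper bound (the KL divergence of a Gaussian hidden-state posterior from a standard normal prior). The function name and the Gaussian-bottleneck surrogate are illustrative choices, not the exact MPS-SSM construction.

```python
import numpy as np

def information_regularized_loss(y_pred, y_true, h_mean, h_logvar, lam=1e-3):
    """Illustrative objective: prediction loss + lam * mutual-information surrogate.

    The KL divergence KL(N(h_mean, exp(h_logvar)) || N(0, I)), averaged over the
    data, upper-bounds the mutual information between inputs and hidden states
    (a standard variational-bottleneck argument). In practice this would be
    implemented in an autodiff framework and minimized by stochastic gradients.
    """
    pred_loss = np.mean((y_pred - y_true) ** 2)
    kl = 0.5 * np.mean(
        np.sum(h_mean**2 + np.exp(h_logvar) - 1.0 - h_logvar, axis=-1)
    )
    return pred_loss + lam * kl
```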
5. Extensions: Expressivity, Basis Generalization, and Hybrid Architectures
The expressivity of SSMs is strongly influenced by state-transition parameterization:
- Companion Matrix and AR($p$) Processes: SpaceTime layers use companion-matrix forms for the state matrix $A$ to guarantee full AR($p$) expressivity, which is provably unattainable for diagonal or certain continuous-time SSMs (Zhang et al., 2023); a minimal sketch of the companion construction follows this list.
- Frame-Agnostic (SaFARi) Representations: The SaFARi framework generalizes SSMs to arbitrary frames (not just orthogonal polynomials) for online approximation, permitting bases such as wavelets, Fourier, or Legendre, and yielding explicit matrix ODE recurrences for the expansion coefficients. HiPPO-style models are recovered as specific cases (Babaei et al., 13 May 2025, Gu et al., 2022).
- Hybrid/Graph State-Space Models: SSMs have been extended to graph-temporal settings, integrating graph Laplacian regularization into the online approximation objective, producing coefficient ODEs with graph-diffusion terms and new layers (e.g., GraphSSM) (Li et al., 3 Jun 2024).
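As referenced above, the companion-matrix parameterization realizes an AR($p$) process exactly as a linear state-space recurrence; the following sketch constructs the companion matrix and simulates an AR(2) example with illustrative coefficients.

```python
import numpy as np

def companion_matrix(phi):
    """Companion matrix whose first row holds the AR(p) coefficients phi.

    With state x_t = (u_t, u_{t-1}, ..., u_{t-p+1}) and b = e_1, the recurrence
    x_t = A x_{t-1} + b * eps_t reproduces u_t = sum_k phi_k u_{t-k} + eps_t.
    """
    p = len(phi)
    A = np.zeros((p, p))
    A[0, :] = phi
    A[1:, :-1] = np.eye(p - 1)   # shift register for the lagged values
    return A

# Illustrative AR(2) process realized as a state-space recurrence.
phi = np.array([0.6, 0.3])
A = companion_matrix(phi)
b = np.array([1.0, 0.0])

rng = np.random.default_rng(0)
x = np.zeros(2)
series = []
for _ in range(100):
    x = A @ x + b * rng.normal()
    series.append(x[0])          # first state coordinate is the AR(2) output
```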
6. Application Domains and Empirical Results
SSMs underpin a wide range of applications:
- Ecological and Biological Time Series: SSMs are foundational for population dynamics, animal movement, and capture–recapture studies. Their hierarchical structure distinguishes biological process noise from measurement error, essential for robust inference and uncertainty quantification. Both discrete and continuous-time instantiations are employed, with inference via Kalman, Laplace, particle, or MCMC approaches, and extensive model selection and diagnostic tools (Auger-Méthé et al., 2020).
- Sequence Modeling and Forecasting: Deep SSMs and selective mechanisms have achieved state-of-the-art results in classification and forecasting across speech, sensor, medical, and long-range sequence tasks. SpaceTime demonstrates AR($p$) expressivity and efficient FFT-based inference, outperforming Transformers and LSTMs in both accuracy and wall-clock time (Zhang et al., 2023).
- Temporal Graph Modeling: GraphSSM leverages HiPPO-initialized SSM layers with graph-diffusion, achieving superior performance and linear complexity on benchmark temporal graph datasets (Li et al., 3 Jun 2024).
7. Trade-offs, Interpretability, and Future Directions
SSMs span a spectrum of trade-offs:
- Expressivity vs. Tractability: Linear-Gaussian SSMs afford maximum interpretability and analytic inference, whereas deep/nonlinear SSMs offer superior expressivity at the cost of inference complexity.
- Computational Efficiency: Classical Kalman-type recursions cost on the order of $O(d^3)$ per step in the state dimension $d$; modern algorithms (companion, frame-agnostic, or diagonalizable $A$) enable $O(L)$ recurrent, $O(L \log L)$ FFT-based, or parallel-scan updates over sequence length $L$, while convolutional views enable efficient batch training (see the sketch after this list) (Zhang et al., 2023, Babaei et al., 13 May 2025).
- Interpretability: Graphical approaches and structured learning impose sparsity or stability constraints, connecting SSMs to Granger-causal structures and graph representations (Elvira et al., 2022).
- Extensibility: Recent advances in minimal predictive sufficiency regularization, self-organizing particle approaches, and hybrid graph models indicate a continued trend toward greater generality, robustness, and cross-domain applicability (Wang et al., 5 Aug 2025, Chen et al., 13 Sep 2024, Babaei et al., 13 May 2025).
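As noted in the efficiency bullet above, the convolutional view unrolls a (single-input, single-output) linear SSM into the kernel $(CB, CAB, \dots, CA^{L-1}B)$ and computes all outputs at once by FFT convolution; the sketch below uses illustrative matrices, not parameters from any cited model.

```python
import numpy as np

def ssm_kernel(A, B, C, L):
    """Length-L convolution kernel K = (C B, C A B, ..., C A^{L-1} B)."""
    K = np.empty(L)
    x = B.copy()
    for i in range(L):
        K[i] = C @ x       # k_i = C A^i B
        x = A @ x
    return K

def ssm_convolve(u, A, B, C):
    """Apply the SISO SSM y_t = sum_i C A^i B u_{t-i} via FFT convolution."""
    L = len(u)
    K = ssm_kernel(A, B, C, L)
    n = 2 * L              # zero-pad so circular convolution matches linear convolution
    y = np.fft.irfft(np.fft.rfft(u, n) * np.fft.rfft(K, n), n)[:L]
    return y

# Illustrative 2-state SSM applied to a random input sequence.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([1.0, 1.0])
C = np.array([0.5, -0.5])
u = np.random.default_rng(0).standard_normal(256)
y = ssm_convolve(u, A, B, C)
```

Structured parameterizations (diagonal or companion $A$) allow this kernel to be formed without the explicit matrix-power loop, which is where the efficiency gains cited above come from.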
A plausible implication is that as sequence modeling tasks grow in scale and complexity, the Markovian and latent decomposition principles of SSMs—in both classical and deep forms—will remain critical for interpretable, robust, and efficient solutions.