Variational & Flow-Based SSM Approximations

Updated 23 February 2026

Variational and flow-based SSM approximations are techniques for approximating posteriors in nonlinear, non-Gaussian time-series models.
They integrate autoregressive structures and normalizing flows to enable scalable and efficient inference in both continuous and discrete dynamical systems.
Innovations like local IAF, particle marginal filtering, and Wasserstein gradient flows enhance computational performance and statistical accuracy.

Variational and flow-based state-space model (SSM) approximations constitute a class of techniques for approximate inference and learning in time-series models parameterized by (potentially nonlinear or non-Gaussian) state evolution and observation mechanisms. These approaches merge variational inference with autoregressive and normalizing flow architectures, particle-based estimators, and continuous-time flow-matching to provide tractable, expressive, and scalable posterior approximations for both continuous and discrete dynamical systems.

1. SSM Posterior Inference and Classical Variational Approaches

Let $X_{t_0:t_N}$ denote a latent Markov process with observations $Y_{t_0:t_N}$ and parameters $\theta$. The joint model is given by:

$X_{t_0} \sim p(x_{t_0}), \quad X_{t_i} \mid X_{t_{i-1}}, \theta \sim p(x_{t_i} \mid x_{t_{i-1}}, \theta), \quad Y_{t_i} \mid X_{t_i}, \theta \sim p(y_{t_i} \mid x_{t_i}, \theta),$

with joint posterior:

$p(x_{t_0:t_N},\theta \mid y_{t_0:t_N}) \propto p(\theta)p(x_{t_0})\prod_{i=1}^N p(x_{t_i} \mid x_{t_{i-1}},\theta)\prod_{i=0}^N p(y_{t_i} \mid x_{t_i},\theta).$

Classical variational inference introduces an approximating family $q(x_{t_0:t_N}, \theta; \phi)$, optimized via the evidence lower bound (ELBO):

$\mathcal{L}(\phi) = \mathbb{E}_{q(x,\theta;\phi)}\Bigg[ \log p(\theta) + \log p(x_{t_0}) + \sum_{i=1}^N \log p(x_{t_i} \mid x_{t_{i-1}},\theta) + \sum_{i=0}^N \log p(y_{t_i} \mid x_{t_i},\theta) - \log q(x_{t_0:t_N},\theta;\phi) \Bigg].$

This variational ELBO forms the backbone for modern flow-based and autoregressive SSM approximations (Ryder et al., 2018).
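To make the ELBO concrete, the following is a minimal sketch of a Monte Carlo estimate for a toy scalar linear-Gaussian SSM. It rests on several assumptions that are not part of the source: the parameters $\theta$ are held fixed (so the $\log p(\theta)$ and $q(\theta)$ terms drop out), the variational family is a mean-field Gaussian over the latent path, and the names `log_joint` and `elbo_estimate` together with the constants `a`, `q_std`, `r_std` are purely illustrative.

```python
import math
import random

def log_normal(x, mean, std):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2

def log_joint(x, y, a=0.9, q_std=0.5, r_std=0.3):
    """log p(x_{t_0}) plus transition and observation terms (theta held fixed)."""
    lp = log_normal(x[0], 0.0, 1.0) + log_normal(y[0], x[0], r_std)
    for i in range(1, len(x)):
        lp += log_normal(x[i], a * x[i - 1], q_std)  # p(x_{t_i} | x_{t_{i-1}}, theta)
        lp += log_normal(y[i], x[i], r_std)          # p(y_{t_i} | x_{t_i}, theta)
    return lp

def elbo_estimate(y, means, stds, num_samples=200, seed=0):
    """Average of log p(x, y) - log q(x) over reparameterized draws from q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        eps = [rng.gauss(0.0, 1.0) for _ in y]
        x = [m + s * e for m, s, e in zip(means, stds, eps)]  # x = mean + std * noise
        log_q = sum(log_normal(xi, m, s) for xi, m, s in zip(x, means, stds))
        total += log_joint(x, y) - log_q
    return total / num_samples
```

In this toy setting, a variational mean placed near the observations yields a much higher estimate than one placed far away, which is the signal a gradient-based optimizer of $\phi$ would follow.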

2. Flow-Based and Autoregressive Variational Families

Modern SSM methods leverage normalizing flows and autoregressive architectures to extend variational posteriors beyond simplistic mean-field or linear-Gaussian cases. For continuous latents and parameters, one constructs invertible mappings between noise variables and target latent variables, enabling tractable density evaluation and efficient reparameterization.

Inverse Autoregressive Flow (IAF) in SSMs:

The variational family factorizes as $q(x_{t_0:t_N}, \theta) = q(\theta)q(x_{t_0:t_N} \mid \theta)$. For $\theta$, $q(\theta)$ stacks several IAF modules interleaved with learned permutations. For latent trajectories, a local IAF is introduced: each flow layer transforms the latent at a given time step using only a local convolutional receptive field of neighbouring time steps, rather than the full history. The final trajectory is obtained by applying a deterministic elementwise transform to the output of the last layer. Each local-IAF layer is invertible with a tractable Jacobian, and its per-layer cost is governed by the receptive field width and the latent dimension. This is orders of magnitude faster than full-history IAF/MAF, whose per-layer cost grows with the full conditioning history (Ryder et al., 2018).
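The locality idea can be sketched for a single affine autoregressive layer acting on a scalar noise sequence. This is an illustration under stated assumptions rather than the architecture of Ryder et al.: `shift_and_scale` is a hypothetical stand-in for a learned convolutional network, and `width` plays the role of the receptive field.

```python
import math

def local_context(eps, t, width):
    """Noise values inside the receptive field, strictly before time t."""
    return eps[max(0, t - width):t]

def shift_and_scale(context):
    """Hypothetical stand-in for a learned convolutional network."""
    s = sum(context)
    mu = 0.5 * math.tanh(s)
    sigma = math.exp(0.3 * math.tanh(s))  # strictly positive scale
    return mu, sigma

def local_iaf_forward(eps, width=2):
    """Map noise to latents; every z_t needs only known noise, so t can run in parallel."""
    z, log_det = [], 0.0
    for t in range(len(eps)):
        mu, sigma = shift_and_scale(local_context(eps, t, width))
        z.append(mu + sigma * eps[t])
        log_det += math.log(sigma)  # triangular Jacobian: log-det is the sum of log scales
    return z, log_det

def local_iaf_inverse(z, width=2):
    """Recover the noise sequentially, as in any IAF-style layer."""
    eps = []
    for t in range(len(z)):
        mu, sigma = shift_and_scale(local_context(eps, t, width))
        eps.append((z[t] - mu) / sigma)
    return eps
```

Because each output depends on at most `width` earlier noise values, perturbing the noise far in the past leaves later outputs of this single layer unchanged; stacking layers widens the effective receptive field.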

Extension to Discrete Latents:

Autoregressive variational posteriors for discrete SSMs (e.g., HMMs) are made GPU-efficient via fixed-point iterations. In each of a fixed number of sweeps, all variables are updated in parallel using the Gumbel-softmax reparameterization, so the parallel depth is set by the number of sweeps rather than by the number of sequential time steps. This can be interpreted as a discrete normalizing flow on relaxed variables, with tractable ELBOs obtained either by relaxing discrete samples or by leveraging the change-of-variables formula for the flow dynamics (Aitchison et al., 2018).
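The sweep-based scheme can be illustrated with a first-order relaxed chain in which the Gumbel noise is drawn once and held fixed. The sketch below is an assumption-laden toy, not the construction of Aitchison et al.: the way the relaxed previous state mixes rows of `log_trans` and all function names are hypothetical.

```python
import math
import random

def softmax(logits, temperature):
    m = max(logits)
    exps = [math.exp((l - m) / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def step(prev_state, gumbel, log_trans, temperature):
    """Relaxed one-hot sample of z_t given relaxed z_{t-1} and fixed Gumbel noise."""
    k = len(gumbel)
    logits = [sum(prev_state[i] * log_trans[i][j] for i in range(k)) + gumbel[j]
              for j in range(k)]
    return softmax(logits, temperature)

def sequential_sample(init, gumbels, log_trans, temperature):
    """Reference sampler: one time step after another."""
    states, prev = [], init
    for g in gumbels:
        prev = step(prev, g, log_trans, temperature)
        states.append(prev)
    return states

def fixed_point_sample(init, gumbels, log_trans, temperature, sweeps):
    """Each sweep updates every time step from the previous sweep's values."""
    k = len(init)
    states = [[1.0 / k] * k for _ in gumbels]  # arbitrary initial guess
    for _ in range(sweeps):
        prevs = [init] + states[:-1]
        states = [step(p, g, log_trans, temperature) for p, g in zip(prevs, gumbels)]
    return states

def sample_gumbels(length, k, seed=0):
    rng = random.Random(seed)
    def gumbel():
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
        return -math.log(-math.log(u))
    return [[gumbel() for _ in range(k)] for _ in range(length)]
```

With as many sweeps as time steps, the fixed-point result coincides with the sequential sampler, since sweep k makes the first k steps exact; the practical gain comes from stopping after far fewer sweeps and accepting a small approximation error.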

3. Particle Methods, Rao-Blackwellization, and Tractable Bounds

Particle-filter-based variational methods provide lower bounds on the log marginal likelihood of SSMs with intractable posteriors, built from unbiased particle estimators of the marginal likelihood. The Variational Marginal Particle Filter (VMPF) constructs its variational objective as the expected logarithm of a Rao–Blackwellized particle estimator of the marginal likelihood, with marginal particle proposals and weights constructed as mixtures over previous-step particles. VMPF is reported to achieve a provably tighter lower bound than variational SMC (VSMC) owing to the variance reduction from Rao–Blackwellization, so that the VMPF objective lies between the VSMC objective and the log marginal likelihood. Gradient estimators can be fully differentiable, and even unbiased when all mixture samples admit a continuous reparameterization (Lai et al., 2021). Particle-filter-based variational SSMs achieve improved empirical test log-likelihoods over VSMC and IWAE, especially on complex and high-dimensional time series (e.g., deep Markov models on music data).
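The mixture-based weights can be sketched on a toy scalar linear-Gaussian model. This is a generic marginal particle filter written under assumptions of my own (Gaussian transition with coefficient `a`, a wider Gaussian proposal `prop_std`, illustrative function names); it is not the VMPF implementation and omits the gradient estimators.

```python
import math
import random

def normal_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def marginal_weight(x_new, y, particles, norm_w, a, q_std, prop_std, r_std):
    """Likelihood times the ratio of mixture transition density to mixture proposal density."""
    target = sum(w * normal_pdf(x_new, a * x, q_std) for w, x in zip(norm_w, particles))
    proposal = sum(w * normal_pdf(x_new, a * x, prop_std) for w, x in zip(norm_w, particles))
    return normal_pdf(y, x_new, r_std) * target / proposal

def marginal_pf_log_likelihood(ys, num_particles=64, a=0.9, q_std=0.5,
                               prop_std=0.7, r_std=0.3, seed=0):
    """Log of the marginal-particle-filter estimate of p(y_{t_0:t_N})."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(num_particles)]  # draws from the prior
    weights = [normal_pdf(ys[0], x, r_std) for x in particles]
    log_z = math.log(sum(weights) / num_particles)
    for y in ys[1:]:
        total = sum(weights)
        norm_w = [w / total for w in weights]
        # Sample from the mixture proposal: pick an ancestor, then propagate it.
        ancestors = rng.choices(particles, weights=norm_w, k=num_particles)
        proposed = [rng.gauss(a * anc, prop_std) for anc in ancestors]
        # Each weight sums over all previous particles: quadratic cost per step.
        weights = [marginal_weight(x, y, particles, norm_w, a, q_std, prop_std, r_std)
                   for x in proposed]
        particles = proposed
        log_z += math.log(sum(weights) / num_particles)
    return log_z
```

Averaging the returned value over independent seeds approximates the expected log estimate, which by Jensen's inequality sits below the log marginal likelihood. When the proposal equals the transition, the mixture ratio cancels and each weight reduces to the observation likelihood.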

Online Variational SMC: OVSMC distributes the optimization of the VSMC surrogate ELBO across time, enabling on-the-fly parameter inference and proposal adaptation with robust convergence guarantees (Mastrototaro et al., 2023).

4. Flow-Matching and Wasserstein Gradient Flows

Recent advances in SSM inference employ gradient flows on the Wasserstein space of probability measures to define variational filtering recursions. At each filtering step, the filtering posterior $p(x_{t_i} \mid y_{t_0:t_i}, \theta)$ is approximated by minimizing a variational objective over a candidate density $q$.

Instead of parameterizing $q$ in Euclidean coordinates and descending on those parameters, one defines a Wasserstein gradient flow of the objective, which evolves $q$ continuously in an auxiliary flow time toward the minimizer.

When $q$ is restricted to be Gaussian or a mixture of Gaussians, the flow reduces to ODEs for the moments (means and covariances). This approach, Variational Wasserstein Filtering (VWF), improves fidelity in non-Gaussian and multimodal settings where the EKF fails, achieving accuracy competitive with particle filters at reduced computational cost for moderate state dimensions (Corenflos et al., 2023).
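As an illustration of the Gaussian-restricted case, the sketch below integrates moment ODEs for a single scalar update with a Gaussian prior and a linear-Gaussian observation. It assumes the objective is the KL divergence to the posterior, for which the Wasserstein flow restricted to Gaussians gives $\dot m = \mathbb{E}_q[\nabla \log \pi]$ and $\dot s^2 = 2 + 2\,s^2\,\mathbb{E}_q[\nabla^2 \log \pi]$ in one dimension; this is a standard form and not necessarily the exact recursion of Corenflos et al. The function name and step sizes are hypothetical.

```python
def gaussian_wasserstein_update(prior_mean, prior_var, y, obs_var,
                                step=0.01, num_steps=2000):
    """Euler integration of the mean and variance ODEs for a scalar Gaussian target.

    Target: posterior of x ~ N(prior_mean, prior_var) given y = x + N(0, obs_var).
    """
    mean, var = prior_mean, prior_var             # start the flow at the prior
    precision = 1.0 / prior_var + 1.0 / obs_var   # minus the Hessian of the log target
    for _ in range(num_steps):
        # Expected gradient of the log target under q = N(mean, var).
        grad = -(mean - prior_mean) / prior_var + (y - mean) / obs_var
        mean, var = mean + step * grad, var + step * (2.0 - 2.0 * var * precision)
    return mean, var
```

For this linear-Gaussian target the flow settles on the Kalman update (mean 0.8 and variance 0.2 for a standard normal prior, `y = 1.0` and `obs_var = 0.25`). For nonlinear models the expectations would have to be approximated, for example by quadrature.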

5. Variational Flow-Matching for Structured and Hybrid SSMs

Generalizing flow-matching techniques, Pawsterior is a variational flow-matching framework specifically designed for simulation-based inference in SSMs with geometric or discrete/hybrid constraints. The method frames posterior transport as conditional dynamics between a pair of endpoints joined by an affine interpolation, and parameterizes the flow ODE using two-sided variational networks that estimate both endpoints given the interpolated state. ELBO-style training is performed over endpoint pairs, with losses decomposed by coordinate (Gaussian for continuous, cross-entropy for discrete).
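A rough sketch of the two ingredients named above, the affine interpolation between endpoints and a per-coordinate loss, might look as follows. Everything here is an assumption for illustration: the linear schedule, the unit-variance Gaussian term, the single categorical coordinate, and the function names are not taken from the Pawsterior paper.

```python
import math

def interpolate(x0, x1, s):
    """Affine path between endpoints: x_s = (1 - s) * x0 + s * x1 for s in [0, 1]."""
    return [(1.0 - s) * a + s * b for a, b in zip(x0, x1)]

def hybrid_endpoint_loss(pred_mean, pred_logits, target_cont, target_class):
    """Gaussian (squared-error) term for continuous coordinates plus
    cross-entropy for one discrete coordinate."""
    gaussian_term = 0.5 * sum((p - t) ** 2 for p, t in zip(pred_mean, target_cont))
    m = max(pred_logits)
    log_norm = m + math.log(sum(math.exp(l - m) for l in pred_logits))  # stable log-sum-exp
    cross_entropy = log_norm - pred_logits[target_class]
    return gaussian_term + cross_entropy
```

During training, a network would receive `interpolate(x0, x1, s)` for a random `s` and be scored by such a loss against the true endpoint, so that continuous and discrete coordinates are each handled by a likelihood suited to their type.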

A principled advantage is affine geometric confinement: the conditional means are parameterized so that sampled flows remain within physically valid domains (e.g., boxes or simplexes), for instance via bounded squashing functions or softmax networks. Pawsterior extends flow-matching to posteriors with strictly discrete components (e.g., switching systems), where earlier methods such as FMPE or standard continuous flows fail (Carrasco-Pollo et al., 14 Feb 2026).
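Why confinement follows from bounded endpoint means can be seen in a small sketch. It makes simplifying assumptions that go beyond the source: a one-sided velocity $(\mu_1 - x_s)/(1 - s)$ built from a single endpoint estimate, a sigmoid squashing into a box, plain Euler steps, and a hypothetical `raw_network` in place of a trained model.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def confined_endpoint_mean(raw, lo, hi):
    """Squash unconstrained outputs into the box [lo, hi], coordinate by coordinate."""
    return [l + (h - l) * sigmoid(r) for r, l, h in zip(raw, lo, hi)]

def raw_network(x, s):
    """Hypothetical unconstrained network output (stand-in for a trained model)."""
    return [3.0 * math.sin(5.0 * xi + s) + 4.0 * s for xi in x]

def integrate_confined_flow(x0, lo, hi, num_steps=50):
    """Euler integration of dx/ds = (mu_1(x, s) - x) / (1 - s) on s in [0, 1)."""
    x, h = list(x0), 1.0 / num_steps
    path = [list(x)]
    for k in range(num_steps):
        s = k * h
        mu1 = confined_endpoint_mean(raw_network(x, s), lo, hi)
        w = min(1.0, h / (1.0 - s))  # step weight lies in (0, 1]
        # Convex combination of the current state and a point inside the box.
        x = [xi + w * (m - xi) for xi, m in zip(x, mu1)]
        path.append(list(x))
    return path
```

Each Euler step is a convex combination of the current state and a predicted endpoint that already lies in the box, so a trajectory started inside the box cannot leave it. The same argument applies to any convex domain, such as a simplex with softmax outputs.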

Performance metrics such as Classifier Two-Sample Test (C2ST) confirm consistently improved posterior fit over previous flow-matching SBI tools, particularly on bounded and hybrid-latent SSMs. Pawsterior’s formulation further facilitates the solution of stiff ODEs in high-dimensional spaces via adaptive solvers and supports further extension to manifold-constrained latent domains.

6. Computational and Practical Considerations

The computational performance and tractability of variational and flow-based SSM approximations depend on model structure, posterior complexity, and targeted accuracy.

Local IAFs have a per-layer cost governed by the receptive field width and latent dimension, enabling tractable inference on long series via convolutional architectures, and are far cheaper than full-history flows (Ryder et al., 2018).
Flow-matching and Wasserstein-gradient methods reduce computational cost versus high-particle SMC, but require ODE or PDE integration per update, which remains tractable for low-to-moderate dimensions (Corenflos et al., 2023; Carrasco-Pollo et al., 14 Feb 2026).
Particle variational bounds benefit from variance reduction by marginalization (Rao–Blackwellization), improving sample efficiency and offering superior tightness and differentiability, at a per-step cost that is quadratic in the number of particles because each weight sums over all previous-step particles (Lai et al., 2021).
Autoregressive discrete flows attain linear or sub-linear depth in sequence length via parallelized fixed-point sweeps, with only mild ELBO degradation compared to fully sequential autoregressive samplers—critical for large, discrete dynamical systems (Aitchison et al., 2018).

Complexity per method is summarized below:

Method	Complexity per Update	Domain Applicability
Local IAF SSM	Per layer, grows with receptive field width and latent dimension	Continuous latent SSM
Full-history IAF/MAF	Per layer, grows with the full conditioning history	Continuous (smaller problems)
VMPF / Marginal PF	Quadratic in the number of particles per step	Continuous latent SSM
Parallel discrete FP	Parallel depth set by the number of fixed-point sweeps	Discrete SSM
Wasserstein Flow	ODE integration per step	Continuous/mixture SSM
Pawsterior VFM	ODE integration	Hybrid/discrete SSM

7. Empirical Results and Applicability

Empirical evaluations consistently indicate that local flow-based variational SSM approximations and particle-marginal approaches provide accurate posterior marginals, parameter estimates, and smoothing trajectories, closely matching ground-truth or particle-filter results at substantially reduced runtime. For example:

Local IAF outperforms black-box VI and matches forward-filtered marginals in linear and nonlinear SSMs in minutes rather than hours (Ryder et al., 2018).
Wasserstein flow filtering tracks multimodal and multiplicative-noise posteriors with accuracy equivalent to high-particle SMC (Corenflos et al., 2023).
VMPF provides the tightest ELBO among tractable filtering objectives in deep Markov models and stochastic volatility SSMs (Lai et al., 2021).
Discrete flow-based SSM variational posteriors achieve 5–20× sampling speedups over serial autoregressive baselines while maintaining ELBOs within 1–2% of the optimal (Aitchison et al., 2018).
Pawsterior uniquely addresses strict bounding and hybrid-discrete SSM support, lowering C2ST versus existing flow-matching SBI baselines (Carrasco-Pollo et al., 14 Feb 2026).

These advances make state-of-the-art variational and flow-based SSM inference feasible for large-scale, nonlinear, and structured time-series data across scientific, engineering, and machine learning applications.


References

Ryder et al. (2018). Black-Box Autoregressive Density Estimation for State-Space Models.
Aitchison et al. (2018). Discrete flow posteriors for variational inference in discrete dynamical systems.
Lai et al. (2021). Variational Marginal Particle Filters.
Mastrototaro et al. (2023). Online Variational Sequential Monte Carlo.
Corenflos et al. (2023). Variational Gaussian filtering via Wasserstein gradient flows.
Carrasco-Pollo et al. (2026). Pawsterior: Variational Flow Matching for Structured Simulation-Based Inference.
