
Continuous-Time Evidence Lower Bound

Updated 24 January 2026
  • Continuous-Time ELBO is a variational lower bound for continuous-time SDE models that integrates data likelihoods with stochastic optimal control for irregular time series.
  • It employs Doob’s h-transform and neural control parameterizations to derive tractable approximations via a stochastic optimal control framework.
  • The approach enables efficient learning and simulation-free inference using piecewise linear drift approximations and modern network architectures.

The continuous-time Evidence Lower Bound (ELBO) is a variational lower bound formulated for probabilistic models that evolve according to continuous-time latent state-space dynamics, notably those driven by stochastic differential equations (SDEs). It provides a foundation for scalable inference and learning in irregularly observed time series, enabling the integration of data likelihoods and pathwise regularization. The continuous-time ELBO considered here arises from a stochastic optimal control (SOC) perspective, establishing a rigorous connection between Doob's $h$-transform, Feynman–Kac path measures, and amortized variational inference with neural control parameterizations (Park et al., 2024).

1. Feynman–Kac Path Measures and the Posterior in State-space SDEs

Let $(X_t)_{t\in[0,T]}$ denote an $\mathbb{R}^d$-valued diffusion evolving under the SDE

$$dX_t = b(t,X_t)\,dt + dW_t,\qquad X_0\sim\mu_0,$$

where $b$ is the drift and $W_t$ is standard Brownian motion. Observations $\{Y_{t_i}\}_{i=1}^k$ are made at irregular time-stamps $0=t_0<t_1<\cdots<t_k=T$, each with likelihood $g_i(y_{t_i}\mid X_{t_i})$. The joint path–observation posterior, or Feynman–Kac model, is

$$\mathbb{P}^*\bigl(dX_{0:T}\mid Y_{t_1}=y_{t_1},\dots,Y_{t_k}=y_{t_k}\bigr) = \frac{1}{Z(y_{1:k})}\,\prod_{i=1}^k g_i\bigl(y_{t_i}\mid X_{t_i}\bigr)\;\mathbb{P}(dX_{0:T}),$$

with $Z=\mathbb{E}_{\mathbb{P}}\bigl[\prod_i g_i\bigr]$ the marginal likelihood. Normalized potentials are defined as $f_i(x) = g_i(y_{t_i}\mid x)/L_i$ with $L_i = \int g_i(y_{t_i}\mid x)\,\mathbb{P}(dx)$, with the property $\mathbb{E}_{\mathbb{P}}\bigl[\prod_i f_i\bigr]=1$.

A multi-marginal Doob $h$-transform yields the posterior dynamics

$$dX^*_t = \bigl[b(t,X^*_t) + \nabla_x \log h(t,X^*_t)\bigr]\,dt + dW_t,$$

where $h(t,x)$ is a "backward survival" function propagating posterior information. This SDE generates exactly the posterior law $\mathbb{P}^*$ with the correct initial condition $X^*_0 \sim \mu^*_0(dx)\propto h_1(0,x)\,\mu_0(dx)$.
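As a concrete illustration of the Feynman–Kac reweighting above, the following sketch simulates prior paths with Euler–Maruyama and estimates $Z$ by averaging the accumulated observation potentials. The zero initial state, Gaussian likelihood, and function names here are illustrative choices, not the construction of the source.

```python
import numpy as np

def feynman_kac_estimate(b, obs, sigma_obs=0.5, n_paths=20000, dt=0.01, T=1.0, seed=0):
    """Estimate Z = E_P[prod_i g_i(y_{t_i}|X_{t_i})] by Monte Carlo over prior paths.

    b         : drift function b(t, x) of the prior SDE dX = b dt + dW
    obs       : list of (t_i, y_i) observation pairs
    sigma_obs : std of the (assumed) Gaussian likelihood g_i(y|x) = N(y; x, sigma_obs^2)
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    x = np.zeros(n_paths)            # X_0 = 0 (point-mass mu_0, for simplicity)
    log_w = np.zeros(n_paths)        # running log of prod_i g_i along each path
    obs_sorted = sorted(obs)
    j = 0
    for k in range(n_steps):
        t = k * dt
        x = x + b(t, x) * dt + np.sqrt(dt) * rng.standard_normal(n_paths)
        # multiply in the potential once we pass an observation time
        while j < len(obs_sorted) and obs_sorted[j][0] <= (k + 1) * dt + 1e-12:
            y = obs_sorted[j][1]
            log_w += -0.5 * np.log(2 * np.pi * sigma_obs**2) - (y - x) ** 2 / (2 * sigma_obs**2)
            j += 1
    return np.exp(log_w).mean()      # Monte Carlo estimate of Z
```

With zero drift and a single observation at $T$, the marginal $X_T\sim N(0,T)$ makes $Z$ available in closed form ($Z = N(y;0,T+\sigma^2)$), which the estimate should approach.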

2. Variational Family, Amortization, and Auxiliary Variables

The intractability of $\nabla_x \log h$ motivates a tractable variational family: controlled SDEs parameterized by neural controls,

$$dX^\alpha_t = \bigl[b(t,X^\alpha_t) + \alpha(t,X^\alpha_t)\bigr]\,dt + dW_t, \qquad X^\alpha_0\sim\mu_0,$$

with induced path law $\mathbb{P}^\alpha$. In amortized inference, per-observation latent variables $y_{t_i}$ are encoded via $q_\phi(y_{t_i}\mid o_{t_i})$ and decoded with $p_\psi(o_{t_i}\mid y_{t_i})$. The control $\alpha_\theta(t,x)$ is parameterized to depend on latent histories or the full latent collection $\{y_{t_j}\}$, typically via a transformer or RNN.
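A minimal Euler–Maruyama simulator for the controlled SDE above can be sketched as follows; the scalar state, the specific drift, and the constant control used in the test are illustrative placeholders, not the neural parameterization of the source.

```python
import numpy as np

def simulate_controlled_sde(b, alpha, x0, dt=0.01, T=1.0, n_paths=1000, seed=0):
    """Simulate dX = [b(t,X) + alpha(t,X)] dt + dW under the variational law P^alpha.

    Returns the time grid and an array of shape (n_steps+1, n_paths) of sample paths.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    ts = np.linspace(0.0, T, n_steps + 1)
    xs = np.empty((n_steps + 1, n_paths))
    xs[0] = x0
    for k in range(n_steps):
        t, x = ts[k], xs[k]
        drift = b(t, x) + alpha(t, x)          # prior drift plus variational control
        xs[k + 1] = x + drift * dt + np.sqrt(dt) * rng.standard_normal(n_paths)
    return ts, xs
```

For example, an OU prior $b(t,x) = -x$ with constant control $\alpha \equiv 1$ gives $\mathbb{E}[X_T] = 1 - e^{-T}$, which the simulated ensemble mean should approximate.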

3. Stochastic Optimal Control Formulation and Dynamic Programming

SOC theory provides a variational foundation for continuous-time inference. Define the cost functional

$$J(t,x;\alpha)= \mathbb{E}_{\mathbb{P}^\alpha\mid X^\alpha_t=x}\left[ \int_t^T \tfrac12\,\|\alpha(s,X^\alpha_s)\|^2\,ds - \sum_{i:\,t_i\ge t}\log f_i\bigl(X^\alpha_{t_i}\bigr) \right].$$

The value function $V(t,x) = \inf_\alpha J(t,x;\alpha)$ satisfies, on the subintervals $[t_{i-1},t_i)$, the Hamilton–Jacobi–Bellman (HJB) equation

$$\partial_t V + A_t V + \min_{\alpha}\bigl\{\nabla_x V^\top \alpha + \tfrac12\|\alpha\|^2\bigr\} = 0, \qquad V(t_i^-,x) = -\log f_i(x) + V(t_i^+,x),$$

where $A_t$ is the infinitesimal generator of the prior SDE. The pointwise minimizer is $\alpha^*(t,x)=-\nabla_x V(t,x)$. The Hopf–Cole transform $V = -\log h$ linearizes the HJB equation, relating $h$ to the solution of a backward Kolmogorov equation and recovering the Doob control $\alpha^* = \nabla_x\log h$.
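Carrying out the inner minimization in the HJB equation explicitly, and applying the Hopf–Cole substitution, gives the following chain (a standard completion-of-the-square computation, spelled out here for completeness):

```latex
% Pointwise minimization over the control:
\min_{\alpha}\Bigl\{\nabla_x V^\top \alpha + \tfrac12\|\alpha\|^2\Bigr\}
  = -\tfrac12\|\nabla_x V\|^2,
\qquad
\alpha^*(t,x) = -\nabla_x V(t,x).

% The HJB equation on (t_{i-1}, t_i) therefore reads
\partial_t V + A_t V - \tfrac12\|\nabla_x V\|^2 = 0 .

% Substituting V = -\log h (Hopf--Cole) cancels the quadratic term:
\partial_t h + A_t h = 0 ,

% i.e. h solves the backward Kolmogorov equation, and the optimal control
\alpha^*(t,x) = -\nabla_x V(t,x) = \nabla_x \log h(t,x)

% is exactly the Doob h-transform drift correction.
```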

4. Variational Bound and Continuous-Time ELBO Construction

Using Girsanov’s theorem,

$$\frac{d\mathbb{P}^\alpha}{d\mathbb{P}}(X_{0:T}) = \exp\left\{\int_0^T \alpha^\top\,dW - \tfrac12\int_0^T\|\alpha\|^2\,ds\right\},$$

the KL divergence between the variational path law and the path posterior is

$$D_{KL}(\mathbb{P}^\alpha\,\|\,\mathbb{P}^*) = \mathbb{E}_{\mathbb{P}^\alpha}\left[ \int_0^T\tfrac12\|\alpha\|^2\,ds - \sum_i\log f_i(X_{t_i}) \right],$$

up to an initial-law term that vanishes at the optimum. Setting

$$\mathcal{L}(\alpha) := \mathbb{E}_{\mathbb{P}^\alpha}\left[ \int_0^T\tfrac12\|\alpha\|^2\,ds - \sum_{i=1}^k\log g_i(Y_{t_i}\mid X^\alpha_{t_i}) \right]$$

yields a tight variational characterization: $D_{KL}(\mathbb{P}^\alpha\|\mathbb{P}^*) = \mathcal{L}(\alpha) + \log Z \geq 0$, so that $-\mathcal{L}(\alpha) \leq \log Z$. Minimizing $\mathcal{L}(\alpha)$ therefore corresponds to solving the optimal control problem and tightening the ELBO.
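On a toy model where $Z$ is available in closed form, the bound $-\mathcal{L}(\alpha)\le\log Z$ can be checked by Monte Carlo. The zero prior drift, the single terminal Gaussian observation, and the uncontrolled $\alpha\equiv 0$ below are illustrative choices, not the source's setup.

```python
import numpy as np

def path_cost_estimate(alpha, y, sigma_obs=1.0, dt=0.01, T=1.0, n_paths=50000, seed=0):
    """Monte Carlo estimate of L(alpha) for the toy model
    dX = alpha(t, X) dt + dW, X_0 = 0, with one observation
    g(y | X_T) = N(y; X_T, sigma_obs^2) at the terminal time."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    x = np.zeros(n_paths)
    control_cost = np.zeros(n_paths)      # accumulates int_0^T (1/2)|alpha|^2 ds
    for k in range(n_steps):
        a = alpha(k * dt, x)
        control_cost += 0.5 * a**2 * dt
        x = x + a * dt + np.sqrt(dt) * rng.standard_normal(n_paths)
    log_g = -0.5 * np.log(2 * np.pi * sigma_obs**2) - (y - x) ** 2 / (2 * sigma_obs**2)
    return (control_cost - log_g).mean()  # estimate of L(alpha)

def log_Z(y, sigma_obs=1.0, T=1.0):
    """Closed-form log Z for this toy: X_T ~ N(0, T), so Z = N(y; 0, T + sigma_obs^2)."""
    var = T + sigma_obs**2
    return -0.5 * np.log(2 * np.pi * var) - y**2 / (2 * var)
```

With $\alpha\equiv 0$ the gap between $-\mathcal{L}(0)$ and $\log Z$ is exactly the KL term above (a Jensen gap); scanning richer controls $\alpha$ shrinks it, which is the sense in which optimal control tightens the ELBO.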

Combining the path-space bound with the standard VAE objective over the latent variables, the ELBO is given by

$$\text{ELBO}(\psi,\phi,\theta) = \mathbb{E}_{q_\phi}\left[ \sum_i\log p_\psi(o_{t_i}\mid y_{t_i}) - \mathcal{L}(\alpha_\theta) \right],$$

where $\alpha_\theta$ depends on the encoded latents $y_{0:k}$. The key object is

$$\mathcal{L}(\alpha_\theta) = \mathbb{E}_{\mathbb{P}^{\alpha_\theta}}\left[ \int_0^T\tfrac12\|\alpha_\theta(t,X_t)\|^2\,dt - \sum_{i=1}^k\log g_i(y_{t_i}\mid X_{t_i}) \right].$$

This structure enables end-to-end training by maximizing $\text{ELBO}(\psi,\phi,\theta)$ across all variational and generative parameters.

5. Assumptions, Practical Strategies, and Simulation-free ELBO

All drifts $b(t,x)$ and controls $\alpha(t,x)$ are assumed Lipschitz with linear growth, guaranteeing the existence of strong solutions to the SDEs. The Hopf–Cole transform and the use of Girsanov's theorem require Novikov-type moment conditions for validity. In practical implementations, the optimal drift $\nabla_x\log h$ is replaced by a parametric neural control $\alpha_\theta$ optimized via the ELBO.

Costly simulation of pathwise SDEs and backpropagation through continuous-time integrators can be circumvented by a piecewise locally linear drift ansatz: $dX_t = (-A_iX_t + \alpha_i)\,dt + dW_t$ for $t\in[t_{i-1},t_i)$, so that the state marginals evolve as Gaussian processes with closed-form updates. This approximation enables efficient, simulation-free, parallel ELBO computation. Amortized control construction is performed by modern attention-based networks (e.g., transformers) operating over $\{y_{t_i}\}$.
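For the linear drift ansatz in one dimension, the Gaussian marginal's mean and variance propagate in closed form across each interval. The following sketch (scalar $A_i > 0$, unit diffusion) illustrates the simulation-free update; it is not the source's exact implementation.

```python
import numpy as np

def ou_marginal_step(m0, P0, A, alpha, delta):
    """Closed-form propagation of the Gaussian marginal N(m, P) of
    dX = (-A X + alpha) dt + dW over an interval of length delta (scalar, A > 0)."""
    decay = np.exp(-A * delta)
    m = decay * m0 + (alpha / A) * (1.0 - decay)       # solves m' = -A m + alpha
    P = decay**2 * P0 + (1.0 - decay**2) / (2.0 * A)   # solves P' = -2 A P + 1
    return m, P

def propagate(m0, P0, intervals):
    """Chain the update across segments; intervals is a list of (A_i, alpha_i, delta_i)."""
    m, P = m0, P0
    for A, alpha, delta in intervals:
        m, P = ou_marginal_step(m, P, A, alpha, delta)
    return m, P
```

Since the observation potentials are evaluated against these Gaussian marginals, the entire path term of the ELBO reduces to a sequence of such closed-form steps, with no SDE solver in the loop.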

6. Summary and Practical Implementation

The continuous-time ELBO, as realized in "Amortized Control of Continuous State Space Feynman–Kac Model for Irregular Time Series," synthesizes stochastic optimal control, Feynman–Kac representations, and deep amortized inference, resulting in the end-to-end objective

$$\text{ELBO}(\psi,\phi,\theta) = \mathbb{E}_{q_\phi(y_{1:k}\mid o_{1:k})}\Biggl[ \sum_{i=1}^k\log p_\psi(o_{t_i}\mid y_{t_i}) - \mathbb{E}_{\mathbb{P}^{\alpha_\theta}}\left( \int_0^T\tfrac12\|\alpha_\theta\|^2\,dt - \sum_{i=1}^k\log g_i(y_{t_i}\mid X_{t_i}) \right) \Biggr].$$

All nested expectations are tractable via:
1. sampling of $y_{t_i}$ from the encoder;
2. neural construction of $\alpha_\theta$ via sequence models;
3. either numerical SDE simulation or closed-form marginal propagation in the piecewise linear case;
4. likelihood decoding via $p_\psi$.

The formulation provides a theoretically grounded and computationally practical route to sequential data assimilation in continuous time, particularly for irregular time series (Park et al., 2024).

