
Continuous-Time Evidence Lower Bound

Updated 24 January 2026
  • Continuous-Time ELBO is a variational lower bound for continuous-time SDE models that integrates data likelihoods with stochastic optimal control for irregular time series.
  • It employs Doob’s h-transform and neural control parameterizations to derive tractable approximations via a stochastic optimal control framework.
  • The approach enables efficient learning and simulation-free inference using piecewise linear drift approximations and modern network architectures.

The continuous-time Evidence Lower Bound (ELBO) is a variational lower bound formulated for probabilistic models that evolve according to continuous-time latent state-space dynamics, notably those driven by stochastic differential equations (SDEs). It provides a foundation for scalable inference and learning in irregularly observed time series, enabling the integration of data likelihoods and pathwise regularization. The continuous-time ELBO considered here arises from a stochastic optimal control (SOC) perspective, establishing a rigorous connection between Doob's $h$-transform, Feynman–Kac path measures, and amortized variational inference with neural control parameterizations (Park et al., 2024).

1. Feynman–Kac Path Measures and the Posterior in State-space SDEs

Let $(X_t)_{t\in[0,T]}$ denote an $\mathbb{R}^d$-valued diffusion evolving under the SDE

$$dX_t = b(t,X_t)\,dt + dW_t,\qquad X_0\sim\mu_0,$$

where $b$ is the drift and $W_t$ is standard Brownian motion. Observations $\{Y_{t_i}\}_{i=1}^k$ are made at irregular time-stamps $0=t_0<t_1<\cdots<t_k=T$, each with likelihood $g_i(y_{t_i}\mid X_{t_i})$. The joint path–observation posterior, or Feynman–Kac model, is

$$\mathbb{P}^*\bigl(dX_{0:T}\mid Y_{t_1}=y_{t_1},\dots,Y_{t_k}=y_{t_k}\bigr) = \frac{1}{Z(y_{1:k})}\,\prod_{i=1}^k g_i\bigl(y_{t_i}\mid X_{t_i}\bigr)\;\mathbb{P}(dX_{0:T}),$$

with $Z=\mathbb{E}_{\mathbb{P}}\bigl[\prod_i g_i\bigr]$ the marginal likelihood. Normalized potentials are defined as $f_i(x) = g_i(y_{t_i}\mid x)/L_i$ with $L_i = \int g_i(y_{t_i}\mid x)\,\mathbb{P}(dx)$, with the property $\mathbb{E}_{\mathbb{P}}\bigl[\prod_i f_i\bigr]=1$.

A multi-marginal Doob $h$-transform yields the posterior dynamics

$$dX^*_t = \bigl[b(t,X^*_t) + \nabla_x \log h(t,X^*_t)\bigr]\,dt + dW_t,$$

where $h(t,x)$ is a "backward survival" function propagating posterior information. This SDE generates exactly the posterior law $\mathbb{P}^*$ with the correct initial condition $X^*_0 \sim \mu^*_0(dx)\propto h_1(0,x)\,\mu_0(dx)$.
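As a concrete illustration of the Feynman–Kac reweighting above, the following sketch simulates prior paths with Euler–Maruyama and estimates $Z$ by averaging the accumulated observation potentials. The zero initial state, Gaussian likelihood, and function names here are illustrative choices, not the construction of the source.

```python
import numpy as np

def feynman_kac_estimate(b, obs, sigma_obs=0.5, n_paths=20000, dt=0.01, T=1.0, seed=0):
    """Estimate Z = E_P[prod_i g_i(y_{t_i}|X_{t_i})] by Monte Carlo over prior paths.

    b         : drift function b(t, x) of the prior SDE dX = b dt + dW
    obs       : list of (t_i, y_i) observation pairs
    sigma_obs : std of the (assumed) Gaussian likelihood g_i(y|x) = N(y; x, sigma_obs^2)
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    x = np.zeros(n_paths)            # X_0 = 0 (point-mass mu_0, for simplicity)
    log_w = np.zeros(n_paths)        # running log of prod_i g_i along each path
    obs_sorted = sorted(obs)
    j = 0
    for k in range(n_steps):
        t = k * dt
        x = x + b(t, x) * dt + np.sqrt(dt) * rng.standard_normal(n_paths)
        # multiply in the potential once we pass an observation time
        while j < len(obs_sorted) and obs_sorted[j][0] <= (k + 1) * dt + 1e-12:
            y = obs_sorted[j][1]
            log_w += -0.5 * np.log(2 * np.pi * sigma_obs**2) - (y - x) ** 2 / (2 * sigma_obs**2)
            j += 1
    return np.exp(log_w).mean()      # Monte Carlo estimate of Z
```

With zero drift and a single observation at $T$, the marginal $X_T\sim N(0,T)$ makes $Z$ available in closed form ($Z = N(y;0,T+\sigma^2)$), which the estimate should approach.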

2. Variational Family, Amortization, and Auxiliary Variables

The intractability of $\nabla_x \log h$ motivates a tractable variational family: controlled SDEs parameterized by neural controls,

$$dX^\alpha_t = \bigl[b(t,X^\alpha_t) + \alpha(t,X^\alpha_t)\bigr]\,dt + dW_t, \qquad X^\alpha_0\sim\mu_0,$$

with induced path law $\mathbb{P}^\alpha$. In amortized inference, per-observation latent variables $y_{t_i}$ are encoded via $q_\phi(y_{t_i}\mid o_{t_i})$ and decoded with $p_\psi(o_{t_i}\mid y_{t_i})$. The control $\alpha_\theta(t,x)$ is parameterized to depend on latent histories or the full latent collection $\{y_{t_j}\}$, typically via a transformer or RNN.
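A minimal Euler–Maruyama simulator for the controlled SDE above can be sketched as follows; the scalar state, the specific drift, and the constant control used in the test are illustrative placeholders, not the neural parameterization of the source.

```python
import numpy as np

def simulate_controlled_sde(b, alpha, x0, dt=0.01, T=1.0, n_paths=1000, seed=0):
    """Simulate dX = [b(t,X) + alpha(t,X)] dt + dW under the variational law P^alpha.

    Returns the time grid and an array of shape (n_steps+1, n_paths) of sample paths.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    ts = np.linspace(0.0, T, n_steps + 1)
    xs = np.empty((n_steps + 1, n_paths))
    xs[0] = x0
    for k in range(n_steps):
        t, x = ts[k], xs[k]
        drift = b(t, x) + alpha(t, x)          # prior drift plus variational control
        xs[k + 1] = x + drift * dt + np.sqrt(dt) * rng.standard_normal(n_paths)
    return ts, xs
```

For example, an OU prior $b(t,x) = -x$ with constant control $\alpha \equiv 1$ gives $\mathbb{E}[X_T] = 1 - e^{-T}$, which the simulated ensemble mean should approximate.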

3. Stochastic Optimal Control Formulation and Dynamic Programming

SOC theory provides a variational foundation for continuous-time inference. Define the cost functional

$$J(t,x;\alpha)= \mathbb{E}_{\mathbb{P}^\alpha\mid X^\alpha_t=x}\left[ \int_t^T \tfrac12\,\|\alpha(s,X^\alpha_s)\|^2\,ds - \sum_{i:\,t_i\ge t}\log f_i\bigl(X^\alpha_{t_i}\bigr) \right].$$

The value function $V(t,x) = \inf_\alpha J(t,x;\alpha)$ satisfies, on the subintervals $[t_{i-1},t_i)$, the Hamilton–Jacobi–Bellman (HJB) equation

$$\partial_t V + A_t V + \min_{\alpha}\bigl\{\nabla_x V^\top \alpha + \tfrac12\|\alpha\|^2\bigr\} = 0, \qquad V(t_i^-,x) = -\log f_i(x) + V(t_i^+,x),$$

where $A_t$ is the infinitesimal generator of the prior SDE. The pointwise minimizer is $\alpha^*(t,x)=-\nabla_x V(t,x)$. The Hopf–Cole transform $V = -\log h$ linearizes the HJB equation, relating $h$ to the solution of a backward Kolmogorov equation and recovering the Doob control $\alpha^* = \nabla_x\log h$.
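Carrying out the inner minimization in the HJB equation explicitly, and applying the Hopf–Cole substitution, gives the following chain (a standard completion-of-the-square computation, spelled out here for completeness):

```latex
% Pointwise minimization over the control:
\min_{\alpha}\Bigl\{\nabla_x V^\top \alpha + \tfrac12\|\alpha\|^2\Bigr\}
  = -\tfrac12\|\nabla_x V\|^2,
\qquad
\alpha^*(t,x) = -\nabla_x V(t,x).

% The HJB equation on (t_{i-1}, t_i) therefore reads
\partial_t V + A_t V - \tfrac12\|\nabla_x V\|^2 = 0 .

% Substituting V = -\log h (Hopf--Cole) cancels the quadratic term:
\partial_t h + A_t h = 0 ,

% i.e. h solves the backward Kolmogorov equation, and the optimal control
\alpha^*(t,x) = -\nabla_x V(t,x) = \nabla_x \log h(t,x)

% is exactly the Doob h-transform drift correction.
```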

4. Variational Bound and Continuous-Time ELBO Construction

Using Girsanov’s theorem,

$$\frac{d\mathbb{P}^\alpha}{d\mathbb{P}}(X_{0:T}) = \exp\left\{\int_0^T \alpha^\top\,dW - \tfrac12\int_0^T\|\alpha\|^2\,ds\right\},$$

the KL divergence between the variational path law and the path posterior is

$$D_{KL}(\mathbb{P}^\alpha\,\|\,\mathbb{P}^*) = \mathbb{E}_{\mathbb{P}^\alpha}\left[ \int_0^T\tfrac12\|\alpha\|^2\,ds - \sum_i\log f_i(X_{t_i}) \right],$$

up to an initial-law term that vanishes at the optimum. Setting

$$\mathcal{L}(\alpha) := \mathbb{E}_{\mathbb{P}^\alpha}\left[ \int_0^T\tfrac12\|\alpha\|^2\,ds - \sum_{i=1}^k\log g_i(Y_{t_i}\mid X^\alpha_{t_i}) \right]$$

yields a tight variational characterization: $D_{KL}(\mathbb{P}^\alpha\|\mathbb{P}^*) = \mathcal{L}(\alpha) + \log Z \geq 0$, so that $-\mathcal{L}(\alpha) \leq \log Z$. Minimizing $\mathcal{L}(\alpha)$ therefore corresponds to solving the optimal control problem and tightening the ELBO.
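On a toy model where $Z$ is available in closed form, the bound $-\mathcal{L}(\alpha)\le\log Z$ can be checked by Monte Carlo. The zero prior drift, the single terminal Gaussian observation, and the uncontrolled $\alpha\equiv 0$ below are illustrative choices, not the source's setup.

```python
import numpy as np

def path_cost_estimate(alpha, y, sigma_obs=1.0, dt=0.01, T=1.0, n_paths=50000, seed=0):
    """Monte Carlo estimate of L(alpha) for the toy model
    dX = alpha(t, X) dt + dW, X_0 = 0, with one observation
    g(y | X_T) = N(y; X_T, sigma_obs^2) at the terminal time."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    x = np.zeros(n_paths)
    control_cost = np.zeros(n_paths)      # accumulates int_0^T (1/2)|alpha|^2 ds
    for k in range(n_steps):
        a = alpha(k * dt, x)
        control_cost += 0.5 * a**2 * dt
        x = x + a * dt + np.sqrt(dt) * rng.standard_normal(n_paths)
    log_g = -0.5 * np.log(2 * np.pi * sigma_obs**2) - (y - x) ** 2 / (2 * sigma_obs**2)
    return (control_cost - log_g).mean()  # estimate of L(alpha)

def log_Z(y, sigma_obs=1.0, T=1.0):
    """Closed-form log Z for this toy: X_T ~ N(0, T), so Z = N(y; 0, T + sigma_obs^2)."""
    var = T + sigma_obs**2
    return -0.5 * np.log(2 * np.pi * var) - y**2 / (2 * var)
```

With $\alpha\equiv 0$ the gap between $-\mathcal{L}(0)$ and $\log Z$ is exactly the KL term above (a Jensen gap); scanning richer controls $\alpha$ shrinks it, which is the sense in which optimal control tightens the ELBO.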

Combining the path-space bound with the standard VAE objective over the latent variables, the ELBO is given by

$$\text{ELBO}(\psi,\phi,\theta) = \mathbb{E}_{q_\phi}\left[ \sum_i\log p_\psi(o_{t_i}\mid y_{t_i}) - \mathcal{L}(\alpha_\theta) \right],$$

where $\alpha_\theta$ depends on the encoded latents $y_{0:k}$. The key object is

$$\mathcal{L}(\alpha_\theta) = \mathbb{E}_{\mathbb{P}^{\alpha_\theta}}\left[ \int_0^T\tfrac12\|\alpha_\theta(t,X_t)\|^2\,dt - \sum_{i=1}^k\log g_i(y_{t_i}\mid X_{t_i}) \right].$$

This structure enables end-to-end training by maximizing $\text{ELBO}(\psi,\phi,\theta)$ across all variational and generative parameters.

5. Assumptions, Practical Strategies, and Simulation-free ELBO

All drifts $b(t,x)$ and controls $\alpha(t,x)$ are assumed Lipschitz with linear growth, guaranteeing the existence of strong solutions to the SDEs. The Hopf–Cole transform and the use of Girsanov's theorem require Novikov-type moment conditions for validity. In practical implementations, the optimal drift $\nabla_x\log h$ is replaced by a parametric neural control $\alpha_\theta$ optimized via the ELBO.

Costly simulation of pathwise SDEs and backpropagation through continuous-time integrators can be circumvented by a piecewise locally linear drift ansatz: $dX_t = (-A_iX_t + \alpha_i)\,dt + dW_t$ for $t\in[t_{i-1},t_i)$, so that the state marginals evolve as Gaussian processes with closed-form updates. This approximation enables efficient, simulation-free, parallel ELBO computation. Amortized control construction is performed by modern attention-based networks (e.g., transformers) operating over $\{y_{t_i}\}$.
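For the linear drift ansatz in one dimension, the Gaussian marginal's mean and variance propagate in closed form across each interval. The following sketch (scalar $A_i > 0$, unit diffusion) illustrates the simulation-free update; it is not the source's exact implementation.

```python
import numpy as np

def ou_marginal_step(m0, P0, A, alpha, delta):
    """Closed-form propagation of the Gaussian marginal N(m, P) of
    dX = (-A X + alpha) dt + dW over an interval of length delta (scalar, A > 0)."""
    decay = np.exp(-A * delta)
    m = decay * m0 + (alpha / A) * (1.0 - decay)       # solves m' = -A m + alpha
    P = decay**2 * P0 + (1.0 - decay**2) / (2.0 * A)   # solves P' = -2 A P + 1
    return m, P

def propagate(m0, P0, intervals):
    """Chain the update across segments; intervals is a list of (A_i, alpha_i, delta_i)."""
    m, P = m0, P0
    for A, alpha, delta in intervals:
        m, P = ou_marginal_step(m, P, A, alpha, delta)
    return m, P
```

Since the observation potentials are evaluated against these Gaussian marginals, the entire path term of the ELBO reduces to a sequence of such closed-form steps, with no SDE solver in the loop.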

6. Summary and Practical Implementation

The continuous-time ELBO, as realized in "Amortized Control of Continuous State Space Feynman–Kac Model for Irregular Time Series," synthesizes stochastic optimal control, Feynman–Kac representations, and deep amortized inference, resulting in the end-to-end objective

$$\text{ELBO}(\psi,\phi,\theta) = \mathbb{E}_{q_\phi(y_{1:k}\mid o_{1:k})}\Biggl[ \sum_{i=1}^k\log p_\psi(o_{t_i}\mid y_{t_i}) - \mathbb{E}_{\mathbb{P}^{\alpha_\theta}}\left( \int_0^T\tfrac12\|\alpha_\theta\|^2\,dt - \sum_{i=1}^k\log g_i(y_{t_i}\mid X_{t_i}) \right) \Biggr].$$

All nested expectations are tractable via:
1. sampling of $y_{t_i}$ from the encoder;
2. neural construction of $\alpha_\theta$ via sequence models;
3. either numerical SDE simulation or closed-form marginal propagation in the piecewise linear case;
4. likelihood decoding via $p_\psi$.

The formulation provides a theoretically grounded and computationally practical route to sequential data assimilation in continuous time, particularly for irregular time series (Park et al., 2024).

