
DSBM-NeuralODE: Diffusion Bridge ODE

Updated 21 December 2025
  • The paper introduces a novel method that uses neural ODEs to approximate the optimal diffusion Schrödinger bridge, offering a scalable alternative to classical IPF solutions.
  • It replaces stochastic differential equations with a deterministic ODE surrogate, enabling the use of high-order adaptive solvers and significantly reducing function evaluations.
  • Empirical results on Gaussian transport and MNIST latent tasks show notable efficiency gains and competitive performance compared to traditional IPF-based and SINDy-FM methods.

DSBM-NeuralODE (Diffusion Schrödinger Bridge Matching with Neural ODEs) is a continuous-time generative modeling paradigm that parameterizes the Schrödinger bridge dynamics between two given probability measures via neural ordinary differential equations. Developed as a scalable, flexible, and efficient alternative to classical iterative proportional fitting and stochastic bridge solvers, DSBM-NeuralODE approximates the optimal bridge transport in high-dimensional latent spaces, with significant efficiency and adaptability advantages over baseline methods (Khilchuk et al., 14 Dec 2025).

1. Mathematical Foundations

The classical Schrödinger bridge (SB) seeks a stochastic process $\mathbb{P}^{\mathrm{SB}}$ on path space that solves

$$\mathbb{P}^{\mathrm{SB}} = \arg\min_{\mathbb{P}\,:\,\mathbb{P}_0 = \pi_0,\ \mathbb{P}_T = \pi_T} \mathrm{KL}(\mathbb{P} \,\|\, \mathbb{Q}),$$

where $\pi_0, \pi_T$ are prescribed marginals on $\mathbb{R}^d$, and $\mathbb{Q}$ is a reference diffusion law

$$dX_t = b(X_t, t)\,dt + \sigma(t)\,dW_t, \qquad X_0 \sim \pi_0.$$

The optimal SB dynamics can be written as a stochastic differential equation (SDE)

$$dX_t = [b(X_t, t) + v^*(X_t, t)]\,dt + \sigma(t)\,dW_t,$$

where $v^*$ encodes the correction drift determined by path-space conditional scores. Directly estimating $v^*$, as in classical iterative proportional fitting (IPF), proves computationally infeasible for high-dimensional applications.

DSBM-NeuralODE replaces the SDE drift with a deterministic ODE surrogate

$$\frac{dX_t}{dt} = v_\theta(X_t, t),$$

where $v_\theta$ is a time- and state-dependent velocity field parameterized by a neural network ("ODEFunc"). At the optimum, this field mimics the mean drift of the optimal SB process in expectation along solution paths (Khilchuk et al., 14 Dec 2025).
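The deterministic surrogate means that sampling reduces to integrating an ODE. A minimal sketch of that integration, using fixed-step Euler and a toy linear field in place of the trained $v_\theta$ (the field below is illustrative, not from the paper):

```python
import numpy as np

def integrate_ode(v, x0, t0=0.0, t1=1.0, n_steps=100):
    """Euler integration of dX/dt = v(x, t) from t0 to t1.

    `v` stands in for the learned velocity field v_theta; here it is
    any callable (x, t) -> dx/dt returning an array shaped like x.
    """
    x = np.asarray(x0, dtype=float).copy()
    dt = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        x = x + v(x, t) * dt
        t += dt
    return x

# Toy linear field x' = -x: the exact flow map is x(1) = x(0) * exp(-1).
x1 = integrate_ode(lambda x, t: -x, np.array([1.0, 2.0]))
```

In practice a fixed step size is exactly what the ODE formulation lets one avoid; adaptive high-order solvers are discussed in Section 5.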

2. Training Objectives and Loss Formulation

DSBM-NeuralODE proceeds in two main training phases:

(a) Pre-training on Reference Diffusions:

A forward diffusion, typically with a DDPM-style schedule,

$$dX_t = -\tfrac12\,\beta(t)\,X_t\,dt + \sqrt{\beta(t)}\,dB_t, \qquad X_0 \sim \pi_0,$$

is simulated to generate datasets of $(x_t, x_{t+\Delta t})$ pairs. The ODE surrogate $f_\theta^{\mathrm{forward}}$ is initially trained by minimizing

$$\mathcal{L}_{\mathrm{forward}}(\theta) = \mathbb{E}_{(x_t, x_{t+\Delta t})} \big\|x_{t+\Delta t} - [x_t + f_\theta^{\mathrm{forward}}(t, x_t)\,\Delta t]\big\|^2.$$

An analogous backward model $f_\phi^{\mathrm{backward}}$ is trained in the reverse direction.
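The pre-training phase can be sketched end to end: simulate the forward SDE by Euler–Maruyama, collect consecutive pairs, and regress the one-step displacement. As a stand-in for the neural ODEFunc, the sketch below fits a linear field by least squares (a constant $\beta$ and Gaussian $\pi_0$ are assumptions made for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate the DDPM-style forward SDE dX = -0.5*beta*X dt + sqrt(beta) dB
# with constant beta, collecting consecutive (x_t, x_{t+dt}) pairs.
beta, dt, n_steps, n_traj, d = 1.0, 0.01, 50, 500, 2
x = rng.normal(size=(n_traj, d))          # X_0 ~ pi_0 (standard normal here)
pairs = []
for _ in range(n_steps):
    noise = rng.normal(size=x.shape)
    x_next = x - 0.5 * beta * x * dt + np.sqrt(beta * dt) * noise
    pairs.append((x, x_next))
    x = x_next

# One-step regression target: (x_{t+dt} - x_t) / dt should match f_theta.
# The linear stand-in f(x) = x @ W is fit by least squares; the true mean
# drift is -0.5 * beta * x, so the recovered W should be close to -0.5 * I.
X = np.concatenate([p[0] for p in pairs])
Y = np.concatenate([(p[1] - p[0]) / dt for p in pairs])
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

With a neural network in place of `W`, the same displacement targets are minimized by stochastic gradient descent on $\mathcal{L}_{\mathrm{forward}}$.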

(b) Iterative Schrödinger Bridge Matching (SBM):

Given endpoint pairs $(z_0, z_1) \sim \pi_0 \times \pi_T$, intermediate bridge states are sampled using Brownian bridge interpolation. At iteration $n$, for direction $d \in \{\mathrm{forward}, \mathrm{backward}\}$, the target velocity is constructed as

$$v^{\mathrm{target},d}(t, x) = \mathbb{E}\big[\nabla \log \mathbb{Q}_{T|t}(X_T \mid x)\big],$$

and the main loss is

$$\mathcal{L}_n^d(\theta) = \mathbb{E}_{(z_0, z_1) \sim \Pi^n_{0,T}}\, \mathbb{E}_{t \sim U[\epsilon, 1-\epsilon]} \big\|v^{\mathrm{target},d}(t, X_t) - f_\theta^d(t, X_t)\big\|^2.$$

Alternating minimization over forward and backward networks establishes the Iterative Markovian Fitting (IMF) process (Khilchuk et al., 14 Dec 2025, Shi et al., 2023).
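A minimal sketch of the Brownian bridge interpolation step, assuming the standard pinned-Brownian formulas (the specific volatility and $\epsilon$ values are illustrative, not from the paper):

```python
import numpy as np

def bridge_sample(z0, z1, sigma=0.5, eps=1e-2, rng=None):
    """Sample an intermediate Brownian-bridge state and its drift target.

    For a Brownian reference with volatility sigma, the bridge pinned at
    (z0, z1) on [0, 1] has marginal
        X_t = (1 - t) z0 + t z1 + sigma * sqrt(t (1 - t)) * Z,  Z ~ N(0, I),
    and conditional drift toward the endpoint (z1 - X_t) / (1 - t), which
    serves as the forward-direction regression target.
    """
    rng = rng or np.random.default_rng()
    t = rng.uniform(eps, 1.0 - eps)
    z = rng.normal(size=np.shape(z0))
    x_t = (1.0 - t) * np.asarray(z0) + t * np.asarray(z1) \
        + sigma * np.sqrt(t * (1.0 - t)) * z
    target = (np.asarray(z1) - x_t) / (1.0 - t)
    return t, x_t, target

t, x_t, target = bridge_sample(np.zeros(2), np.ones(2),
                               rng=np.random.default_rng(1))
```

Truncating $t$ to $[\epsilon, 1-\epsilon]$ avoids the singular drift at the pinned endpoints.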

3. Algorithmic Implementation

Below is the canonical DSBM-NeuralODE workflow:

  1. Pre-training
    • Simulate $N$ diffusion trajectories; collect consecutive pairs.
    • Fit $f_\theta^{\mathrm{forward}}$ and $f_\phi^{\mathrm{backward}}$ via their respective regression losses.
  2. Initialization
    • Set the initial coupling $\Pi^0_{0,T}$ by sampling $X_1 = X_0 + \sigma Z$, $Z \sim \mathcal{N}(0, I)$.
  3. Iterative Matching (for $n = 0$ to $N_{\mathrm{iter}} - 1$)
    • Sample minibatches of endpoint pairs from $\Pi^n_{0,T}$.
    • For each direction $d$:
      • Sample $(t, \varepsilon)$ and generate the interpolated state $X_t$.
      • Compute $\mathcal{L}^d_n(\theta)$ and update the ODE network by gradient steps.
    • Update the coupling $\Pi^{n+1}_{0,T}$ by propagating samples with the learned ODE or SDE.
  4. Inference (Sampling)
    • Sample $x \sim \pi_0$ and integrate

$$dX_t = f^{\mathrm{forward}}_{\hat\theta}(t, X_t)\,dt + \sigma\,dW_t$$

from $t = 0$ to $t = 1$ using an adaptive ODE solver or Euler–Maruyama (Khilchuk et al., 14 Dec 2025, Shi et al., 2023).
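One inner matching iteration of the workflow above can be sketched as follows. To stay self-contained, a linear-in-$(t, x)$ field replaces the neural ODEFunc, the endpoint marginals are toy Gaussians, and full-batch gradient descent replaces Adam; all of these are illustrative substitutions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_pairs, sigma, eps, lr = 2, 512, 0.5, 0.05, 0.02

# Endpoint marginals: pi_0 = N(0, I), pi_T = N(m, I) (toy Gaussians).
m = np.array([3.0, -1.0])
z0 = rng.normal(size=(n_pairs, d))
z1 = m + rng.normal(size=(n_pairs, d))          # initial independent coupling

# Linear stand-in for the forward ODEFunc: f(t, x) = x @ A + t * c + b.
A, b, c = np.zeros((d, d)), np.zeros(d), np.zeros(d)

def model(t, x):
    return x @ A + t[:, None] * c + b

losses = []
for step in range(300):
    # Brownian-bridge interpolation between the sampled endpoints.
    t = rng.uniform(eps, 1 - eps, size=n_pairs)
    noise = rng.normal(size=(n_pairs, d))
    x_t = (1 - t)[:, None] * z0 + t[:, None] * z1 \
        + sigma * np.sqrt(t * (1 - t))[:, None] * noise
    target = (z1 - x_t) / (1 - t)[:, None]      # drift toward the endpoint
    resid = model(t, x_t) - target
    losses.append(np.mean(resid ** 2))
    # Full-batch gradient descent on the matching loss.
    A -= lr * x_t.T @ resid * (2 / n_pairs)
    c -= lr * (t[:, None] * resid).sum(0) * (2 / n_pairs)
    b -= lr * resid.sum(0) * (2 / n_pairs)

# Refresh the coupling by propagating pi_0 samples with the learned field.
x = z0.copy()
n_steps = 50
for k in range(n_steps):
    tk = np.full(n_pairs, k / n_steps)
    x = x + model(tk, x) / n_steps
```

Repeating the matching and propagation steps, alternating forward and backward directions, yields the IMF iteration of step 3.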

4. Architecture and Design Choices

The velocity field $v_\theta$ is parameterized by a multilayer perceptron (MLP) with 2 hidden layers. For Gaussian transport tasks, widths are set to [64, 64] with ReLU activations; for MNIST latent translation, [128, 128] with Swish activations is used. The input consists of the state vector $x$ concatenated with a positional encoding of time $t$. Regularization employs weight decay of $10^{-4}$ (no dropout), with the Adam optimizer and an initial learning rate of $1 \times 10^{-4}$. The parameter count per direction is approximately $2.7 \times 10^5$ for both the Gaussian and MNIST tasks (Khilchuk et al., 14 Dec 2025).
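A forward-pass sketch of this architecture in plain numpy. The sinusoidal time encoding and its dimension are assumptions for illustration (the paper states only that a positional encoding of $t$ is used):

```python
import numpy as np

def time_encoding(t, dim=8):
    """Sinusoidal positional encoding of scalar time t (assumed design)."""
    freqs = 2.0 ** np.arange(dim // 2)
    return np.concatenate([np.sin(freqs * t), np.cos(freqs * t)])

def init_mlp(d_in, widths, d_out, rng):
    """Random init for a stack of dense layers (weight, bias) pairs."""
    sizes = [d_in, *widths, d_out]
    return [(rng.normal(scale=1 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def velocity(params, x, t):
    """v_theta(x, t): MLP over [x, enc(t)] with ReLU hidden layers."""
    h = np.concatenate([x, time_encoding(t)])
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)          # ReLU hidden layers
    W, b = params[-1]
    return h @ W + b                            # linear output head

rng = np.random.default_rng(0)
d = 5                                           # Gaussian-transport state dim
params = init_mlp(d + 8, [64, 64], d, rng)
v = velocity(params, np.zeros(d), t=0.3)
```

The output dimension matches the state dimension, since $v_\theta$ must return a velocity $dx/dt$ of the same shape as $x$.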

5. Efficiency, Interpretability, and Empirical Results

DSBM-NeuralODE leverages the deterministic ODE formulation to enable high-order adaptive solvers (e.g., Dormand–Prince), reducing the required number of function evaluations (NFEs) by 5–10× compared to fixed-step SDE samplers. On Gaussian transport, 1,000 samples are generated in around 10 seconds on CPU, a 20× speedup over IPF-based diffusion bridge methods ($\sim 200$ s). The ODE surrogate's smoothness in time facilitates more stable integration and visualization diagnostics relative to conventional SDE approaches.
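The mechanism behind the NFE savings is error-controlled step adaptation. A minimal embedded Heun/Euler stepper illustrates the idea (production code would use Dormand–Prince; the tolerance, step-control constants, and toy field below are illustrative):

```python
import numpy as np

def adaptive_heun(v, x0, t0=0.0, t1=1.0, tol=1e-4):
    """Minimal adaptive embedded RK stepper (Heun with Euler error estimate).

    A small stand-in for production Dormand-Prince solvers: the step size
    grows or shrinks based on a local error estimate, so smooth velocity
    fields need far fewer evaluations than fixed-step schemes. Returns the
    endpoint state and the number of function evaluations (NFE).
    """
    x = np.asarray(x0, dtype=float).copy()
    t, h, nfe = t0, (t1 - t0) / 10, 0
    while t < t1:
        h = min(h, t1 - t)
        k1 = v(t, x)
        k2 = v(t + h, x + h * k1)
        nfe += 2
        x_heun = x + 0.5 * h * (k1 + k2)          # 2nd-order candidate step
        err = 0.5 * h * np.linalg.norm(k2 - k1)   # vs. 1st-order Euler step
        if err < tol or h < 1e-10:                # accept the step
            x, t = x_heun, t + h
        # Grow or shrink the step from the error estimate (clipped).
        h *= min(2.0, max(0.2, 0.9 * np.sqrt(tol / (err + 1e-16))))
    return x, nfe

# Toy linear field x' = -x: exact endpoint is x0 * exp(-1).
x1, nfe = adaptive_heun(lambda t, x: -x, np.array([1.0, 2.0]))
```

Tightening `tol` trades NFEs for accuracy; stochastic SDE samplers cannot exploit this because the noise term is not smooth in time.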

The method remains less interpretable than symbolic SINDy-FM surrogates (which enable near-instantaneous inference and sparse models), but interpretability can be partially recovered through feature attribution and sensitivity analysis tools.

Empirical benchmarks demonstrate:

  • Gaussian transport ($d = 5$): DSBM-NeuralODE achieves $W_2 = 0.131$ with training/inference times of 2,326 s / 21.8 s and $2.7 \times 10^5$ parameters. Baseline DSBM (IPF) yields $W_2 = 0.103$ at 90 s / 0.08 s and $4.9 \times 10^3$ parameters.
  • MNIST latent translation (8-dim VAE): DSBM-NeuralODE attains FID = 72.2, Inception Score = 1.47, digit accuracy = 0.912, training = 450 s, inference = 0.08 s (Khilchuk et al., 14 Dec 2025).

In both cases, SINDy-FM achieves close performance with far fewer parameters and faster inference, but cannot match DSBM-NeuralODE for tasks requiring more expressive non-linear bridge dynamics.

| Task | Model | $W_2$ / FID / IS | Train (s) | Infer (s) | Params |
| --- | --- | --- | --- | --- | --- |
| Gaussian transport | DSBM-NeuralODE | $W_2 = 0.131$ | 2326 | 21.8 | $2.7 \times 10^5$ |
| Gaussian transport | DSBM (IPF) | $W_2 = 0.103$ | 90 | 0.08 | $4.9 \times 10^3$ |
| MNIST latent, 2→3 | DSBM-NeuralODE | FID = 72.2, IS = 1.47 | 450 | 0.08 | $2.7 \times 10^5$ |
| MNIST latent, 2→3 | SINDy-FM | FID ≈ 83–89 | -- | <0.001 | 541–923 |

6. Connections to Unified Bridge Paradigms and Theoretical Guarantees

DSBM-NeuralODE belongs to the broader class of unified bridge algorithms (UBA), which encompasses:

  • DSBM (Schrödinger Bridge Matching): SDE with nonzero reference noise $\sigma_{\mathrm{ref}} > 0$.
  • Flow Matching: ODE (zero-noise limit, $\sigma \to 0$), as in Benamou–Brenier optimal transport.

Both DSBM and flow matching minimize conditional MSE losses over "pinned" processes interpolating $\pi_0$ and $\pi_T$; the difference lies in the level of stochasticity and the choice of process path law (Kim, 27 Mar 2025).

Theoretical results guarantee:

  • Each DSBM iteration decreases $\mathrm{KL}(P^n \,\|\, P^{\mathrm{SB}})$; in the limit, the iterates converge to the true bridge.
  • As $\sigma_{\mathrm{ref}} \to 0$, SB solutions converge to the minimal-kinetic-energy optimal transport solution (Benamou–Brenier flow), recovered by flow-matching objectives.
  • Universal approximation: any time-state drift $b_t(x)$ is representable in a single iteration by the ODE surrogate, assuming sufficient model capacity and minimization accuracy (Kim, 27 Mar 2025, Khilchuk et al., 14 Dec 2025).

7. Limitations and Applicability Spectrum

DSBM-NeuralODE offers a balance between sample efficiency, expressiveness, and computational tractability. The ODE formulation enables advanced solvers and substantial speedups but is less interpretable and, due to overparameterization, can entail higher training costs. The method is best suited when high-fidelity reconstruction of non-linear bridge dynamics is essential. SINDy-FM remains preferable when interpretability and minimal parameterization are paramount, while classical IPF or SDE-based approaches may still be optimal for low-dimensional or limited-scale scenarios (Khilchuk et al., 14 Dec 2025).
