
DSBM-NeuralODE: Diffusion Bridge ODE

Updated 21 December 2025
  • The paper introduces a novel method that uses neural ODEs to approximate the optimal diffusion Schrödinger bridge, offering a scalable alternative to classical IPF solutions.
  • It replaces stochastic differential equations with a deterministic ODE surrogate, enabling the use of high-order adaptive solvers and significantly reducing function evaluations.
  • Empirical results on Gaussian transport and MNIST latent tasks show notable efficiency gains and competitive performance compared to traditional IPF-based and SINDy-FM methods.

DSBM-NeuralODE (Diffusion Schrödinger Bridge Matching with Neural ODEs) is a continuous-time generative modeling paradigm that parameterizes the Schrödinger bridge dynamics between two given probability measures via neural ordinary differential equations. Developed as a scalable, flexible, and efficient alternative to classical iterative proportional fitting and stochastic bridge solvers, DSBM-NeuralODE approximates the optimal bridge transport in high-dimensional latent spaces, with significant efficiency and adaptability advantages over baseline methods (Khilchuk et al., 14 Dec 2025).

1. Mathematical Foundations

The classical Schrödinger bridge (SB) seeks a stochastic process $\mathbb{P}^{\mathrm{SB}}$ on path space that solves

$$\mathbb{P}^{\mathrm{SB}} = \arg\min_{\mathbb{P}\,:\,\mathbb{P}_0 = \pi_0,\ \mathbb{P}_T = \pi_T} \mathrm{KL}(\mathbb{P} \,\|\, \mathbb{Q}),$$

where $\pi_0, \pi_T$ are prescribed marginals on $\mathbb{R}^d$, and $\mathbb{Q}$ is a reference diffusion law

$$dX_t = b(X_t, t)\,dt + \sigma(t)\,dW_t, \qquad X_0 \sim \pi_0.$$

The optimal SB dynamics can be written as a stochastic differential equation (SDE)

$$dX_t = [b(X_t, t) + v^*(X_t, t)]\,dt + \sigma(t)\,dW_t,$$

where $v^*$ encodes the correction drift determined by path-space conditional scores. Directly estimating $v^*$, as in classical iterative proportional fitting (IPF), proves computationally infeasible for high-dimensional applications.

DSBM-NeuralODE replaces the SDE drift with a deterministic ODE surrogate

$$\frac{dX_t}{dt} = v_\theta(X_t, t),$$

where $v_\theta$ is a time- and state-dependent velocity field parameterized by a neural network ("ODEFunc"). At the optimum, this field mimics the mean drift of the optimal SB process in expectation along solution paths (Khilchuk et al., 14 Dec 2025).
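The deterministic surrogate means that sampling reduces to integrating an ODE. A minimal sketch of that integration, using fixed-step Euler and a toy linear field in place of the trained $v_\theta$ (the field below is illustrative, not from the paper):

```python
import numpy as np

def integrate_ode(v, x0, t0=0.0, t1=1.0, n_steps=100):
    """Euler integration of dX/dt = v(x, t) from t0 to t1.

    `v` stands in for the learned velocity field v_theta; here it is
    any callable (x, t) -> dx/dt returning an array shaped like x.
    """
    x = np.asarray(x0, dtype=float).copy()
    dt = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        x = x + v(x, t) * dt
        t += dt
    return x

# Toy linear field x' = -x: the exact flow map is x(1) = x(0) * exp(-1).
x1 = integrate_ode(lambda x, t: -x, np.array([1.0, 2.0]))
```

In practice a fixed step size is exactly what the ODE formulation lets one avoid; adaptive high-order solvers are discussed in Section 5.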

2. Training Objectives and Loss Formulation

DSBM-NeuralODE proceeds in two main training phases:

(a) Pre-training on Reference Diffusions:

A forward diffusion, typically with a DDPM-style schedule,

$$dX_t = -\tfrac12\,\beta(t)\,X_t\,dt + \sqrt{\beta(t)}\,dB_t, \qquad X_0 \sim \pi_0,$$

is simulated to generate datasets of $(x_t, x_{t+\Delta t})$ pairs. The ODE surrogate $f_\theta^{\mathrm{forward}}$ is initially trained by minimizing

$$\mathcal{L}_{\mathrm{forward}}(\theta) = \mathbb{E}_{(x_t, x_{t+\Delta t})} \big\|x_{t+\Delta t} - [x_t + f_\theta^{\mathrm{forward}}(t, x_t)\,\Delta t]\big\|^2.$$

An analogous backward model $f_\phi^{\mathrm{backward}}$ is trained in the reverse direction.
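The pre-training phase can be sketched end to end: simulate the forward SDE by Euler–Maruyama, collect consecutive pairs, and regress the one-step displacement. As a stand-in for the neural ODEFunc, the sketch below fits a linear field by least squares (a constant $\beta$ and Gaussian $\pi_0$ are assumptions made for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate the DDPM-style forward SDE dX = -0.5*beta*X dt + sqrt(beta) dB
# with constant beta, collecting consecutive (x_t, x_{t+dt}) pairs.
beta, dt, n_steps, n_traj, d = 1.0, 0.01, 50, 500, 2
x = rng.normal(size=(n_traj, d))          # X_0 ~ pi_0 (standard normal here)
pairs = []
for _ in range(n_steps):
    noise = rng.normal(size=x.shape)
    x_next = x - 0.5 * beta * x * dt + np.sqrt(beta * dt) * noise
    pairs.append((x, x_next))
    x = x_next

# One-step regression target: (x_{t+dt} - x_t) / dt should match f_theta.
# The linear stand-in f(x) = x @ W is fit by least squares; the true mean
# drift is -0.5 * beta * x, so the recovered W should be close to -0.5 * I.
X = np.concatenate([p[0] for p in pairs])
Y = np.concatenate([(p[1] - p[0]) / dt for p in pairs])
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

With a neural network in place of `W`, the same displacement targets are minimized by stochastic gradient descent on $\mathcal{L}_{\mathrm{forward}}$.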

(b) Iterative Schrödinger Bridge Matching (SBM):

Given endpoint pairs $(z_0, z_1) \sim \pi_0 \times \pi_T$, intermediate bridge states are sampled using Brownian bridge interpolation. At iteration $n$, for direction $d \in \{\mathrm{forward}, \mathrm{backward}\}$, the target velocity is constructed as

$$v^{\mathrm{target},d}(t, x) = \mathbb{E}\big[\nabla \log \mathbb{Q}_{T|t}(X_T \mid x)\big],$$

and the main loss is

$$\mathcal{L}_n^d(\theta) = \mathbb{E}_{(z_0, z_1) \sim \Pi^n_{0,T}}\, \mathbb{E}_{t \sim U[\epsilon, 1-\epsilon]} \big\|v^{\mathrm{target},d}(t, X_t) - f_\theta^d(t, X_t)\big\|^2.$$

Alternating minimization over forward and backward networks establishes the Iterative Markovian Fitting (IMF) process (Khilchuk et al., 14 Dec 2025, Shi et al., 2023).
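A minimal sketch of the Brownian bridge interpolation step, assuming the standard pinned-Brownian formulas (the specific volatility and $\epsilon$ values are illustrative, not from the paper):

```python
import numpy as np

def bridge_sample(z0, z1, sigma=0.5, eps=1e-2, rng=None):
    """Sample an intermediate Brownian-bridge state and its drift target.

    For a Brownian reference with volatility sigma, the bridge pinned at
    (z0, z1) on [0, 1] has marginal
        X_t = (1 - t) z0 + t z1 + sigma * sqrt(t (1 - t)) * Z,  Z ~ N(0, I),
    and conditional drift toward the endpoint (z1 - X_t) / (1 - t), which
    serves as the forward-direction regression target.
    """
    rng = rng or np.random.default_rng()
    t = rng.uniform(eps, 1.0 - eps)
    z = rng.normal(size=np.shape(z0))
    x_t = (1.0 - t) * np.asarray(z0) + t * np.asarray(z1) \
        + sigma * np.sqrt(t * (1.0 - t)) * z
    target = (np.asarray(z1) - x_t) / (1.0 - t)
    return t, x_t, target

t, x_t, target = bridge_sample(np.zeros(2), np.ones(2),
                               rng=np.random.default_rng(1))
```

Truncating $t$ to $[\epsilon, 1-\epsilon]$ avoids the singular drift at the pinned endpoints.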

3. Algorithmic Implementation

Below is the canonical DSBM-NeuralODE workflow:

  1. Pre-training
    • Simulate $N$ diffusion trajectories; collect consecutive pairs.
    • Fit $f_\theta^{\mathrm{forward}}$ and $f_\phi^{\mathrm{backward}}$ via their respective regression losses.
  2. Initialization
    • Set the initial coupling $\Pi^0_{0,T}$ by sampling $X_1 = X_0 + \sigma Z$, $Z \sim \mathcal{N}(0, I)$.
  3. Iterative Matching (for $n = 0$ to $N_{\mathrm{iter}} - 1$)
    • Sample minibatches of endpoint pairs from $\Pi^n_{0,T}$.
    • For each direction $d$:
      • Sample $(t, \varepsilon)$ and generate the interpolated state $X_t$.
      • Compute $\mathcal{L}^d_n(\theta)$ and update the ODE network by gradient steps.
    • Update the coupling $\Pi^{n+1}_{0,T}$ by propagating samples with the learned ODE or SDE.
  4. Inference (Sampling)
    • Sample $x \sim \pi_0$ and integrate

$$dX_t = f^{\mathrm{forward}}_{\hat\theta}(t, X_t)\,dt + \sigma\,dW_t$$

from $t = 0$ to $t = 1$ using an adaptive ODE solver or Euler–Maruyama (Khilchuk et al., 14 Dec 2025, Shi et al., 2023).
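One inner matching iteration of the workflow above can be sketched as follows. To stay self-contained, a linear-in-$(t, x)$ field replaces the neural ODEFunc, the endpoint marginals are toy Gaussians, and full-batch gradient descent replaces Adam; all of these are illustrative substitutions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_pairs, sigma, eps, lr = 2, 512, 0.5, 0.05, 0.02

# Endpoint marginals: pi_0 = N(0, I), pi_T = N(m, I) (toy Gaussians).
m = np.array([3.0, -1.0])
z0 = rng.normal(size=(n_pairs, d))
z1 = m + rng.normal(size=(n_pairs, d))          # initial independent coupling

# Linear stand-in for the forward ODEFunc: f(t, x) = x @ A + t * c + b.
A, b, c = np.zeros((d, d)), np.zeros(d), np.zeros(d)

def model(t, x):
    return x @ A + t[:, None] * c + b

losses = []
for step in range(300):
    # Brownian-bridge interpolation between the sampled endpoints.
    t = rng.uniform(eps, 1 - eps, size=n_pairs)
    noise = rng.normal(size=(n_pairs, d))
    x_t = (1 - t)[:, None] * z0 + t[:, None] * z1 \
        + sigma * np.sqrt(t * (1 - t))[:, None] * noise
    target = (z1 - x_t) / (1 - t)[:, None]      # drift toward the endpoint
    resid = model(t, x_t) - target
    losses.append(np.mean(resid ** 2))
    # Full-batch gradient descent on the matching loss.
    A -= lr * x_t.T @ resid * (2 / n_pairs)
    c -= lr * (t[:, None] * resid).sum(0) * (2 / n_pairs)
    b -= lr * resid.sum(0) * (2 / n_pairs)

# Refresh the coupling by propagating pi_0 samples with the learned field.
x = z0.copy()
n_steps = 50
for k in range(n_steps):
    tk = np.full(n_pairs, k / n_steps)
    x = x + model(tk, x) / n_steps
```

Repeating the matching and propagation steps, alternating forward and backward directions, yields the IMF iteration of step 3.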

4. Architecture and Design Choices

The velocity field $v_\theta$ is parameterized by a multilayer perceptron (MLP) with 2 hidden layers. For Gaussian transport tasks, widths are set to [64, 64] with ReLU activations; for MNIST latent translation, [128, 128] with Swish activations is used. The input consists of the state vector $x$ concatenated with a positional encoding of time $t$. Regularization employs weight decay of $10^{-4}$ (no dropout), with the Adam optimizer and an initial learning rate of $1 \times 10^{-4}$. The parameter count per direction is approximately $2.7 \times 10^5$ for both the Gaussian and MNIST tasks (Khilchuk et al., 14 Dec 2025).
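A forward-pass sketch of this architecture in plain numpy. The sinusoidal time encoding and its dimension are assumptions for illustration (the paper states only that a positional encoding of $t$ is used):

```python
import numpy as np

def time_encoding(t, dim=8):
    """Sinusoidal positional encoding of scalar time t (assumed design)."""
    freqs = 2.0 ** np.arange(dim // 2)
    return np.concatenate([np.sin(freqs * t), np.cos(freqs * t)])

def init_mlp(d_in, widths, d_out, rng):
    """Random init for a stack of dense layers (weight, bias) pairs."""
    sizes = [d_in, *widths, d_out]
    return [(rng.normal(scale=1 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def velocity(params, x, t):
    """v_theta(x, t): MLP over [x, enc(t)] with ReLU hidden layers."""
    h = np.concatenate([x, time_encoding(t)])
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)          # ReLU hidden layers
    W, b = params[-1]
    return h @ W + b                            # linear output head

rng = np.random.default_rng(0)
d = 5                                           # Gaussian-transport state dim
params = init_mlp(d + 8, [64, 64], d, rng)
v = velocity(params, np.zeros(d), t=0.3)
```

The output dimension matches the state dimension, since $v_\theta$ must return a velocity $dx/dt$ of the same shape as $x$.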

5. Efficiency, Interpretability, and Empirical Results

DSBM-NeuralODE leverages the deterministic ODE formulation to enable high-order adaptive solvers (e.g., Dormand–Prince), reducing the required number of function evaluations (NFEs) by 5–10× compared to fixed-step SDE samplers. On Gaussian transport, 1,000 samples are generated in around 10 seconds on CPU, a 20× speedup over IPF-based diffusion bridge methods ($\sim 200$ s). The ODE surrogate's smoothness in time facilitates more stable integration and visualization diagnostics relative to conventional SDE approaches.
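The mechanism behind the NFE savings is error-controlled step adaptation. A minimal embedded Heun/Euler stepper illustrates the idea (production code would use Dormand–Prince; the tolerance, step-control constants, and toy field below are illustrative):

```python
import numpy as np

def adaptive_heun(v, x0, t0=0.0, t1=1.0, tol=1e-4):
    """Minimal adaptive embedded RK stepper (Heun with Euler error estimate).

    A small stand-in for production Dormand-Prince solvers: the step size
    grows or shrinks based on a local error estimate, so smooth velocity
    fields need far fewer evaluations than fixed-step schemes. Returns the
    endpoint state and the number of function evaluations (NFE).
    """
    x = np.asarray(x0, dtype=float).copy()
    t, h, nfe = t0, (t1 - t0) / 10, 0
    while t < t1:
        h = min(h, t1 - t)
        k1 = v(t, x)
        k2 = v(t + h, x + h * k1)
        nfe += 2
        x_heun = x + 0.5 * h * (k1 + k2)          # 2nd-order candidate step
        err = 0.5 * h * np.linalg.norm(k2 - k1)   # vs. 1st-order Euler step
        if err < tol or h < 1e-10:                # accept the step
            x, t = x_heun, t + h
        # Grow or shrink the step from the error estimate (clipped).
        h *= min(2.0, max(0.2, 0.9 * np.sqrt(tol / (err + 1e-16))))
    return x, nfe

# Toy linear field x' = -x: exact endpoint is x0 * exp(-1).
x1, nfe = adaptive_heun(lambda t, x: -x, np.array([1.0, 2.0]))
```

Tightening `tol` trades NFEs for accuracy; stochastic SDE samplers cannot exploit this because the noise term is not smooth in time.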

The method remains less interpretable than symbolic SINDy-FM surrogates (which enable near-instantaneous inference and sparse models), but interpretability can be partially recovered through feature attribution and sensitivity analysis tools.

Empirical benchmarks demonstrate:

  • Gaussian transport ($d = 5$): DSBM-NeuralODE achieves $W_2 = 0.131$ with training/inference times of 2,326 s / 21.8 s and $2.7 \times 10^5$ parameters. Baseline DSBM (IPF) yields $W_2 = 0.103$ at 90 s / 0.08 s and $4.9 \times 10^3$ parameters.
  • MNIST latent translation (8-dim VAE): DSBM-NeuralODE attains FID = 72.2, Inception Score = 1.47, digit accuracy = 0.912, training = 450 s, inference = 0.08 s (Khilchuk et al., 14 Dec 2025).

In both cases, SINDy-FM achieves close performance with far fewer parameters and faster inference, but cannot match DSBM-NeuralODE for tasks requiring more expressive non-linear bridge dynamics.

| Task | Model | $W_2$ / FID / IS | Train (s) | Infer (s) | Params |
| --- | --- | --- | --- | --- | --- |
| Gaussian transport | DSBM-NeuralODE | $W_2 = 0.131$ | 2326 | 21.8 | $2.7 \times 10^5$ |
| Gaussian transport | DSBM (IPF) | $W_2 = 0.103$ | 90 | 0.08 | $4.9 \times 10^3$ |
| MNIST latent, 2→3 | DSBM-NeuralODE | FID = 72.2, IS = 1.47 | 450 | 0.08 | $2.7 \times 10^5$ |
| MNIST latent, 2→3 | SINDy-FM | FID ≈ 83–89 | -- | <0.001 | 541–923 |

6. Connections to Unified Bridge Paradigms and Theoretical Guarantees

DSBM-NeuralODE belongs to the broader class of unified bridge algorithms (UBA), which encompasses:

  • DSBM (Schrödinger Bridge Matching): SDE with nonzero reference noise $\sigma_{\mathrm{ref}} > 0$.
  • Flow Matching: ODE (zero-noise limit, $\sigma \to 0$), as in Benamou–Brenier optimal transport.

Both DSBM and flow matching minimize conditional MSE losses over "pinned" processes interpolating $\pi_0$ and $\pi_T$; the difference lies in the level of stochasticity and the choice of process path law (Kim, 27 Mar 2025).

Theoretical results guarantee:

  • Each DSBM iteration decreases $\mathrm{KL}(P^n \,\|\, P^{\mathrm{SB}})$; in the limit, the iterates converge to the true bridge.
  • As $\sigma_{\mathrm{ref}} \to 0$, SB solutions converge to the minimal-kinetic-energy optimal transport solution (Benamou–Brenier flow), recovered by flow-matching objectives.
  • Universal approximation: any time-state drift $b_t(x)$ is representable in a single iteration by the ODE surrogate, assuming sufficient model capacity and minimization accuracy (Kim, 27 Mar 2025, Khilchuk et al., 14 Dec 2025).

7. Limitations and Applicability Spectrum

DSBM-NeuralODE offers a balance between sample efficiency, expressiveness, and computational tractability. The ODE formulation enables advanced solvers and substantial speedups but is less interpretable and, due to overparameterization, can entail higher training costs. The method is best suited when high-fidelity reconstruction of non-linear bridge dynamics is essential. SINDy-FM remains preferable when interpretability and minimal parameterization are paramount, while classical IPF or SDE-based approaches may still be optimal for low-dimensional or limited-scale scenarios (Khilchuk et al., 14 Dec 2025).
