Data-to-Energy Schrödinger Bridge Training

Updated 4 July 2026

Data-to-Energy Schrödinger Bridge Training is a class of stochastic methods that transport empirical data to an energy-defined target by minimizing path-space relative entropy.
These methods use iterative proportional fitting and matching-based solvers to learn Schrödinger potentials and control policies without relying on explicit target samples.
Reported results indicate that non-memoryless couplings yield straighter trajectories and lower transport costs than traditional score-based diffusion approaches.

Data-to-Energy Schrödinger Bridge Training is a class of methods for learning a stochastic process that transports a source distribution available through data samples to a target distribution specified only through an unnormalized energy, typically $p_{1}(x)\propto \exp(-E(x))$, while minimizing a path-space relative entropy with respect to a reference diffusion. The central difficulty is the absence of target samples: the terminal law is known through $E(x)$ but not through an empirical dataset. Recent work addresses this by recasting the forward bridge as stochastic optimal control, deriving matching-based or iterative proportional fitting procedures that operate directly on the energy function, and exploiting non-memoryless couplings to obtain straighter and more efficient trajectories than ordinary diffusion training (Shin et al., 17 Feb 2026, Tamogashev et al., 30 Sep 2025).

1. Problem formulation and scope

In the canonical data-to-energy setting, one starts from a base SDE

$dX_t = f_t(X_t)\,dt + \sigma_t\,dW_t,\qquad X_0\sim p_{\mathrm{data}},$

introduces a control $u_t(x)$, and considers the controlled process

$dX_t = [f_t(X_t)+\sigma_t u_t(X_t)]\,dt + \sigma_t\,dW_t,$

with the endpoint constraint

$X_1 \sim p_{\mathrm{prior}}(x)\propto \exp(-E(x)).$

The associated Schrödinger bridge problem minimizes the relative entropy between the controlled path measure and the base path measure, equivalently minimizing

$\mathbb E_{p^u}\Bigl[\frac12\int_0^1 \|u_t(X_t)\|^2\,dt\Bigr]$

subject to the endpoint marginals (Shin et al., 17 Feb 2026).
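A minimal sketch of the controlled dynamics follows. It assumes Euler–Maruyama discretization on a uniform grid over $[0,1]$ and NumPy arrays of shape (batch, dim). It also returns a Monte Carlo estimate of the quadratic control cost $\mathbb E_{p^u}[\frac12\int_0^1\|u_t(X_t)\|^2dt]$. The function name `simulate_controlled_sde` and its arguments are illustrative and are not taken from the cited papers.

```python
import numpy as np


def simulate_controlled_sde(x0, f, sigma, u, n_steps=100, rng=None):
    """Euler-Maruyama for dX = [f_t(X) + sigma_t u_t(X)] dt + sigma_t dW on [0, 1].

    x0 has shape (batch, dim); f(t, x) and u(t, x) return arrays of that shape and
    sigma(t) returns a scalar. Returns the terminal states and a Monte Carlo
    estimate of E[1/2 * int_0^1 ||u_t(X_t)||^2 dt] (left-point Riemann sum).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    cost = np.zeros(x.shape[0])
    for k in range(n_steps):
        t = k * dt
        ut = u(t, x)
        cost += 0.5 * np.sum(ut**2, axis=1) * dt
        dw = np.sqrt(dt) * rng.standard_normal(x.shape)
        x = x + (f(t, x) + sigma(t) * ut) * dt + sigma(t) * dw
    return x, float(cost.mean())


def zero_drift(t, x):
    return np.zeros_like(x)


def constant_control(t, x):
    return np.full_like(x, 2.0)


# Usage: f = 0, sigma = 1, u = 2 in one dimension, so X_1 = X_0 + 2 + W_1
# and the control cost is exactly 1/2 * 2^2 = 2.
x1, cost = simulate_controlled_sde(
    np.zeros((20_000, 1)), zero_drift, lambda t: 1.0, constant_control,
    rng=np.random.default_rng(0),
)
```

With a constant control the cost estimate is exact, which makes the example easy to check. For a learned control, the same loop produces the path samples on which training losses are evaluated.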

This formulation differs sharply from sample-to-sample Schrödinger bridge estimation. In the data-driven setting of Pavon–Tabak–Trigila, both endpoint marginals are represented by empirical measures, and training solves a sample-based Schrödinger system or its neural approximation (Pavon et al., 2018). The empirical-risk formulation of Belomestny–Naumov–Puchkin–Suchkov likewise assumes samples from $\rho_0$ and $\rho_T$ and estimates a transformed potential by minimizing an empirical fixed-point loss (Belomestny et al., 9 Feb 2026). Data-to-energy training removes one of these sample sets and replaces it with an energy oracle.

A broader formulation allows one or both marginals to be known only up to unnormalized densities. In that terminology, the present case is “data-to-energy,” where the source is given by samples and the target only through its energy $\mathcal E_1$. By contrast, “energy-to-energy” denotes the case in which both marginals are specified by energies rather than samples (Tamogashev et al., 30 Sep 2025).

2. Variational structure and stochastic optimal control

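A central observation is that the data-to-energy bridge can be written as a stochastic optimal control problem with terminal cost

$g(x)=\log\frac{\hat\varphi_1(x)}{p_{\mathrm{prior}}(x)}=E(x)+\log\hat\varphi_1(x)+\mathrm{const},$

where $\hat\varphi_1$ is the terminal backward Schrödinger potential. The value function

$V_t(x)=\inf_u\,\mathbb E_{p^u}\Bigl[\frac12\int_t^1\|u_s(X_s)\|^2\,ds+g(X_1)\,\Big|\,X_t=x\Bigr]$

satisfies the Hamilton–Jacobi–Bellman equation

$\partial_t V_t+f_t^{\top}\nabla V_t+\frac{\sigma_t^2}{2}\Delta V_t-\frac{\sigma_t^2}{2}\|\nabla V_t\|^2=0,\qquad V_1=g,$

and the optimal control is

$u^\star_t(x)=-\sigma_t\nabla V_t(x).$

If one defines $\varphi_t:=\exp(-V_t)$, then $\varphi_t$ solves the forward Schrödinger integral equation $\varphi_t(x)=\int p^{\mathrm{base}}_{1|t}(y\mid x)\,\varphi_1(y)\,dy$, where $p^{\mathrm{base}}_{1|t}$ is the transition density of the base SDE. This links the control formulation directly to Schrödinger bridge potentials (Shin et al., 17 Feb 2026).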
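The Cole–Hopf substitution $\varphi_t=\exp(-V_t)$ turns the nonlinear HJB equation into a linear expectation. This can be checked in a hypothetical special case, not drawn from the cited papers: one dimension, $f\equiv 0$, $\sigma\equiv 1$, and quadratic terminal cost $g(x)=\tfrac{a}{2}x^2$. Then $\varphi_t(x)=\mathbb E[\exp(-g(x+\sqrt{1-t}\,Z))]$ with $Z\sim\mathcal N(0,1)$ equals $(1+a(1-t))^{-1/2}\exp\bigl(-\tfrac{a x^2}{2(1+a(1-t))}\bigr)$. The optimal control $u^\star_t=-\sigma_t\nabla V_t=\sigma_t\nabla\log\varphi_t$ becomes $-a x/(1+a(1-t))$. The sketch compares a Monte Carlo estimate of $\varphi_t$ with this closed form.

```python
import numpy as np


def phi_monte_carlo(t, x, a, n=200_000, rng=None):
    """Estimate phi_t(x) = E[exp(-g(X_1)) | X_t = x] for dX = dW and g(y) = a*y^2/2."""
    rng = np.random.default_rng() if rng is None else rng
    x1 = x + np.sqrt(1.0 - t) * rng.standard_normal(n)   # samples of X_1 given X_t = x
    return float(np.mean(np.exp(-0.5 * a * x1**2)))


def phi_exact(t, x, a):
    """Closed form of the same Gaussian expectation."""
    c = 1.0 + a * (1.0 - t)
    return np.exp(-0.5 * a * x**2 / c) / np.sqrt(c)


def optimal_control(t, x, a, h=1e-4):
    """u*_t(x) = sigma * d/dx log phi_t(x) with sigma = 1, via a central difference."""
    return (np.log(phi_exact(t, x + h, a)) - np.log(phi_exact(t, x - h, a))) / (2 * h)


t, x, a = 0.5, 1.0, 2.0
mc = phi_monte_carlo(t, x, a, rng=np.random.default_rng(0))
exact = phi_exact(t, x, a)
u = optimal_control(t, x, a)   # analytic value: -a*x / (1 + a*(1 - t)) = -1.0
```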

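The same object may be described by forward and backward Schrödinger potentials $(\varphi_t,\hat\varphi_t)$, whose product $\varphi_t\hat\varphi_t$ is the bridge marginal at time $t$. The optimal controls are

$u^\star_t(x)=\sigma_t\nabla\log\varphi_t(x),\qquad \hat u^\star_t(x)=\sigma_t\nabla\log\hat\varphi_t(x).$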

This forward/backward factorization is the basis of alternating half-bridge updates, bridge matching, and iterative proportional fitting constructions (Shin et al., 17 Feb 2026).

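In the Boltzmann-sampling setting, the terminal law is $\nu(x)\propto\exp(-E(x))$. For controlled processes started from the fixed source law, the path-space KL to the Schrödinger bridge $p^\star$ can also be written as

$\mathrm{KL}(p^u\,\|\,p^\star)=\mathbb E_{p^u}\Bigl[\frac12\int_0^1\|u_t(X_t)\|^2\,dt+\log\frac{\hat\varphi_1(X_1)}{\nu(X_1)}\Bigr]+\mathrm{const},$

so the terminal correction appears explicitly as a density-ratio term enforcing $X_1\sim\nu$ (Liu et al., 27 Jun 2025).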

A related limiting statement appears in data-to-energy stochastic dynamics. Compare a bridge drift $b_t$ with a reference drift $f_t$, where both processes are driven by $\sqrt{\varepsilon}\,dW_t$ from the same initial law. Then

$\mathrm{KL}(\mathbb P^{b}\,\|\,\mathbb P^{f})=\frac{1}{2\varepsilon}\,\mathbb E_{\mathbb P^{b}}\Bigl[\int_0^1\|b_t(X_t)-f_t(X_t)\|^2\,dt\Bigr],$

and as $\varepsilon\to 0$, after multiplying by $\varepsilon$ and taking $f\equiv 0$, this recovers dynamic optimal transport with squared-Euclidean cost (Tamogashev et al., 30 Sep 2025).
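The KL identity above is an instance of Girsanov's theorem, and a Monte Carlo version can be sketched directly. The sketch assumes that both processes share the diffusion coefficient $\sqrt{\varepsilon}$ and the same initial states. The helper `path_kl_estimate` and the Ornstein–Uhlenbeck reference drift are hypothetical choices for illustration.

```python
import numpy as np


def path_kl_estimate(b, f, eps, x0, n_steps=100, rng=None):
    """Estimate KL(P^b || P^f) = 1/(2 eps) * E_{P^b}[int_0^1 ||b_t - f_t||^2 dt].

    Both path measures use diffusion coefficient sqrt(eps) and start from the
    same states x0 (shape (batch, dim)). Paths are simulated under P^b with
    Euler-Maruyama; the time integral is a left-point Riemann sum.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    acc = np.zeros(x.shape[0])
    for k in range(n_steps):
        t = k * dt
        bt = b(t, x)
        acc += np.sum((bt - f(t, x)) ** 2, axis=1) * dt
        x = x + bt * dt + np.sqrt(eps * dt) * rng.standard_normal(x.shape)
    return float(acc.mean() / (2.0 * eps))


def ou_drift(t, x):
    """Reference drift: Ornstein-Uhlenbeck pull toward 0 (hypothetical choice)."""
    return -x


def shifted_drift(t, x):
    """Bridge drift that differs from the reference by the constant 1.5."""
    return -x + 1.5


# A constant offset c gives KL = c^2 / (2 eps) exactly, whatever the paths do.
kl = path_kl_estimate(shifted_drift, ou_drift, eps=0.5, x0=np.zeros((1_000, 1)),
                      rng=np.random.default_rng(0))
# eps * KL equals 1/2 * E int ||b - f||^2 dt, the kinetic-energy term that
# remains finite as eps -> 0.
eps_times_kl = 0.1 * path_kl_estimate(shifted_drift, ou_drift, eps=0.1,
                                      x0=np.zeros((10, 1)),
                                      rng=np.random.default_rng(1))
```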

3. Training objectives and algorithmic families

Recent methods differ primarily in how they avoid explicit target samples while still identifying the optimal bridge or a close approximation.

Framework	Core training decomposition	Distinctive feature
ASBM	Stage 1: Adjoint Matching + Corrector Matching; Stage 2: backward bridge matching	data-to-energy forward learning, then generative reverse dynamics
LightSB-M	single bridge matching from a reciprocal process	arbitrary transport plan input; equivalence to LightSB/EgNOT objective
Data-to-Energy Stochastic Dynamics	generalized IPF with backward ML and forward VarGrad	off-policy RL formulation; learned diffusion coefficient
ASBS	alternating Adjoint Matching and Corrector Matching	arbitrary source distributions; no target-sample estimation during training

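In ASBM, the first stage learns the forward Schrödinger bridge control without ever constructing samples from $p_{\mathrm{prior}}$. The Adjoint Matching loss regresses the forward control against a terminal energy-gradient target plus a proxy terminal corrector $h_\phi\approx\nabla\log\hat\varphi_1$,

$\mathcal L_{\mathrm{AM}}(\theta)=\mathbb E_{X\sim p^{\bar u}}\Bigl[\int_0^1\bigl\|u^\theta_t(X_t)+\sigma_t a_t\bigr\|^2\,dt\Bigr],\qquad \frac{da_t}{dt}=-\bigl(\nabla f_t(X_t)\bigr)^{\top}a_t,\quad a_1=\nabla E(X_1)+h_\phi(X_1),$

where $\bar u=\mathrm{stopgrad}(u^\theta)$ and $a_t$ is the lean adjoint. Corrector Matching, in turn, learns the terminal proxy by

$\mathcal L_{\mathrm{CM}}(\phi)=\mathbb E_{(X_0,X_1)\sim p^{\bar u}_{0,1}}\Bigl[\bigl\|h_\phi(X_1)-\nabla_{x_1}\log p^{\mathrm{base}}_{1|0}(X_1\mid X_0)\bigr\|^2\Bigr].$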
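For intuition about the two regression targets, the following sketch computes them under a hypothetical variance-preserving base: $f_t(x)=-\tfrac12\beta_t x$ and $\sigma_t=\sqrt{\beta_t}$ with a linear $\beta_t$. The schedule constants are assumptions, not values from the paper. Because $f_t$ is linear, the lean adjoint has the closed form $a_t=a_1\exp(-\tfrac12\int_t^1\beta_s\,ds)$. The Adjoint Matching target is $-\sigma_t a_t$. For Corrector Matching, $X_1\mid X_0\sim\mathcal N(mX_0,(1-m^2)I)$ with $m=\exp(-\tfrac12\int_0^1\beta_s\,ds)$. Energy gradients and corrector values are passed in as arrays, and no network is trained.

```python
import numpy as np

BETA_MIN, BETA_MAX = 0.1, 20.0   # hypothetical linear VP schedule


def beta(t):
    return BETA_MIN + (BETA_MAX - BETA_MIN) * t


def int_beta(s, t):
    """int_s^t beta_r dr for the linear schedule."""
    return BETA_MIN * (t - s) + 0.5 * (BETA_MAX - BETA_MIN) * (t**2 - s**2)


def adjoint_matching_targets(grad_e_x1, corrector_x1, n_steps=100):
    """Regression targets -sigma_t * a_t on a uniform time grid for one batch of paths.

    With f_t(x) = -beta_t x / 2, the lean adjoint ODE da/dt = -(grad f_t)^T a
    = beta_t a / 2 is linear, so each backward step uses its exact solution
    a_s = a_t * exp(-int_s^t beta / 2). Terminal value: grad E(X_1) + h_phi(X_1).
    """
    ts = np.linspace(0.0, 1.0, n_steps + 1)
    a = np.empty((n_steps + 1,) + np.shape(grad_e_x1))
    a[-1] = grad_e_x1 + corrector_x1
    for k in range(n_steps, 0, -1):
        a[k - 1] = a[k] * np.exp(-0.5 * int_beta(ts[k - 1], ts[k]))
    sigma = np.sqrt(beta(ts)).reshape((-1,) + (1,) * np.ndim(grad_e_x1))
    return ts, -sigma * a


def corrector_matching_target(x0, x1):
    """grad_{x1} log p_{1|0}(x1 | x0) for the VP base, X_1 | X_0 ~ N(m x0, (1 - m^2) I)."""
    m = np.exp(-0.5 * int_beta(0.0, 1.0))
    return -(x1 - m * x0) / (1.0 - m**2)


rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 2))
x1 = rng.standard_normal((4, 2))
grad_e = x1                      # gradient of a hypothetical energy E(x) = ||x||^2 / 2
h = np.zeros_like(x1)            # corrector initialized at zero (an assumed starting point)
ts, am_targets = adjoint_matching_targets(grad_e, h)
cm_targets = corrector_matching_target(x0, x1)
```

In a full implementation, the AM targets would be regressed onto $u^\theta_t(X_t)$ along paths simulated under the current control, and the CM targets onto $h_\phi(X_1)$.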

After the forward model has converged and induced an approximate optimal coupling $\pi^\theta(x_0,x_1)=p^{u^\theta}_{0,1}(x_0,x_1)$, stage 2 learns the reverse-time drift by bridge matching,

$\mathcal L_{\mathrm{BM}}(\psi)=\mathbb E_{t,\,(X_0,X_1)\sim\pi^\theta,\,X_t\sim p^{\mathrm{base}}_{t|0,1}(\cdot\mid X_0,X_1)}\Bigl[\bigl\|\hat u^\psi_t(X_t)-\sigma_t\nabla_{x_t}\log p^{\mathrm{base}}_{t|0}(X_t\mid X_0)\bigr\|^2\Bigr].$

The paper emphasizes that the entire first stage uses only forward simulation under the learned control $u^\theta$ (Shin et al., 17 Feb 2026).
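The stage-2 regression can be sketched in the simplest setting. The sketch assumes a Brownian reference with $f\equiv 0$ and constant $\sigma$, not the VP schedule used for images, and uses independent Gaussian pairs as a stand-in for samples from $\pi^\theta$. The bridge then has Gaussian marginals, and the regression target $\sigma\nabla_{x_t}\log p^{\mathrm{base}}_{t|0}(X_t\mid X_0)$ equals $-(X_t-X_0)/(\sigma t)$. The helper name is hypothetical.

```python
import numpy as np


def backward_bridge_batch(x0, x1, t, sigma, rng):
    """Sample X_t from the Brownian bridge and return the backward regression target.

    Reference: dX = sigma dW (f = 0, constant sigma), so
    X_t | X_0, X_1 ~ N((1 - t) X_0 + t X_1, sigma^2 t (1 - t) I) and
    sigma * grad_{x_t} log p_{t|0}(X_t | X_0) = -(X_t - X_0) / (sigma * t).
    """
    mean = (1.0 - t) * x0 + t * x1
    xt = mean + sigma * np.sqrt(t * (1.0 - t)) * rng.standard_normal(mean.shape)
    target = -(xt - x0) / (sigma * t)
    return xt, target


# Toy regression at a fixed time t with independent N(0, 1) endpoints.
rng = np.random.default_rng(0)
n, t, sigma = 200_000, 0.5, 1.0
x0 = rng.standard_normal((n, 1))
x1 = rng.standard_normal((n, 1))
xt, target = backward_bridge_batch(x0, x1, t, sigma, rng)
# Least-squares fit of a linear drift model hat_u(x) = w * x to the targets.
w = float(np.linalg.lstsq(xt, target, rcond=None)[0][0, 0])
# For this Gaussian toy the population minimizer is
# E[target | X_t] = -t / (1 - t + t^2) * X_t, i.e. w = -2/3 at t = 0.5.
```

Here the fitted slope matches the analytic conditional expectation for the toy. With a coupling learned in stage 1, the same regression would be run on pairs drawn from $\pi^\theta$ across random times $t$.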

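LightSB-M, introduced by Gushchin et al., starts from a reciprocal process $\Pi_\pi$ whose endpoint coupling is an arbitrary transport plan $\pi\in\Pi(p_0,p_1)$. It then performs a single KL projection onto the manifold of Schrödinger bridges. The learned object is an adjusted Schrödinger potential $v_\theta$. With $\varepsilon$ the noise level of the reference process, the energy-based objective

$\mathcal L(\theta)=\int\log C_\theta(x_0)\,p_0(x_0)\,dx_0-\int\log v_\theta(x_1)\,p_1(x_1)\,dx_1,\qquad C_\theta(x_0)=\int\exp\bigl(\langle x_0,x_1\rangle/\varepsilon\bigr)\,v_\theta(x_1)\,dx_1,$

coincides, up to additive constants, with both $\mathrm{KL}(\Pi_\pi\,\|\,T_\theta)$ and the optimal bridge-matching objective, where $T_\theta$ is the Schrödinger bridge induced by $v_\theta$. The practical LightSB-M solver uses a Gaussian-mixture parameterization

$v_\theta(x_1)=\sum_{k=1}^{K}\alpha_k\,\mathcal N(x_1\mid r_k,\varepsilon S_k),$

which yields closed-form normalizers, drifts, and conditional samplers (Gushchin et al., 2024).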
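The closed forms behind the Gaussian-mixture parameterization can be sketched directly. Tilting $\mathcal N(r_k,\varepsilon S_k)$ by $\exp(\langle x_0,x_1\rangle/\varepsilon)$ gives $\mathcal N(r_k+S_kx_0,\varepsilon S_k)$ times the factor $\exp(\langle x_0,r_k\rangle/\varepsilon+x_0^\top S_kx_0/(2\varepsilon))$. As a result, both $C_\theta(x_0)$ and the conditional plan $\pi_\theta(x_1\mid x_0)\propto\exp(\langle x_0,x_1\rangle/\varepsilon)\,v_\theta(x_1)$ stay in closed form. The function names and toy parameters are hypothetical, and the drift formula is omitted.

```python
import numpy as np


def log_normalizer(x0, log_alpha, r, S, eps):
    """log C_theta(x0) for v_theta(x1) = sum_k alpha_k N(x1 | r_k, eps * S_k).

    Each component integrates in closed form:
    int exp(<x0, x1>/eps) N(x1 | r_k, eps S_k) dx1 = exp(<x0, r_k>/eps + x0^T S_k x0 / (2 eps)).
    Returns log C_theta(x0) and the per-component log-weights (logits).
    """
    quad = np.einsum("i,kij,j->k", x0, S, x0)
    logits = log_alpha + (r @ x0) / eps + quad / (2.0 * eps)
    top = logits.max()                      # log-sum-exp for numerical stability
    return top + np.log(np.exp(logits - top).sum()), logits


def sample_conditional_plan(x0, log_alpha, r, S, eps, n, rng):
    """Draw x1 ~ pi_theta(x1 | x0), proportional to exp(<x0, x1>/eps) v_theta(x1).

    The tilted components are N(r_k + S_k x0, eps S_k), so the conditional plan
    is again a Gaussian mixture with reweighted components.
    """
    log_c, logits = log_normalizer(x0, log_alpha, r, S, eps)
    w = np.exp(logits - log_c)
    comp = rng.choice(len(w), size=n, p=w / w.sum())
    means = r + np.einsum("kij,j->ki", S, x0)
    chol = np.linalg.cholesky(eps * S)     # batched Cholesky, shape (K, d, d)
    z = rng.standard_normal((n, x0.size))
    return means[comp] + np.einsum("nij,nj->ni", chol[comp], z)


# Hypothetical one-dimensional potential with two components.
eps = 0.5
log_alpha = np.log(np.array([0.3, 0.7]))
r = np.array([[-1.0], [2.0]])
S = np.array([[[1.0]], [[0.5]]])
x0 = np.array([0.4])
log_c, logits = log_normalizer(x0, log_alpha, r, S, eps)
samples = sample_conditional_plan(x0, log_alpha, r, S, eps, 100_000, np.random.default_rng(0))
```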

Data-to-Energy Stochastic Dynamics generalizes path-space IPF to the case where the final marginal is known only through an energy. A backward half-bridge is fitted by maximum likelihood on reverse trajectories, while the forward half-bridge replaces unavailable target-sample likelihoods with a conditional variance objective,

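$\mathcal L_{\mathrm{VarGrad}}(\theta)=\mathbb E_{x_0\sim p_{\mathrm{data}}}\Bigl[\mathrm{Var}_{\tau\sim q(\cdot\mid x_0)}\Bigl(\log\frac{P^\theta(\tau\mid x_0)}{Q^\phi(\tau\mid x_1)}+E(x_1)\Bigr)\Bigr],$

Here $\tau$ is a trajectory from $x_0$ to $x_1$ drawn from a sampling distribution $q$, and $P^\theta$ and $Q^\phi$ are the forward and backward path models. Conditioning on $x_0$ removes the unknown data density and the normalizing constant of $\exp(-E)$.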

Training is further recast as a finite-horizon MDP with on-policy and off-policy trajectory mixtures, replay buffers, and Langevin refinement (Tamogashev et al., 30 Sep 2025).
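A minimal sketch of the conditional variance objective follows. For each starting point $x_0$, it takes the variance over trajectories of $\log P^\theta(\tau\mid x_0)-\log Q^\phi(\tau\mid x_1)+E(x_1)$. It assumes these path log-likelihoods and terminal energies have already been computed for $k$ trajectories per $x_0$. The sketch evaluates only the loss value. In practice, gradients would flow through $\log P^\theta$ by automatic differentiation, and the on-policy/off-policy sampling distribution lies outside the sketch. Names are hypothetical.

```python
import numpy as np


def conditional_vargrad_loss(log_p_fwd, log_q_bwd, energy_x1):
    """Conditional log-variance loss for the forward half-bridge.

    All inputs have shape (n_starts, k): k trajectories sampled from each start x0_i.
      log_p_fwd[i, j] = log P_theta(tau_ij | x0_i)   forward path log-likelihood
      log_q_bwd[i, j] = log Q_phi(tau_ij | x1_ij)    backward path log-likelihood
      energy_x1[i, j] = E(x1_ij)                     terminal energy
    The unknown log p_data(x0_i) and the log-normalizer of exp(-E) are constant
    within row i, so taking the variance row by row removes them.
    """
    log_ratio = log_p_fwd - log_q_bwd + energy_x1
    return float(np.mean(np.var(log_ratio, axis=1)))


# Shift-invariance check on random stand-in values (hypothetical data).
rng = np.random.default_rng(0)
lp, lq, en = (rng.standard_normal((8, 16)) for _ in range(3))
row_const = rng.standard_normal((8, 1))          # e.g. an unknown log p_data(x0_i)
base = conditional_vargrad_loss(lp, lq, en)
shifted = conditional_vargrad_loss(lp + row_const, lq, en)
```

Adding an arbitrary per-row constant leaves the loss unchanged. This is why quantities that depend only on $x_0$ drop out.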

ASBS is an allied method for learning to sample from Boltzmann distributions when the source is a simple prior rather than an empirical dataset. It alternates Adjoint Matching and Corrector Matching, avoids importance-weighted estimation and target-sample estimation during training, and proves convergence to the unique global solution under mild smoothness and expressivity assumptions. The framework generalizes recent Adjoint Sampling of Havens et al. by relaxing the memoryless condition to arbitrary source distributions (Liu et al., 27 Jun 2025).

Implementation practice is correspondingly heterogeneous. Reported practical choices in the ASBS setting include:

- a unit time horizon;
- a geometric noise schedule between $\sigma_{\min}$ and $\sigma_{\max}$;
- 50–100 Euler steps;
- replay buffers of sample pairs;
- norm clipping;
- alternating Adjoint Matching and Corrector Matching for 5–20 stages, with 50–200 gradient steps per stage (Liu et al., 27 Jun 2025).
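The practical ingredients listed above can be sketched as small utilities. All numeric values below are placeholders chosen for illustration:

- `SIGMA_MIN` and `SIGMA_MAX` are hypothetical.
- The schedule is assumed to rise geometrically from $\sigma_{\min}$ at $t=0$ to $\sigma_{\max}$ at $t=1$.
- Which quantity gets norm-clipped is left to the caller.
- The stage ordering (Adjoint Matching before Corrector Matching within each stage) is likewise an assumption.

```python
import numpy as np

SIGMA_MIN, SIGMA_MAX = 1e-3, 1.0   # hypothetical endpoint values


def geometric_sigma(t, sigma_min=SIGMA_MIN, sigma_max=SIGMA_MAX):
    """Geometric interpolation sigma_min^(1 - t) * sigma_max^t for t in [0, 1] (assumed direction)."""
    return sigma_min ** (1.0 - t) * sigma_max ** t


def clip_rows_by_norm(v, max_norm):
    """Rescale each row of v so that its Euclidean norm is at most max_norm."""
    norms = np.linalg.norm(v, axis=-1, keepdims=True)
    return v * np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))


class PairReplayBuffer:
    """Fixed-capacity ring buffer of (x0, x1) pairs produced by forward simulation."""

    def __init__(self, capacity, dim):
        self.x0 = np.zeros((capacity, dim))
        self.x1 = np.zeros((capacity, dim))
        self.capacity, self.size, self.ptr = capacity, 0, 0

    def add(self, x0, x1):
        for a, b in zip(x0, x1):
            self.x0[self.ptr], self.x1[self.ptr] = a, b
            self.ptr = (self.ptr + 1) % self.capacity   # overwrite the oldest entry first
            self.size = min(self.size + 1, self.capacity)

    def sample(self, n, rng):
        idx = rng.integers(0, self.size, size=n)
        return self.x0[idx], self.x1[idx]


def alternating_phases(n_stages, steps_per_stage):
    """Yield (stage, phase, step): each stage runs Adjoint Matching, then Corrector Matching."""
    for stage in range(n_stages):
        for phase in ("adjoint_matching", "corrector_matching"):
            for step in range(steps_per_stage):
                yield stage, phase, step
```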

4. Couplings, non-memorylessness, and trajectory geometry

A defining issue in data-to-energy Schrödinger bridge training is whether the reference construction is memoryless. ASBM argues that ordinary diffusion models inherit highly curved trajectories and noisy score targets from an uninformative memoryless forward process that induces an independent data-noise coupling. The memoryless case is $p^{\mathrm{base}}_{1|0}(x_1\mid x_0)=p^{\mathrm{base}}_1(x_1)$. There, the SB-optimal coupling becomes independent, and the backward dynamics reduce to the classical reverse-time score SDE

$dX_t=\bigl[f_t(X_t)-\sigma_t^2\nabla\log p_t(X_t)\bigr]dt+\sigma_t\,d\bar W_t,$

with $p_t$ the time-$t$ marginal and $\bar W_t$ a reverse-time Brownian motion. This process “forgets” the pairing between $X_0$ and $X_1$ entirely and injects maximum noise (Shin et al., 17 Feb 2026).
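Memorylessness can be made concrete for a variance-preserving base, $dX_t=-\tfrac12\beta_tX_t\,dt+\sqrt{\beta_t}\,dW_t$. Under this base, $X_1\mid X_0\sim\mathcal N(mX_0,(1-m^2)I)$ with $m=\exp(-\tfrac12\int_0^1\beta_t\,dt)$. The sketch assumes a linear $\beta_t$ and unit-variance data. It compares a large and a small $\beta_{\max}$, both hypothetical values. A large $\beta_{\max}$ drives $m$ toward zero and makes the data-noise coupling nearly independent.

```python
import numpy as np


def vp_memory_coefficient(beta_min, beta_max):
    """m = exp(-1/2 * int_0^1 beta_t dt) for beta_t = beta_min + t (beta_max - beta_min).

    Under the VP base SDE dX = -beta_t X / 2 dt + sqrt(beta_t) dW,
    X_1 | X_0 ~ N(m X_0, (1 - m^2) I); m -> 0 is the memoryless limit.
    """
    return float(np.exp(-0.5 * (beta_min + 0.5 * (beta_max - beta_min))))


def endpoint_correlation(m, n=100_000, rng=None):
    """Empirical corr(X_0, X_1) for unit-variance 1-D data pushed through the base."""
    rng = np.random.default_rng() if rng is None else rng
    x0 = rng.standard_normal(n)
    x1 = m * x0 + np.sqrt(1.0 - m**2) * rng.standard_normal(n)
    return float(np.corrcoef(x0, x1)[0, 1])


m_strong = vp_memory_coefficient(0.1, 20.0)   # about 0.0066: nearly memoryless
m_weak = vp_memory_coefficient(0.1, 1.0)      # about 0.76: strong endpoint dependence
rho = endpoint_correlation(m_weak, rng=np.random.default_rng(0))
```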

The non-memoryless alternative preserves endpoint dependence through the backward potential $\hat\varphi_t$. ASBM writes the optimal non-memoryless backward drift as

$\hat b_t(x)=f_t(x)-\sigma_t^2\nabla\log\hat\varphi_t(x),$

and reports that the process “remembers” its endpoint and travels in a near-straight line in state-time. Empirically, this reduces the transport cost

$\mathbb E\bigl[\|X_1-X_0\|^2\bigr]$

on CIFAR-10, reduces the path-straightness ratio relative to score SDE, and halves trajectory variance when measured using 10 samples per endpoint (Shin et al., 17 Feb 2026).
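The three trajectory diagnostics mentioned above can be sketched as follows, assuming coupled endpoint samples and discretized paths stored as NumPy arrays:

- The transport cost is the quadratic endpoint cost.
- The straightness ratio, polyline length over endpoint distance, is a hypothetical stand-in; the exact ratio used in the paper is not specified here.
- The variance is taken over repeated trajectories that share an endpoint.

```python
import numpy as np


def transport_cost(x0, x1):
    """Monte Carlo estimate of E||X_1 - X_0||^2 from coupled samples of shape (n, d)."""
    return float(np.mean(np.sum((x1 - x0) ** 2, axis=-1)))


def straightness_ratio(paths):
    """Mean of (polyline length) / (endpoint distance) for paths of shape (n, T + 1, d).

    Equals 1 for straight paths and grows with curvature or jitter. This definition
    is a hypothetical stand-in, not necessarily the ratio used in the cited paper.
    """
    length = np.linalg.norm(np.diff(paths, axis=1), axis=-1).sum(axis=1)
    chord = np.linalg.norm(paths[:, -1] - paths[:, 0], axis=-1)
    return float(np.mean(length / np.maximum(chord, 1e-12)))


def trajectory_variance(paths):
    """Variance over repeated samples sharing an endpoint.

    paths has shape (n_endpoints, n_repeats, T + 1, d), e.g. n_repeats = 10; the
    variance is taken over repeats, then averaged over endpoints, time, and dimension.
    """
    return float(np.mean(np.var(paths, axis=1)))


straight = (np.linspace(0.0, 1.0, 11)[:, None] * np.array([3.0, 4.0]))[None]  # shape (1, 11, 2)
corner = np.array([[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]])                   # one right-angle path
ratios = (straightness_ratio(straight), straightness_ratio(corner))          # approximately (1.0, sqrt(2))
```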

LightSB-M supplies a complementary geometric statement. Its optimal projection theorem asserts that for any reciprocal process $\Pi_\pi$ with coupling $\pi\in\Pi(p_0,p_1)$,

$\arg\min_{T\in\mathcal S}\mathrm{KL}(\Pi_\pi\,\|\,T)=T^\star,$

where $\mathcal S$ is the set of Schrödinger bridges for the reference process and $T^\star$ is the bridge between $p_0$ and $p_1$. A single projection onto the Schrödinger-bridge manifold therefore returns the true bridge. The associated tractable matching loss depends on the drift mismatch between the candidate SB drift and the Brownian-bridge target $\frac{x_1-x_t}{1-t}$. The paper shows that this MSE-type loss differs from the LightSB/EgNOT objective only by an additive constant (Gushchin et al., 2024).
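The Brownian-bridge target can be sketched as follows, assuming a reference process $\sqrt{\varepsilon}\,W_t$. The helpers sample bridge marginals, simulate the pinned SDE, and evaluate a drift-matching MSE. Capping $t$ below 1 is an assumption added here to keep the target finite. All names are hypothetical.

```python
import numpy as np


def brownian_bridge_drift(t, xt, x1):
    """Drift (x1 - x_t) / (1 - t) of the Brownian bridge pinned at x1 at time 1."""
    return (x1 - xt) / (1.0 - t)


def bridge_marginal(t, x0, x1, eps, rng):
    """Sample X_t of the bridge of sqrt(eps) * W from x0 (t = 0) to x1 (t = 1)."""
    mean = (1.0 - t) * x0 + t * x1
    return mean + np.sqrt(eps * t * (1.0 - t)) * rng.standard_normal(np.shape(mean))


def simulate_bridge(x0, x1, eps, n_steps, rng):
    """Euler-Maruyama for dX = (x1 - X)/(1 - t) dt + sqrt(eps) dW; the last step lands on x1 plus noise."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        noise = np.sqrt(eps * dt) * rng.standard_normal(x.shape)
        x = x + brownian_bridge_drift(t, x, x1) * dt + noise
    return x


def bridge_matching_loss(drift_model, x0, x1, eps, rng, t_max=0.99):
    """MSE between a candidate drift and the Brownian-bridge target, with t ~ U(0, t_max).

    The cap t_max keeps (x1 - x_t)/(1 - t) from blowing up near t = 1 (an assumed choice).
    """
    t = rng.uniform(0.0, t_max, size=(x0.shape[0], 1))
    xt = bridge_marginal(t, x0, x1, eps, rng)
    residual = drift_model(t, xt) - brownian_bridge_drift(t, xt, x1)
    return float(np.mean(np.sum(residual**2, axis=-1)))


rng = np.random.default_rng(0)
x0, x1 = np.zeros((5_000, 1)), np.ones((5_000, 1))
end = simulate_bridge(x0, x1, eps=0.1, n_steps=200, rng=rng)   # close to x1 = 1
xt = bridge_marginal(0.5, x0, x1, 0.1, rng)                     # mean 0.5, variance 0.025
loss = bridge_matching_loss(lambda t, x: np.zeros_like(x), x0, x1, 0.1, rng)
```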

These results are often read as correcting a common misconception: matching-based Schrödinger bridge training is not restricted to iterative procedures that must accumulate transport-plan error. Under the assumptions stated in LightSB-M, arbitrary transport plans can be used as inputs to a single optimal bridge-matching step (Gushchin et al., 2024).

5. Empirical behavior across domains

On image generation, ASBM reports substantial gains in low-NFE regimes. On CIFAR-10 with 100 NFE and a VP schedule, the paper reports FID values for Score SDE, SB-FBSDE, VSDM, and ASBM, and it repeats the Score SDE versus ASBM comparison at 25 NFE. The FID-vs-NFE curve of ASBM is reported to plateau by 1000 NFE and is compared against the plateau reached by score SDE. On FFHQ-latent, values are reported at 50, 100, and 500 NFE. For one-step distillation on CIFAR-10, the paper reports scores together with recall and precision for SDS, DMD, and ASBM. The same study states that a modest forward NFE budget suffices and that even 10 forward steps still outperform score SDE with 100 steps (Shin et al., 17 Feb 2026).

On classical Schrödinger-bridge benchmarks, LightSB-M reports cB$\mathbb W_2^2$-UVP errors in two benchmark settings and compares them with LightSB, DSBM, and SF$^2$M. On single-cell trajectory inference, it reports energy distances at two data dimensions, again against DSBM, SF$^2$M, and LightSB. The reported training time of LightSB-M is measured in seconds on CPU, versus approximately 6 minutes on GPU for DSBM (Gushchin et al., 2024).

Data-to-Energy Stochastic Dynamics evaluates the explicitly sample-free terminal setting on the “Gauss $\to$ GMM” and “Two Moons $\to$ GMM” tasks. It reports endpoint-distribution metrics and path-KL on par with data-to-data SB, even though the GMM endpoint is accessed only through its energy. The same study reports that learning the diffusion coefficient improves several IPF-based baselines in path-KL and Wasserstein cost when few discretization steps are used. In latent posterior sampling, the method produces semantically content-preserving image-to-image translations. The reported FID between SB-transported samples and real-class images is often better than that obtained by simple rejection sampling of the latent space (Tamogashev et al., 30 Sep 2025).

In the allied source-to-energy regime, ASBS reports three sets of results (Liu et al., 27 Jun 2025):

- it halves or better the previous best Wasserstein errors on MW-5, DW-4, LJ-13, and LJ-55;
- it achieves the lowest KL on each of five alanine torsion marginals and the lowest error on the 2D Ramachandran plot;
- on amortized conformer generation, it attains 70–75% coverage without relaxation and approximately 90% with relaxation, compared with approximately 57% for Adjoint Sampling (AS).

6. Relation to adjacent Schrödinger-bridge literature

The data-to-energy literature sits alongside, rather than replaces, sample-based Schrödinger bridge estimation. Pavon–Tabak–Trigila formulated a data-driven bridge from empirical marginals using Fortet–Sinkhorn-style iterations, constrained maximum likelihood estimation, and importance sampling, specifically to avoid grid discretization in high dimension (Pavon et al., 2018). Belomestny–Naumov–Puchkin–Suchkov later rewrote the Schrödinger system in terms of a single transformed potential satisfying a nonlinear fixed-point equation, learned by empirical risk minimization. Under sub-Gaussian, Lipschitz, boundedness, and function-class assumptions, the paper establishes uniform concentration of empirical risk around population risk. It also establishes near-parametric rates, up to logarithmic factors, provided the bracketing entropy of the function class grows suitably slowly (Belomestny et al., 9 Feb 2026).

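Generalized Schrödinger Bridge Matching extends the bridge objective beyond kinetic energy to task-specific state costs and soft KL penalties. In its data-to-energy recipe, one takes the terminal marginal to be $p_{\mathrm{prior}}(x)\propto\exp(-E(x))$ and introduces the objective

$\min_u\ \mathbb E_{p^u}\Bigl[\int_0^1\Bigl(\frac12\|u_t(X_t)\|^2+V_t(X_t)\Bigr)dt\Bigr]\quad\text{s.t.}\quad X_0\sim p_{\mathrm{data}},\ X_1\sim p_{\mathrm{prior}},$ with a task-specific state cost $V_t$. The recipe then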

uses conditional stochastic optimal control with Gaussian path parameterizations, and in practice precomputes target samples from the unnormalized energy by short-run Langevin MCMC before explicit flow matching. This suggests a methodological division between terminal-energy training that remains sample-free on the target side and training that converts the energy into an auxiliary sample pool (Liu et al., 2023).
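One concrete way to turn an energy into an auxiliary sample pool is the unadjusted Langevin algorithm, sketched below. This is a swapped-in, generic sampler. The step size, number of steps, initialization, and double-well energy are all hypothetical. The discretization also introduces an $O(\text{step})$ bias relative to $\exp(-E)$.

```python
import numpy as np


def langevin_sample_pool(grad_energy, x_init, step, n_steps, rng):
    """Short-run unadjusted Langevin: x <- x - step * grad E(x) + sqrt(2 step) * noise.

    Approximately targets the density proportional to exp(-E), with O(step) bias.
    Returns the final states as an auxiliary sample pool for the terminal marginal.
    """
    x = np.array(x_init, dtype=float)
    for _ in range(n_steps):
        x = x - step * grad_energy(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x


def grad_double_well(x):
    """Gradient of a hypothetical double-well energy E(x) = (x_1^2 - 1)^2 + x_2^2 / 2."""
    g = np.empty_like(x)
    g[:, 0] = 4.0 * x[:, 0] * (x[:, 0] ** 2 - 1.0)
    g[:, 1] = x[:, 1]
    return g


rng = np.random.default_rng(0)
# Sanity check on E(x) = ||x||^2 / 2, whose Boltzmann density is N(0, I).
pool = langevin_sample_pool(lambda x: x, 3.0 * rng.standard_normal((20_000, 2)),
                            step=0.01, n_steps=1_000, rng=rng)
# Double-well example: the first coordinate becomes bimodal with modes near -1 and +1.
dw_pool = langevin_sample_pool(grad_double_well, rng.standard_normal((2_000, 2)),
                               step=0.01, n_steps=500, rng=rng)
```

Short runs keep the cost low but leave the pool only approximately distributed as $\exp(-E)$. On the quadratic energy this bias shows up as a sample variance slightly above 1.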

Several recurrent misconceptions are explicitly challenged in this line of work. One is that Schrödinger bridge training requires samples from both marginals; the data-to-energy and data-free IPF formulations were introduced precisely for the case in which the target marginal is available only through an unnormalized density (Tamogashev et al., 30 Sep 2025). A second is that bridge learning with an energy-defined endpoint is essentially the same as standard score-based diffusion; ASBM instead attributes curvature and noisy score targets to the memoryless forward process and replaces them with non-memoryless bridges (Shin et al., 17 Feb 2026). A third is that matching-based solvers must rely on importance weighting or repeated plan-refinement cycles; ASBS replaces these with simple matching objectives and on-policy samples, while LightSB-M proves exact recovery from a single optimal bridge-matching step under its assumptions (Liu et al., 27 Jun 2025, Gushchin et al., 2024).

Taken together, these works define data-to-energy Schrödinger bridge training as a technically specific regime: endpoint information is asymmetric, the terminal marginal is represented by an energy rather than a dataset, and training is organized around stochastic optimal control, Schrödinger potentials, bridge matching, or IPF-like alternation rather than direct score supervision from target samples. The resulting methods occupy a junction between diffusion modeling, entropic optimal transport, and energy-based learning, with the main design choices centered on endpoint access, coupling construction, memorylessness, and whether the target energy is used directly or first converted into samples.