Denoising Diffusion Bridge Models

Updated 28 April 2026

Denoising Diffusion Bridge Models are generative models that construct stochastic processes pinned at both endpoints, enabling accurate data-to-data translation.
They extend score-based diffusion and optimal transport flows by incorporating advanced techniques like Doob’s h-transform and reverse-time SDEs.
Recent approaches, including consistency bridge models, dramatically reduce the number of neural network evaluations while achieving superior sample quality in tasks such as image translation and medical counterfactuals.

Denoising Diffusion Bridge Models (DDBM) define a general class of generative models that construct stochastic processes interpolating between two fixed data endpoints, conditioned on both start and end states, rather than the standard paradigm of mapping from noise to data. They fundamentally generalize score-based diffusion models and optimal-transport (OT) flow paradigms, supporting high-fidelity data-to-data tasks such as image-to-image translation, conditional generation, Bayesian inference, medical counterfactuals, and structured design applications.

1. Mathematical Framework and Theoretical Formulation

DDBMs center on stochastic differential equations (SDEs), or the equivalent probability-flow ordinary differential equations (PF-ODEs), that define a diffusion process pinned at both endpoints. Let $(x_0, x_T)\sim q_\text{data}(x_0, x_T)$ be paired samples. The reference forward SDE is

$$dx_t = f(x_t, t)\,dt + g(t)\,dw_t, \quad x_0 \sim p_0 \equiv p_\text{data}.$$

For standard diffusion models, $x_T$ is typically drawn from an uninformative Gaussian prior. In DDBM, the forward process instead uses Doob’s $h$-transform to pin the endpoint at $x_T = y$:

$$dx_t = \big[f(x_t, t) + g^2(t)\, \nabla_{x_t} \log P_{T|t}(x_T = y \mid x_t)\big]\,dt + g(t)\,dw_t,$$

where $P_{T|t}(\cdot \mid x_t)$ is the transition kernel of the reference diffusion. Conditioned on both endpoints, the marginal at an intermediate time $t$ is Gaussian:

$$q_{t|0,T}(x_t \mid x_0, x_T) = \mathcal{N}\big(a_t x_T + b_t x_0,\ \gamma_t^2 I\big).$$

The reverse-time SDE for bridge sampling is

$$dx_t = \Big[ f(x_t,t) - g^2(t) \big(\nabla_{x_t} \log q_{t|T}(x_t \mid x_T=y) - \nabla_{x_t} \log P_{T|t}(x_T=y \mid x_t) \big) \Big]\,dt + g(t)\,d\bar{w}_t,$$

or, for deterministic sampling, the PF-ODE

$$dx_t = \Big[ f(x_t, t) - g^2(t) \big( \tfrac{1}{2}\nabla_{x_t} \log q_{t|T}(x_t \mid x_T=y) - \nabla_{x_t} \log P_{T|t}(x_T=y \mid x_t) \big) \Big]\,dt,$$

in which only the bridge score is halved, because the $h$-transform term is part of the forward drift rather than a diffusion correction. The unknown bridge score $\nabla_{x_t} \log q_{t|T}(x_t \mid x_T = y)$ is approximated by a neural network $s_\theta(x_t, x_T, t)$ via denoising bridge score matching:

$$\mathcal{L}(\theta) = \mathbb{E}_{x_0, x_T, t,\; x_t \sim q_{t|0,T}}\Big[ w(t)\, \big\| s_\theta(x_t, x_T, t) - \nabla_{x_t} \log q_{t|0,T}(x_t \mid x_0, x_T) \big\|^2 \Big],$$

where $w(t)$ is a time-dependent weighting and the regression target has the closed form $\nabla_{x_t} \log q_{t|0,T}(x_t \mid x_0, x_T) = -(x_t - a_t x_T - b_t x_0)/\gamma_t^2$. This construction yields a score-based bridge between arbitrary endpoint distributions, making it substantially more general than unconditional diffusion frameworks (Zhou et al., 2023, He et al., 2024).
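The closed-form pieces of this construction can be checked numerically. The sketch below assumes a Brownian-motion reference process $dx_t = \sigma\,dw_t$ (so $f = 0$, $g = \sigma$), for which the $h$-transform drift $g^2 \nabla_x \log P_{T|t}(y \mid x)$ equals $(y - x)/(T - t)$ and the bridge coefficients are $a_t = t/T$, $b_t = 1 - t/T$, $\gamma_t^2 = \sigma^2 t (T - t)/T$. All function names are hypothetical, and the simulation only checks that Euler-Maruyama on the $h$-transformed SDE reproduces the Gaussian marginal; it is not a DDBM implementation.

```python
import numpy as np

# Assumed reference process (illustrative only): dx = SIGMA dw, so f = 0 and g = SIGMA.
SIGMA, T = 1.0, 1.0


def bridge_coeffs(t):
    """Return a_t, b_t, gamma_t^2 of q_{t|0,T}(x_t | x_0, x_T) for the Brownian bridge."""
    return t / T, 1.0 - t / T, SIGMA**2 * t * (T - t) / T


def sample_bridge(x0, xT, t, rng):
    """Draw x_t ~ N(a_t x_T + b_t x_0, gamma_t^2 I) in closed form."""
    a, b, g2 = bridge_coeffs(t)
    return a * xT + b * x0 + np.sqrt(g2) * rng.standard_normal(np.shape(x0))


def dbsm_target(xt, x0, xT, t):
    """Denoising bridge score matching target grad_x log q_{t|0,T}(x_t | x_0, x_T), 0 < t < T."""
    a, b, g2 = bridge_coeffs(t)
    return -(xt - a * xT - b * x0) / g2


def simulate_h_transform(x0, xT, t_stop, n_paths, rng, n_steps=1000):
    """Euler-Maruyama on dx = [f + g^2 grad log P_{T|t}(x_T | x)] dt + g dw.

    For this reference process, g^2 * grad log P_{T|t}(x_T | x) = (x_T - x) / (T - t).
    Integrates from t = 0 up to t_stop < T and returns the states at t_stop.
    """
    dt = T / n_steps
    x = np.full(n_paths, float(x0))
    for k in range(int(round(t_stop / dt))):
        t = k * dt
        drift = (xT - x) / (T - t)
        x = x + drift * dt + SIGMA * np.sqrt(dt) * rng.standard_normal(n_paths)
    return x
```

Simulating from $x_0 = -1$ pinned to $x_T = 2$ and stopping at $t = T/2$ should give an empirical mean near $a_t x_T + b_t x_0 = 0.5$ and a variance near $\gamma_t^2 = 0.25$, while the DBSM target vanishes exactly at the bridge mean.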

2. Consistency Models and Fast Bridge Sampling

Standard DDBM sampling numerically solves SDE/ODE trajectories with hundreds of neural network function evaluations (NFE), which makes it computationally expensive. Consistency diffusion bridge models (CDBM) instead learn a direct mapping, called the consistency function, from any point on a PF-ODE trajectory to that trajectory's solution at $t = 0$:

$$f_\theta(x_t, t, x_T) \approx \Phi_{t \to 0}(x_t \mid x_T),$$

where $\Phi_{t \to 0}$ is the flow map of the PF-ODE. There are two main learning paradigms:

Consistency Bridge Distillation (CBD): Uses a pretrained bridge-score model, together with one ODE-solver step, to generate the distillation target.
Consistency Bridge Training (CBT): Learns a consistency network without a pretrained score model, replacing the score in the one-step ODE update with its closed-form single-sample estimate.
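The two paradigms differ mainly in where the score used by the one-step ODE update comes from. The sketch below assumes a toy Brownian-bridge reference ($f = 0$, $g = \sigma$) and a plain Euler step of the PF-ODE $dx_t = -g^2(\tfrac12 s - h)\,dt$; `score_model`, `f_theta`, and `f_ema` are hypothetical callables, and the squared-error loss is a simplification of whatever distance and weighting a real implementation would use.

```python
import numpy as np

# Assumed toy reference (illustration only): dx = SIGMA dw, so f = 0 and g = SIGMA.
SIGMA, T = 1.0, 1.0


def coeffs(t):
    """a_t, b_t, gamma_t^2 of q_{t|0,T}(x_t | x_0, x_T) for the Brownian bridge."""
    return t / T, 1.0 - t / T, SIGMA**2 * t * (T - t) / T


def h_term(x, t, x_T):
    """Doob h-transform term grad_x log P_{T|t}(x_T | x) for this reference."""
    return (x_T - x) / (SIGMA**2 * (T - t))


def ode_step(x, t, t_prev, x_T, score):
    """One Euler step of the PF-ODE dx = -g^2 (score/2 - h) dt from t down to t_prev."""
    drift = -SIGMA**2 * (0.5 * score - h_term(x, t, x_T))
    return x + (t_prev - t) * drift


def cbd_target_state(x_t, t, t_prev, x_T, score_model):
    """CBD: a pretrained bridge-score model (hypothetical callable) supplies the score."""
    return ode_step(x_t, t, t_prev, x_T, score_model(x_t, t, x_T))


def cbt_target_state(x_t, t, t_prev, x_0, x_T):
    """CBT: no pretrained model; use the closed-form single-sample score
    grad log q_{t|0,T}(x_t | x_0, x_T), available because x_0 is known in training."""
    a, b, g2 = coeffs(t)
    score_hat = -(x_t - a * x_T - b * x_0) / g2
    return ode_step(x_t, t, t_prev, x_T, score_hat)


def consistency_loss(f_theta, f_ema, x_t, t, x_prev_hat, t_prev, x_T):
    """Squared error between the online network at (x_t, t) and a frozen/EMA copy at the
    one-step ODE estimate (x_prev_hat, t_prev); real methods may use other distances."""
    return (f_theta(x_t, t, x_T) - f_ema(x_prev_hat, t_prev, x_T)) ** 2
```

Averaged over the posterior of $x_0$ given $(x_t, x_T)$, the single-sample score equals the true bridge score, so in expectation the CBT target state matches the CBD target state computed with an exact score model. In a Gaussian toy the score estimate is linear in $x_0$, so this can be checked exactly by plugging in the posterior mean.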

Both methods support few-step sampling, down to as few as 2 steps: 1) a stochastic skip step from $x_T$ to an intermediate time $t_1$, and 2) a deterministic consistency mapping $x_{t_1} \mapsto f_\theta(x_{t_1}, t_1, x_T)$. This reduces NFE by 4–50× while yielding better or comparable sample quality (e.g., Edges→Handbags: DDBM FID = 1.83 at 118 NFE vs. CBT FID = 0.80 at 2 NFE) (He et al., 2024). Downstream tasks such as semantic feature interpolation are also directly supported.
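The consistency property and the two-step recipe can be checked on a one-dimensional toy in which the PF-ODE is linear and its flow map is known exactly. The sketch assumes a Brownian-motion reference ($f = 0$, $g = \sigma$) and a hypothetical Gaussian target $x_0 \mid x_T = y \sim \mathcal{N}(M, S^2)$. Here `consistency_fn` is the exact flow map standing in for a trained $f_\theta$, all names are hypothetical, and the drift used is $f - g^2(\tfrac12 s - h)$, in which only the bridge score $s$ is halved.

```python
import numpy as np

# Hypothetical Gaussian toy (illustrative assumptions):
# reference dx = SIGMA dw on [0, T] (f = 0, g = SIGMA), target x_0 | x_T = y ~ N(M, S^2).
SIGMA, T, M, S = 1.0, 1.0, -1.0, 0.5


def marginal(t, y):
    """Mean and variance of q(x_t | x_T = y), with x_0 integrated out."""
    a, b = t / T, 1.0 - t / T
    return a * y + b * M, SIGMA**2 * t * (T - t) / T + b**2 * S**2


def pf_ode_drift(x, t, y):
    """PF-ODE drift f - g^2 (score / 2 - h) for this toy."""
    mu, v = marginal(t, y)
    score = -(x - mu) / v                  # grad log q(x_t | x_T = y)
    h = (y - x) / (SIGMA**2 * (T - t))     # grad log P(x_T = y | x_t)
    return -SIGMA**2 * (0.5 * score - h)


def consistency_fn(x, t, y):
    """Exact flow map from time t to 0, standing in for a trained f_theta.
    The drift is linear in x, so the 1-D flow map is the increasing affine map
    sending N(mu_t, v_t) to N(mu_0, v_0)."""
    mu_t, v_t = marginal(t, y)
    mu_0, v_0 = marginal(0.0, y)
    return mu_0 + np.sqrt(v_0 / v_t) * (x - mu_t)


def solve_pf_ode(x, t_start, y, t_end=0.0, n_steps=1000):
    """Reference PF-ODE solution with classical RK4, integrating backward in time."""
    ts = np.linspace(t_start, t_end, n_steps + 1)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        dt = t1 - t0  # negative step
        k1 = pf_ode_drift(x, t0, y)
        k2 = pf_ode_drift(x + 0.5 * dt * k1, t0 + 0.5 * dt, y)
        k3 = pf_ode_drift(x + 0.5 * dt * k2, t0 + 0.5 * dt, y)
        k4 = pf_ode_drift(x + dt * k3, t1, y)
        x = x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return x


def two_step_sample(y, t1, n, rng):
    """Step 1 (stochastic): jump from x_T = y to x_{t1}. The toy draws from the exact
    marginal q(x_{t1} | x_T = y); a trained model can only approximate this draw.
    Step 2 (deterministic): map x_{t1} straight to t = 0 with the consistency function."""
    mu, v = marginal(t1, y)
    x_t1 = mu + np.sqrt(v) * rng.standard_normal(n)
    return consistency_fn(x_t1, t1, y)
```

In this toy the closed-form map agrees with a numerical PF-ODE solve and stays constant along a trajectory, and two evaluations recover $\mathcal{N}(M, S^2)$ when the stochastic first step lands on the right marginal. Applying the deterministic map at $t = T$ itself is impossible here: $q(x_T \mid x_T = y)$ is a point mass, so its variance is zero and every sample would be sent to the same output, which is why the first step is stochastic.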

3. Relation to Schrödinger Bridges, OT Flows, and Conditional Simulation

DDBMs formalize, extend, and unify multiple generative paradigms:

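Schrödinger bridges: The finite-horizon SB problem seeks the path measure closest in KL divergence to a reference diffusion under given endpoint constraints, and its solution is a mixture of reference bridges pinned at both endpoints. DDBM builds its process from exactly such pinned bridges, with the endpoint coupling supplied by the paired data, and its time-reversal and SDE/PF-ODE structure follow the canonical SB forms (Heng et al., 2023, Shi et al., 2022).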
Optimal transport (OT) flows and rectified flows: In the limiting case of vanishing bridge noise, the process becomes deterministic along a straight-line interpolation between $x_0$ and $x_T$, precisely recovering the OT flow-matching ODE (Zhou et al., 2023).
Conditional simulation and Bayesian computation: DDBMs support sampling from posteriors and complex conditional distributions, outperforming non-bridge conditional diffusion schemes in accuracy and computational efficiency (Heng et al., 2023, Shi et al., 2022).
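The vanishing-noise limit can be seen directly in simulation. The sketch assumes a Brownian-motion reference with noise scale `sigma`, for which the $h$-transformed SDE is $dx_t = (x_T - x_t)/(T - t)\,dt + \sigma\,dw_t$. As `sigma` shrinks, simulated paths collapse onto the straight line $(1 - t/T)\,x_0 + (t/T)\,x_T$, whose constant velocity $(x_T - x_0)/T$ is the per-pair conditional velocity used by OT flow matching and rectified flow. Function names are hypothetical.

```python
import numpy as np


def pinned_paths(x0, xT, sigma, T=1.0, n_steps=1000, n_paths=64, seed=0):
    """Euler-Maruyama for the h-transformed Brownian bridge dx = (xT - x)/(T - t) dt + sigma dw.
    Returns the time grid, shape (n_steps + 1,), and the paths, shape (n_steps + 1, n_paths)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.full(n_paths, float(x0))
    path = [x.copy()]
    for k in range(n_steps):
        t = k * dt
        x = x + (xT - x) / (T - t) * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
        path.append(x.copy())
    return np.linspace(0.0, T, n_steps + 1), np.stack(path)


def max_gap_to_straight_line(x0, xT, sigma, T=1.0):
    """Largest deviation of any sampled path from the interpolation (1 - t/T) x0 + (t/T) xT."""
    ts, paths = pinned_paths(x0, xT, sigma, T)
    line = (1.0 - ts / T) * x0 + (ts / T) * xT
    return np.abs(paths - line[:, None]).max()
```

With `sigma = 0` the Euler recursion stays exactly on the line, since each step adds $(x_T - x_0)\,\Delta t / T$. With small positive `sigma` the paths deviate by roughly `sigma` times the Brownian-bridge spread.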

4. Algorithmic Implementation and Sampling Acceleration

Training

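Sample a paired example $(x_0, x_T)$ and a time $t$, then compute the bridge mean $a_t x_T + b_t x_0$ and variance $\gamma_t^2$.
Construct the noisy intermediate state $x_t \sim q_{t|0,T}(x_t \mid x_0, x_T)$ in closed form.
Regress the network output onto the closed-form bridge score, using MSE or alternative losses.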
Modern architectures: multi-resolution U-Nets with time and condition embeddings; FiLM or channel-concatenation conditioning on endpoints (Zhou et al., 2023, Aguila et al., 15 Oct 2025).
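The three training steps can be exercised end to end on a toy problem where the answer is known. This sketch assumes a Brownian-bridge reference, a hypothetical paired dataset $x_0 \mid x_T \sim \mathcal{N}(C x_T + M, S^2)$, and a single fixed time $t$. Because the true bridge score $\nabla \log q_{t|T}$ is then linear in $(x_t, x_T)$, a least-squares linear model stands in for the U-Net. Regressing on the closed-form conditional target recovers the marginal score coefficients, which is the property denoising bridge score matching relies on.

```python
import numpy as np

SIGMA, T = 1.0, 1.0          # assumed Brownian-bridge reference: f = 0, g = SIGMA
C, M, S = 0.8, -1.0, 0.5     # hypothetical paired data: x_0 | x_T ~ N(C x_T + M, S^2)


def bridge_coeffs(t):
    """a_t, b_t, gamma_t^2 of q_{t|0,T}(x_t | x_0, x_T)."""
    return t / T, 1.0 - t / T, SIGMA**2 * t * (T - t) / T


def make_batch(n, t, rng):
    """Steps 1-2: sample pairs, then x_t ~ N(a_t x_T + b_t x_0, gamma_t^2) in closed form."""
    xT = rng.standard_normal(n)
    x0 = C * xT + M + S * rng.standard_normal(n)
    a, b, g2 = bridge_coeffs(t)
    xt = a * xT + b * x0 + np.sqrt(g2) * rng.standard_normal(n)
    target = -(xt - a * xT - b * x0) / g2   # closed-form bridge score (DBSM target)
    return xt, xT, target


def fit_linear_score(t, n=200_000, seed=0):
    """Step 3: MSE regression of the target on features [x_t, x_T, 1].
    A linear model replaces the U-Net; it suffices because the true score is linear here."""
    xt, xT, target = make_batch(n, t, np.random.default_rng(seed))
    features = np.stack([xt, xT, np.ones_like(xt)], axis=1)
    w, *_ = np.linalg.lstsq(features, target, rcond=None)
    return w


def true_score_coeffs(t):
    """Exact grad log q(x_t | x_T) = -(x_t - mu) / v with mu = (a + b C) x_T + b M."""
    a, b, g2 = bridge_coeffs(t)
    v = g2 + b**2 * S**2
    return np.array([-1.0 / v, (a + b * C) / v, b * M / v])
```

The fitted weights should approach `true_score_coeffs(t)` as the batch grows, even though each individual target is a noisy conditional score.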

Sampling

Classic DDBM: Solve PF-ODE or hybrid SDE/ODE trajectories with high NFE.
Diffusion Bridge Implicit Models (DBIM): Generalize DDBM to non-Markovian bridges on discretized time grids that preserve its marginals. Varying the noise injected at each step moves the sampler from stochastic, DDPM-like updates to deterministic ODE- and consistency-like updates, or even single-step deterministic inversion. DBIM injects booting (initial) noise at the first step for diversity and runs deterministic updates afterwards, which enables exact autoencoding, semantic interpolation, and a substantial speedup (DBIM at 20 NFE matches DDBM at 100–200 NFE in FID) (Zheng et al., 2024).
Consistency models: Map any trajectory point to the target in one or two evaluations, supporting ultra-fast and high-quality synthesis (He et al., 2024, Hou et al., 13 Apr 2026).
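The booting-then-deterministic pattern described for DBIM can be sketched as a DDIM-style bridge update: predict $\hat{x}_0$, recover the noise implied by the current state, and move to the next time on the bridge $q_{t|0,T}$. This is an illustration in the spirit of DBIM rather than its exact parameterization, which also exposes a tunable per-step noise level. It assumes a Brownian-bridge reference and a hypothetical Gaussian target $x_0 \mid x_T = y \sim \mathcal{N}(M, S^2)$, whose exact posterior mean replaces the trained denoiser; `denoiser` and `dbim_like_sample` are hypothetical names.

```python
import numpy as np

# Hypothetical toy (illustrative assumptions): Brownian-bridge reference dx = SIGMA dw
# and target x_0 | x_T = y ~ N(M, S^2), so the ideal denoiser is available in closed form.
SIGMA, T, M, S = 1.0, 1.0, -1.0, 0.5


def coeffs(t):
    """a_t, b_t, gamma_t of q_{t|0,T}(x_t | x_0, x_T) = N(a_t x_T + b_t x_0, gamma_t^2)."""
    return t / T, 1.0 - t / T, SIGMA * np.sqrt(t * (T - t) / T)


def denoiser(x_t, t, y):
    """Exact E[x_0 | x_t, x_T = y] for the toy; stands in for a trained x_0-predictor."""
    a, b, g = coeffs(t)
    if g == 0.0:  # at t = T the state is just y, so predict the prior mean
        return np.full_like(x_t, M)
    v = g**2 + b**2 * S**2
    return M + (b * S**2 / v) * (x_t - a * y - b * M)


def dbim_like_sample(y, n_steps, n, seed=0):
    """DDIM-style bridge sampler: booting noise at the first step, deterministic afterwards."""
    rng = np.random.default_rng(seed)
    ts = np.linspace(T, 0.0, n_steps + 1)
    x = np.full(n, float(y))                          # every chain starts at x_T = y
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x0_hat = denoiser(x, t_cur, y)
        a_c, b_c, g_c = coeffs(t_cur)
        a_n, b_n, g_n = coeffs(t_next)
        if g_c == 0.0:
            eps = rng.standard_normal(n)              # booting: the only fresh noise
        else:
            eps = (x - a_c * y - b_c * x0_hat) / g_c  # noise implied by the current state
        x = a_n * y + b_n * x0_hat + g_n * eps
    return x
```

Fixing the seed fixes the booting noise, and the rest of each trajectory is then a deterministic function of it, which is what makes exact encoding and interpolation possible. In this toy the deterministic steps slightly shrink the output spread, and the shrinkage fades as the number of steps grows.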

5. Design Choices, Preconditioning, and Stochasticity Control

Transition kernel design: Linear or schedule-based interpolation between endpoints (e.g., a Brownian bridge with $a_t = t/T$, $b_t = 1 - t/T$, and $\gamma_t^2 = \sigma^2 t(T - t)/T$).
Score reparameterization: Precondition the drift using the predicted $x_0$ to alleviate stiffness at trajectory endpoints, following EDM-style scaling (Zhang et al., 2024).
Sampling control: Interpolate between ODE (deterministic) and SDE (stochastic) sampling by adjusting the per-step variance or explicit booting noise, tuning the tradeoff between fidelity and diversity, with guidelines on the noise hyperparameters, NFE, and step schedules (Zhang et al., 2024).
Base distribution augmentation: Inject additional noise in the booting step for greater conditional diversity at minimal FID cost; the diversity gain can be quantified with metrics such as Average Feature Distance (AFD) (Zhang et al., 2024).
Algorithmic efficiency: Both DBIM and fast consistency bridge models need only a few network calls for high-fidelity outputs (one or two for consistency models, around 20 for DBIM), dramatically reducing runtime and compute.
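The ODE/SDE tradeoff can be made concrete with a single bridge update whose injected noise is set by a knob `eta`. Assuming a Brownian-bridge reference and a perfect $x_0$ prediction, every `eta` in $[0, 1]$ leaves the bridge marginal $\mathcal{N}(a_t x_T + b_t x_0, \gamma_t^2)$ unchanged, so the knob trades determinism for diversity without changing what the sampler targets. The schedule $\rho = \eta\,\gamma_{t_\text{next}}$ and the name `bridge_step` are illustrative assumptions, not the parameterization of any cited paper. The network enters only through `x0_hat`, matching the $x_0$-prediction parameterization.

```python
import numpy as np

SIGMA, T = 1.0, 1.0  # assumed Brownian-bridge reference, for illustration


def coeffs(t):
    """a_t, b_t, gamma_t of q_{t|0,T}(x_t | x_0, x_T)."""
    return t / T, 1.0 - t / T, SIGMA * np.sqrt(t * (T - t) / T)


def bridge_step(x, t_cur, t_next, x0_hat, y, eta, rng):
    """One bridge update from t_cur to t_next (0 <= t_next < t_cur < T) with tunable noise.

    The noise scale rho = eta * gamma_next is a hypothetical schedule:
    eta = 0 reuses only the implied noise (deterministic, ODE-like);
    eta = 1 replaces it entirely with fresh noise (most stochastic)."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    a_c, b_c, g_c = coeffs(t_cur)
    a_n, b_n, g_n = coeffs(t_next)
    eps_hat = (x - a_c * y - b_c * x0_hat) / g_c   # noise implied by the current state
    rho = eta * g_n
    fresh = rng.standard_normal(np.shape(x))
    # Total noise variance is (g_n^2 - rho^2) + rho^2 = g_n^2 for every eta.
    return a_n * y + b_n * x0_hat + np.sqrt(g_n**2 - rho**2) * eps_hat + rho * fresh
```

With an imperfect $\hat{x}_0$, larger `eta` lets fresh noise correct part of the prediction error at each step, which is one way to read the fidelity-diversity tradeoff described above.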

6. Applications, Empirical Results, and Downstream Tasks

DDBMs have achieved state-of-the-art results in image-to-image translation, inpainting, super-resolution, time-series forecasting, point cloud denoising, conditional simulation, speech enhancement, and medical counterfactual generation:

Translation/inpainting: Significantly improved FID/LPIPS/MSE over Pix2Pix, SDEdit, and traditional diffusion methods with fewer NFE (Zhou et al., 2023, Zhang et al., 2024, Zheng et al., 2024, He et al., 2024).
Medical imaging: DDBM-based counterfactuals for pathology removal preserve anatomy better (measured by Dice), achieve higher Dice/AP/AUC in anomaly detection, and outperform both supervised and non-bridge diffusion methods (Aguila et al., 15 Oct 2025).
3D point clouds: Plug-and-play optimal-transport bridges (e.g., P2P-Bridge) yield leading pointwise and geometric metrics on synthetic and real data (Vogel et al., 2024).
Imitation learning and robotics: Informative prior selection enables substantial improvement in navigation tasks, with performance gains in success/collision rates and the ability to synthesize accurate action sequences in minimal steps (Ren et al., 14 Apr 2025).
Speech enhancement: Enhanced bridge models, which combine predictive deep backbones with bridge sampling, reach best-in-class SI-SNR, PESQ, and ESTOI with a fraction of the parameters and compute of previous models (Wang et al., 20 Feb 2026).
Restoration: Energy-oriented bridges (E-Bridge) formulate geodesic trajectories with entropy-controlled noise injection, achieving state-of-the-art perceptual scores in denoising, super-resolution, and restoration and enabling single-step high-fidelity recovery (Hou et al., 13 Apr 2026).

7. Open Directions and Future Impact

DDBMs unify and generalize a broad set of conditional generative modeling methodologies, providing a highly flexible and theoretically principled framework for data-to-data generation. Current and future research directions focus on:

Further integrating consistency and fast-implicit sampling schemes (He et al., 2024, Zheng et al., 2024).
Extending bridge conditioning to multi-modal and hybrid inference (e.g., text/vision, multimodal medical synthesis).
Exploring optimal schedule, preconditioning, and stochasticity control for domain-specific tradeoffs in diversity, speed, and fidelity (Zhang et al., 2024).
Theoretical understanding of guidance/acceleration schemes, convergence guarantees, and expressivity for highly structured/discrete data.

DDBMs are rapidly becoming a foundation for generative tasks beyond classic text and image synthesis: they report leading empirical results across modalities and support real-world, structure-conditioned, and semantically faithful inversion in both supervised and self-supervised algorithms (He et al., 2024, Zhou et al., 2023, Zheng et al., 2024, Hou et al., 13 Apr 2026).