Image-to-Image Schrödinger Bridge (I²SB)

Updated 2 February 2026

Image-to-Image Schrödinger Bridge (I²SB) is a generative modeling framework that interpolates between image distributions using entropy-regularized optimal transport.
The approach extends classical diffusion methods by replacing fixed Gaussian priors with arbitrary marginals and jointly optimizing drift and volatility via the Schrödinger–Bass Bridge (SBB).
LightSBB-M, the algorithm that computes the SBB transport plan, uses closed-form updates and an efficient training loop, and is reported to achieve state-of-the-art performance on synthetic benchmarks and on real-world FFHQ image translation.

Image-to-Image Schrödinger Bridge (I²SB) is a class of generative modeling frameworks that learns stochastic dynamics interpolating between two arbitrary data distributions, typically two types of images. I²SB extends classical score-based diffusion by replacing the fixed Gaussian prior with arbitrary marginals, enabling direct domain translation, restoration, and enhancement tasks within a tractable entropy-regularized optimal transport setting. Recent work has further generalized the Schrödinger Bridge by introducing the Schrödinger–Bass Bridge (SBB), which jointly optimizes both drift and volatility components, providing an interpolation between pure drift-driven and pure martingale stochastic transport (Alouadi et al., 27 Jan 2026).

1. Mathematical Foundations of SBB and I²SB

Let $\mu_0,\mu_T$ be distributions over $\mathbb{R}^d$ encoding the source and target domains (e.g., adult and child faces in FFHQ latent space). The SBB seeks a path measure $\mathcal{P}$ over trajectories $X_t$ driven by both a drift $\alpha_t$ and a volatility $\sigma_t$:

$dX_t = \alpha_t(X_t)\,dt + \sigma_t(X_t)\,dW_t, \quad X_0 \sim \mu_0,\ X_T \sim \mu_T$

The joint cost combines drift and volatility penalties:

$J(\mathcal{P}) = \mathbb{E}_{\mathcal{P}}\left[ \int_0^T \left( \|\alpha_t\|^2 + \beta \|\sigma_t - \sqrt{\epsilon}I\|^2_F \right) dt \right]$

where $\beta>0$ tunes the drift-volatility trade-off and $\sqrt{\epsilon}I$ is the reference volatility. In the dual, obtained with Lagrange multiplier and PDE arguments, optimizing this cost reduces to solving:

$\sup_{v,\psi} \left\{ \int \psi(x)\, \mu_T(dx) - \int v(0,x)\,\mu_0(dx) \right\}$

subject to $\partial_t v + H^*_\beta(\nabla v, D^2v) = 0,\ v(T,\cdot) = \psi$, with the Fenchel–Legendre transform

$H^*_\beta(p,q)=\frac{1}{2}|p|^2 + \frac{1}{2}\epsilon\beta\,\operatorname{Tr}[(I - q/\beta)^{-1} - I]$

for $q = D^2v < \beta I$ (Alouadi et al., 27 Jan 2026).
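The controlled SDE and its running cost can be made concrete with a small simulation. The following is a minimal sketch, not the authors' implementation: it assumes an Euler–Maruyama discretization, and the names `simulate_sde`, `drift`, and `vol` are hypothetical callables introduced only for illustration.

```python
import numpy as np

def simulate_sde(x0, drift, vol, T=1.0, n_steps=100, eps=1.0, beta=1.0, seed=0):
    """Euler-Maruyama for dX = drift(t, X) dt + vol(t, X) dW (illustrative sketch).

    Also returns a Monte Carlo estimate of the cost
    E[ int_0^T |alpha|^2 + beta * ||sigma - sqrt(eps) I||_F^2 dt ].
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)            # particles, shape (n, d)
    n, d = x.shape
    dt = T / n_steps
    eye = np.eye(d)
    cost = np.zeros(n)
    for k in range(n_steps):
        t = k * dt
        a = drift(t, x)                      # drift alpha_t(x), shape (n, d)
        s = vol(t, x)                        # volatility sigma_t(x), shape (n, d, d)
        drift_pen = np.sum(a ** 2, axis=1)
        vol_pen = np.sum((s - np.sqrt(eps) * eye) ** 2, axis=(1, 2))
        cost += (drift_pen + beta * vol_pen) * dt
        dw = rng.normal(scale=np.sqrt(dt), size=(n, d))   # Brownian increments
        x = x + a * dt + np.einsum("nij,nj->ni", s, dw)
    return x, cost.mean()
```

With zero drift and volatility fixed at $\sqrt{\epsilon}I$, both penalties vanish and the estimated cost is zero, which is the uncontrolled reference process.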

2. Closed-form Drift and Volatility, Role of $\beta$

The dual maximizer $v^*(t,x)$ yields analytic feedback laws:

Drift: $\alpha_t^*(x)=\nabla_x v^*(t,x)$.
Volatility: $\sigma_t^*(x)=\sqrt{\epsilon}\,(I - (1/\beta)D^2_xv^*(t,x))^{-1}$.

Key interpolation regimes:

As $\beta\to\infty$: deviations from the reference volatility become prohibitively expensive, so $\sigma_t\approx\sqrt{\epsilon}I$ and the process reduces to the classical Schrödinger bridge (constant volatility, drift-only control).
As $\beta\to 0$: the volatility penalty vanishes, the drift penalty dominates, and $\alpha_t\to 0$, yielding a pure Bass martingale transport with volatility-only control.

Equivalently, the transport can be recast using a Schrödinger potential $h_T^*$ and a stretching map $\Phi_t(y)=|y|^2/2 + (\epsilon/\beta)\log h_t(y)$ whose Hessian impacts local volatility.
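The feedback laws and the large-$\beta$ regime can be checked numerically on a toy potential. The sketch below assumes a hypothetical time-independent quadratic $v(x)=\tfrac{1}{2}x^\top A x$ with symmetric $A$, chosen only because its gradient and Hessian are explicit; it is not claimed to solve the dual PDE, and `feedback_laws` is an illustrative name.

```python
import numpy as np

def feedback_laws(A, x, eps, beta):
    """Drift and volatility induced by the toy potential v(x) = 0.5 * x^T A x.

    Assumes A is symmetric, so grad v = A x and D^2 v = A.
    """
    d = A.shape[0]
    hess = A
    # The volatility formula needs D^2 v < beta * I (all eigenvalues below beta).
    if np.max(np.linalg.eigvalsh(hess)) >= beta:
        raise ValueError("need D^2 v < beta * I for the volatility to be defined")
    drift = A @ x                                              # nabla_x v
    vol = np.sqrt(eps) * np.linalg.inv(np.eye(d) - hess / beta)
    return drift, vol

A = np.diag([0.5, -1.0])
x = np.array([1.0, 2.0])
drift, vol_moderate = feedback_laws(A, x, eps=1.0, beta=1.0)
_, vol_large_beta = feedback_laws(A, x, eps=1.0, beta=1e9)
```

For $\beta=1$ the volatility is stretched along directions of positive curvature and shrunk along negative ones; for very large $\beta$ it is numerically indistinguishable from $\sqrt{\epsilon}I$, matching the Schrödinger bridge limit.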

3. LightSBB-M Algorithmic Workflow

The LightSBB-M algorithm computes the SBB transport plan by operating in the nonlinear map space $Y \leftrightarrow X$ . The workflow alternates between:

Learning the score-drift $s_\theta(t,y) \approx \epsilon \nabla_y \log h_t(y)$ using bridge-matching loss.
Learning the transport map $Z_{\tilde{\theta}}(t,x) \approx Y$ via regression.

A typical training loop (Algorithm 1 (Alouadi et al., 27 Jan 2026)):

Initialize map $Y_t^0(x)=x$ .
For $k = 0, \dots, K-1$ outer iterations:
- Sample endpoints $x_0\sim\mu_0,\ x_T\sim\mu_T$ . Compute $y_0=Z_{\tilde{\theta}^k}(0,x_0),\ y_T=Z_{\tilde{\theta}^k}(T,x_T)$ .
- Sample intermediate $y_t$ from Brownian bridge between $y_0$ and $y_T$ .
- Update drift network $\theta$ via minimizing $\mathbb{E}[\|s_\theta(t,y_t)-(y_T-y_t)/(T-t)\|^2]$ .
- Update map $Z_{\tilde{\theta}}$ by regressing $X\mapsto Y$ and matching endpoints at $t=0,T$ .

Because the drift and regression updates admit closed-form solutions, the alternating scheme is reported to converge in $\approx 5$ outer iterations.
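The bridge-sampling and drift-regression steps of the loop can be sketched as follows. This is a minimal illustration under stated assumptions: the reference process in $Y$-space is taken to be Brownian motion with variance $\epsilon$ per unit time, times are drawn uniformly away from $t=T$ to avoid the singular target, and the function names are hypothetical.

```python
import numpy as np

def bridge_matching_batch(y0, yT, eps=1.0, T=1.0, rng=None):
    """Sample y_t from a Brownian bridge between y0 and yT and build the
    regression target (yT - y_t) / (T - t) used in the bridge-matching loss."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = y0.shape
    # Stay away from t = T, where the target is singular (assumption of this sketch).
    t = rng.uniform(0.0, 0.99 * T, size=(n, 1))
    mean = y0 + (t / T) * (yT - y0)              # bridge mean
    std = np.sqrt(eps * t * (T - t) / T)         # bridge standard deviation
    yt = mean + std * rng.normal(size=(n, d))
    target = (yT - yt) / (T - t)
    return t, yt, target

def bridge_matching_loss(pred, target):
    """Mean squared error between predicted drift and bridge target."""
    return np.mean(np.sum((pred - target) ** 2, axis=1))
```

In a full training loop, `pred` would be the output of the drift network $s_\theta(t, y_t)$, and the endpoints `y0`, `yT` would come from the current map $Z_{\tilde{\theta}^k}$.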

4. Network Architecture and Training: FFHQ Example

For unpaired adult $\to$ child translation in FFHQ:

Domain distributions $\mu_0,\mu_T$ reside in 512-dim ALAE latent space.
Drift and map networks are parameterized by deep neural nets (typically, residual architectures).
Training proceeds by alternating endpoint and bridge sampling, loss minimization, and network updates.
Hyperparameters are set to optimize 2-Wasserstein distance and empirical generative fidelity.

The algorithm is validated on synthetic benchmarks as well as real FFHQ images, with reported improvements of up to 32% in 2-Wasserstein distance over classical SB and diffusion-based translation baselines.
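For intuition about the evaluation metric, the 2-Wasserstein distance between two equal-size one-dimensional samples has a simple form: sort both samples and take the root mean squared difference. The sketch below covers only this one-dimensional case; it is not the evaluation code used in the source, and higher-dimensional latents would need an optimal-assignment or other solver.

```python
import numpy as np

def w2_empirical_1d(a, b):
    """Empirical 2-Wasserstein distance between equal-size 1D samples.

    In one dimension the optimal coupling matches sorted samples in order.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("equal sample sizes are assumed in this sketch")
    return np.sqrt(np.mean((a - b) ** 2))
```

A pure shift of a sample by a constant $c$ gives a distance of exactly $|c|$, which is a quick sanity check for any implementation.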

5. Inference Pipeline and Quantitative Evaluation

Single-pass inference:

Given a sample $x_0 \sim \mu_0$, the learned transport map and drift are applied to realize the optimal SBB trajectory toward $\mu_T$.
For image translation, this is decoded through the latent space to obtain final outputs.
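One possible reading of this pipeline is sketched below, with heavy assumptions: the sample is mapped to $Y$-space, evolved with the learned drift by Euler–Maruyama under constant noise $\sqrt{\epsilon}$, and mapped back. The callables `to_y`, `score_drift`, and `to_x` are hypothetical stand-ins for the trained networks (and for the inverse map, which the text does not specify), and the step count is an arbitrary choice.

```python
import numpy as np

def translate(x0, to_y, score_drift, to_x, eps=1.0, T=1.0, n_steps=50, rng=None):
    """Illustrative inference pass: X -> Y, integrate the drift in Y, Y -> X."""
    rng = np.random.default_rng() if rng is None else rng
    y = to_y(0.0, np.asarray(x0, dtype=float))   # map source sample into Y-space
    dt = T / n_steps
    for k in range(n_steps):
        t = k * dt
        noise = rng.normal(size=y.shape)
        # Euler-Maruyama step with learned drift and constant noise level sqrt(eps)
        y = y + score_drift(t, y) * dt + np.sqrt(eps * dt) * noise
    return to_x(T, y)                            # map back to the data/latent space
```

For image translation, the returned latent would then be passed through the ALAE decoder, which is outside the scope of this sketch.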

Quantitative and qualitative results:

LightSBB-M achieves state-of-the-art results in both synthetic mixture transport and real-world image translation (e.g., adult $\to$ child faces), outperforming prior SB and diffusion approaches.
Image outputs exhibit preserved identity components, age-relevant structure, and improved local detail due to joint drift and volatility control.

6. Theoretical and Practical Implications

The SBB formulation generalizes the classical Schrödinger Bridge by enabling direct manipulation of both drift (mean dynamics) and stochastic volatility. The analytic dual permits sample-efficient estimation, fast outer convergence, and interpretable control of generative stochasticity. LightSBB-M operationalizes SBB for high-dimensional, latent-space image translation.

Significance includes:

Scalable training, closed-form drift/volatility updates.
Control over generative diversity via the $\beta$ parameter: tuning between deterministic and stochastic transport.
Empirical superiority on both synthetic and real image-to-image tasks, validated by benchmark metrics.
Compatibility with unpaired and paired data, latent representations, and extensibility to other generative transport settings.

The LightSBB-M framework establishes a computationally efficient, theoretically grounded mechanism for scalable unpaired image-to-image translation via joint drift-volatility optimal transport, bridging practical generative modeling and advanced stochastic control (Alouadi et al., 27 Jan 2026).


References (1)

Alouadi et al. (27 Jan 2026). LightSBB-M: Bridging Schrödinger and Bass for Generative Diffusion Modeling.
