Image-to-Image Schrödinger Bridge (I²SB)
- Image-to-Image Schrödinger Bridge (I²SB) is a generative modeling framework that interpolates between image distributions using entropy-regularized optimal transport.
- The approach extends classical diffusion methods by replacing fixed Gaussian priors with arbitrary marginals and jointly optimizing drift and volatility via the Schrödinger–Bass Bridge (SBB).
- LightSBB-M, the algorithm used to compute the SBB transport plan, leverages closed-form updates and efficient training loops to achieve state-of-the-art performance on synthetic benchmarks and real-world FFHQ image translation.
Image-to-Image Schrödinger Bridge (I²SB) is a class of generative modeling frameworks that learns stochastic dynamics interpolating between two arbitrary data distributions, typically two types of images. I²SB extends classical score-based diffusion by replacing the fixed Gaussian prior with arbitrary marginals, enabling direct domain translation, restoration, and enhancement tasks within a tractable entropy-regularized optimal transport setting. Recent work has further generalized the Schrödinger Bridge by introducing the Schrödinger–Bass Bridge (SBB), which jointly optimizes both drift and volatility components, providing an interpolation between pure drift-driven and pure martingale stochastic transport (Alouadi et al., 27 Jan 2026).
1. Mathematical Foundations of SBB and I²SB
Let $\mu_0$ and $\mu_1$ be distributions over $\mathbb{R}^d$ encoding the source and target domains (e.g., adult and child faces in FFHQ latent space). The SBB seeks a path measure over trajectories $(X_t)$ driven by both a drift $\alpha_t$ and a volatility $\sigma_t$,
$$dX_t = \alpha_t\,dt + \sigma_t\,dW_t,$$
with initial law $\mu_0$ and terminal law $\mu_1$. The joint cost combines drift and volatility penalties, weighted by a parameter that tunes the drift-volatility trade-off. In the dual, using a Lagrange multiplier and PDE arguments, optimizing this cost reduces to a constrained problem expressed through a Fenchel–Legendre transform (Alouadi et al., 27 Jan 2026).
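To make the role of the two controls concrete, the following is a minimal sketch of simulating a one-dimensional diffusion with a state-dependent drift and volatility using the Euler–Maruyama scheme. It is not the SBB solver: the function name `euler_maruyama`, the unit time horizon, and the toy controls `toy_drift` and `toy_volatility` are assumptions chosen only for illustration.

```python
import math
import random


def euler_maruyama(x0, drift, volatility, n_steps=100, horizon=1.0, rng=None):
    """Simulate dX_t = drift(t, x) dt + volatility(t, x) dW_t in one dimension."""
    rng = rng or random.Random(0)
    dt = horizon / n_steps
    x, path = x0, [x0]
    for k in range(n_steps):
        t = k * dt
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x = x + drift(t, x) * dt + volatility(t, x) * dw
        path.append(x)
    return path


# Hypothetical controls (assumptions, not taken from the paper).
def toy_drift(t, x):
    return 1.0 - x  # pulls the state toward 1


def toy_volatility(t, x):
    return 0.5  # constant volatility


path = euler_maruyama(0.0, toy_drift, toy_volatility)
```

Setting the volatility to a constant recovers a drift-controlled diffusion, while setting the drift to zero leaves a driftless process steered only through its volatility.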
2. Closed-form Drift and Volatility, Role of the Trade-off Parameter
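The dual maximizer yields analytic feedback laws for both controls: the optimal drift and the optimal volatility are each available in closed form as functions of time and state.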
Key interpolation regimes:
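- In one limit of the trade-off parameter, the volatility freezes to a fixed reference value and the process reduces to the classical Schrödinger bridge (drift-only control).
- In the opposite limit, the drift cost diverges and the drift vanishes, yielding a pure Bass martingale transport with volatility-only control.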
Equivalently, the transport can be recast using a Schrödinger potential and a stretching map whose Hessian impacts local volatility.
3. LightSBB-M Algorithmic Workflow
The LightSBB-M algorithm computes the SBB transport plan by operating in the space of nonlinear transport maps. The workflow alternates between:
- Learning the score-drift with a bridge-matching loss.
- Learning the transport map via regression.
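A typical training loop (Algorithm 1 (Alouadi et al., 27 Jan 2026)):
- Initialize the transport map.
- For each outer iteration:
  - Sample endpoints from the source and target distributions and compute the corresponding mapped points.
  - Sample an intermediate point from the Brownian bridge between the two endpoints.
  - Update the drift network by minimizing the bridge-matching loss.
  - Update the map by regression, matching the endpoints at the boundary times.
Closed-form solutions for the drift and the map regression are reached over the outer iterations.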
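The bridge-sampling and drift-regression steps of such a loop can be sketched as follows. This is a generic bridge-matching illustration in one dimension, assuming a unit time horizon and a constant noise scale `eps`; it omits the map-regression step entirely and is not a reproduction of the paper's Algorithm 1. All function names are hypothetical.

```python
import math
import random


def sample_bridge(x0, x1, t, eps, rng):
    """Draw x_t from the Brownian bridge pinned at x0 (t=0) and x1 (t=1)."""
    mean = (1.0 - t) * x0 + t * x1
    std = math.sqrt(eps * t * (1.0 - t))
    return mean + std * rng.gauss(0.0, 1.0)


def bridge_matching_target(x_t, x1, t):
    """Drift of the bridge conditioned on its endpoint: (x1 - x_t) / (1 - t)."""
    return (x1 - x_t) / (1.0 - t)


def bridge_matching_loss(drift_fn, pairs, eps=1.0, seed=0):
    """Monte Carlo estimate of E |drift_fn(t, x_t) - target|^2 over endpoint pairs."""
    rng = random.Random(seed)
    total = 0.0
    for x0, x1 in pairs:
        t = rng.uniform(0.0, 0.99)  # stay away from t = 1, where the target blows up
        x_t = sample_bridge(x0, x1, t, eps, rng)
        total += (drift_fn(t, x_t) - bridge_matching_target(x_t, x1, t)) ** 2
    return total / len(pairs)


# Toy check: if every trajectory ends at 3.0, the exact drift is (3 - x) / (1 - t).
pairs = [(float(i), 3.0) for i in range(5)]
exact_loss = bridge_matching_loss(lambda t, x: (3.0 - x) / (1.0 - t), pairs)
zero_loss = bridge_matching_loss(lambda t, x: 0.0, pairs)
```

In practice `drift_fn` would be a neural network and the loss would be minimized by stochastic gradient descent; here the exact drift drives the loss to zero while the zero drift does not.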
4. Network Architecture and Training: FFHQ Example
For unpaired adult→child translation in FFHQ:
- Domain distributions reside in a 512-dimensional ALAE latent space.
- Drift and map networks are parameterized by deep neural nets (typically, residual architectures).
- Training proceeds by alternating endpoint and bridge sampling, loss minimization, and network updates.
- Hyperparameters are set to optimize 2-Wasserstein distance and empirical generative fidelity.
The algorithm is validated on synthetic benchmarks as well as real FFHQ images. It demonstrates substantial improvements (up to 32%) in 2-Wasserstein distance over classical SB and diffusion-based translation baselines.
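For intuition about the reported metric, here is a minimal sketch of the empirical 2-Wasserstein distance between two equal-size one-dimensional samples, where the optimal coupling simply pairs sorted values, together with a helper for relative improvement over a baseline. The one-dimensional shortcut and both function names are assumptions for illustration; the high-dimensional evaluation in the paper would need a different estimator.

```python
import math


def wasserstein2_1d(xs, ys):
    """Empirical 2-Wasserstein distance between equal-size 1D samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    xs, ys = sorted(xs), sorted(ys)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs))


def relative_improvement(baseline, candidate):
    """Fractional reduction of a distance relative to a baseline value."""
    return (baseline - candidate) / baseline


# Toy samples (illustrative values only): a shift by 1 gives a distance of 1.
shifted = wasserstein2_1d([0.0, 1.0, 2.0], [3.0, 1.0, 2.0])
```

A statement such as "up to 32% improvement" corresponds to `relative_improvement(baseline, candidate)` reaching 0.32 for the best case.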
5. Inference Pipeline and Quantitative Evaluation
Single-pass inference:
- Given a sample from the source distribution, the learned transport map and drift are applied to realize the optimal SBB trajectory toward the target distribution.
- For image translation, this is decoded through the latent space to obtain final outputs.
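The encode, transport, decode structure of this pipeline can be sketched as a simple composition. The callables below are hypothetical stand-ins: a real setup would plug in a pretrained autoencoder (such as ALAE) for `encode` and `decode`, and the learned SBB transport for `transport`.

```python
def translate(image, encode, transport, decode):
    """Encode to latent space, transport the latent, decode back to image space."""
    z_source = encode(image)
    z_target = transport(z_source)
    return decode(z_target)


# Toy stand-ins (assumptions): a linear "autoencoder" and a shift as transport.
def toy_encode(img):
    return [2.0 * p for p in img]


def toy_decode(z):
    return [0.5 * v for v in z]


def toy_transport(z):
    return [v + 1.0 for v in z]


out = translate([0.0, 1.0], toy_encode, toy_transport, toy_decode)
```

Keeping the three stages as separate callables makes it easy to swap the transport step without touching the autoencoder.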
Quantitative and qualitative results:
- LightSBB-M achieves state-of-the-art results in both synthetic mixture transport and real-world image translation (e.g., adult→child faces), outperforming prior SB and diffusion approaches.
- Image outputs exhibit preserved identity components, age-relevant structure, and improved local detail due to joint drift and volatility control.
6. Theoretical and Practical Implications
The SBB formulation generalizes the classical Schrödinger Bridge by enabling direct manipulation of both drift (mean dynamics) and stochastic volatility. The analytic dual permits sample-efficient estimation, fast outer convergence, and interpretable control of generative stochasticity. LightSBB-M operationalizes SBB for high-dimensional, latent-space image translation.
Significance includes:
- Scalable training with closed-form drift and volatility updates.
- Control over generative diversity via the trade-off parameter, which tunes between drift-driven and volatility-driven transport.
- Empirical superiority on both synthetic and real image-to-image tasks, validated by benchmark metrics.
- Compatibility with unpaired and paired data, latent representations, and extensibility to other generative transport settings.
The LightSBB-M framework establishes a computationally efficient, theoretically grounded mechanism for scalable unpaired image-to-image translation via joint drift-volatility optimal transport, bridging practical generative modeling and advanced stochastic control (Alouadi et al., 27 Jan 2026).