Latent Schrödinger Bridge Models

Updated 3 May 2026
  • Latent Schrödinger Bridge Models are generative frameworks that transform samples between probability distributions via entropic optimal transport in a learned latent space.
  • They combine encoder–decoder architectures with stochastic differential equations and neural solvers such as score-based diffusion and neural ODEs for efficient, high-dimensional modeling.
  • Empirical results demonstrate state-of-the-art performance in tasks like 3D shape completion and image synthesis with significant improvements in speed and parameter efficiency.

Latent Schrödinger Bridge Models are a class of generative modeling and optimal transport frameworks that formulate the problem of transforming samples from one probability distribution to another as an entropic optimal transport problem, solved in a learned low-dimensional latent space. The core principle is to explicitly model the globally optimal stochastic dynamics (the “bridge”) that couple the origin and target distributions by minimizing the Kullback–Leibler divergence to a reference stochastic process, typically a Brownian motion or reference diffusion. By learning these dynamics in the latent space of a neural encoder–decoder, these models gain computational tractability, improved sample quality, and rigorous theoretical guarantees for high-dimensional data. Recent architectures unify Schrödinger bridge theory, deep score-based diffusion, and variational latent compression, spanning diverse applications such as 3D shape completion, image synthesis, and latent-space optimal transport (Kong et al., 29 Jun 2025, Jiao et al., 2024, Khilchuk et al., 14 Dec 2025).

1. The Schrödinger Bridge Formulation in Latent Space

The dynamic Schrödinger bridge problem seeks a stochastic process $(z_t)_{t\in[0,T]}$ whose endpoints marginally realize two prescribed distributions $\pi_0$ and $\pi_1$ (e.g., corresponding to complete and incomplete data), while remaining minimal in relative entropy to a reference process, typically a diffusion. In latent space, this is formalized as

$$\mathbb{P}^* = \arg\min_{\mathbb{P} : \mathbb{P}_{t=0}=\pi_0,\,\mathbb{P}_{t=T}=\pi_1} \mathrm{KL}(\mathbb{P}\,\|\,\mathbb{Q}),$$

where $\mathbb{Q}$ is the law of a reference SDE, such as

$$dz_t = f(z_t,t)\,dt + g(t)\,dW_t$$

with drift $f$ and diffusion $g$. The solution $\mathbb{P}^*$ induces a forward SDE of the form

$$dz_t = \bigl[ f(z_t,t) + g^2(t)\,\nabla \log\Psi_t(z_t) \bigr]\,dt + g(t)\,dW_t$$

along with a coupled backward SDE. The functions $\Psi_t$ and $\hat\Psi_t$ solve Schrödinger-type PDEs with endpoint constraints $\Psi_0\hat\Psi_0 = \pi_0$ and $\Psi_T\hat\Psi_T = \pi_1$ (Kong et al., 29 Jun 2025, Jiao et al., 2024).
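As an illustration of the forward SDE, the sketch below simulates it with Euler–Maruyama in the one case where the drift correction is known in closed form. It assumes a Brownian reference ($f = 0$, constant $g = \sigma$) and a point-mass target at a single latent `zT`, for which $g^2 \nabla \log \Psi_t(z) = (z_T - z)/(T - t)$. The function name `simulate_bridge` and all parameter values are hypothetical, and a real latent bridge would learn this drift rather than write it down.

```python
import numpy as np

def simulate_bridge(z0, zT, T=1.0, sigma=1.0, n_steps=200, seed=0):
    """Euler-Maruyama simulation of a Brownian bridge from z0 to zT.

    Assumes f = 0 and g = sigma, with the target collapsed to the single
    point zT, so the bridge drift is (zT - z) / (T - t).
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    zT = np.asarray(zT, dtype=float)
    z = np.array(z0, dtype=float)
    for k in range(n_steps):
        t = k * dt
        drift = (zT - z) / (T - t)  # equals sigma^2 * grad log Psi_t(z) here
        noise = sigma * np.sqrt(dt) * rng.normal(size=z.shape)
        z = z + drift * dt + noise
    return z

z0 = np.array([0.0, 0.0])
zT = np.array([2.0, -1.0])
z_end = simulate_bridge(z0, zT)
print("endpoint:", z_end, "target:", zT)
```

The simulated endpoint lands on the target up to the noise injected in the final step, whose standard deviation is $\sigma\sqrt{\Delta t}$.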

This construction is equivalent to entropic optimal transport, regularizing the classical Monge–Kantorovich problem by penalizing deviations from a stochastic reference path via the path-space KL divergence.
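The link to entropic optimal transport can be made concrete with a static, discrete analogue. The sketch below runs Sinkhorn iterations on two small histograms; the grid, the marginals `a` and `b`, and the regularization strength `eps` are illustrative assumptions and are not taken from the cited works.

```python
import numpy as np

def sinkhorn(a, b, cost, eps=0.5, n_iter=500):
    """Entropic OT plan between histograms a and b via Sinkhorn scaling."""
    K = np.exp(-cost / eps)          # Gibbs kernel of the reference coupling
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)              # rescale to match row marginals
        v = b / (K.T @ u)            # rescale to match column marginals
    return u[:, None] * K * v[None, :]

x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 1.0, 5)
cost = (x[:, None] - y[None, :]) ** 2   # squared-distance cost
a = np.full(5, 0.2)                     # source histogram
b = np.array([0.1, 0.1, 0.2, 0.3, 0.3]) # target histogram
plan = sinkhorn(a, b, cost)
print(plan.sum(axis=1), plan.sum(axis=0))
```

The returned plan has the prescribed marginals while staying close, in KL divergence, to the Gibbs kernel. This mirrors how the bridge matches $\pi_0$ and $\pi_1$ while staying close to the reference process.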

2. Latent Representations: Encoder–Decoder Architectures

Latent Schrödinger bridge models operate in a learned latent representation, induced by a neural autoencoder or variational autoencoder (VAE). Data $x \in \mathbb{R}^D$ is mapped to a lower-dimensional code $z \in \mathbb{R}^d$: $z = \mathcal{E}(x)$ with $d \ll D$, where $D$ is the ambient dimension. In “BridgeShape,” a vector-quantized VAE (VQ-VAE) equipped with depth-enhanced features encodes high-resolution 3D shapes into a structured latent grid, maximizing geometric fidelity and compressibility (Kong et al., 29 Jun 2025). The latent space distributions $\pi_0$ (complete) and $\pi_1$ (incomplete/partial) are constructed by encoding datasets of paired data. Encoder–decoder pre-training is performed via an MSE reconstruction loss: $\mathcal{L}_{\mathrm{rec}} = \mathbb{E}_x\,\|x - \mathcal{D}(\mathcal{E}(x))\|_2^2$. Theoretical results guarantee that, under compression regularity, the end-to-end reconstruction error decays as a function of pre-training dataset size and latent dimension (Jiao et al., 2024).
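A minimal sketch of the encode/decode/MSE pipeline follows. It uses a linear autoencoder obtained from PCA as a stand-in for the neural VAE or VQ-VAE described above. The dimensions, the synthetic data, and the names `encode` and `decode` are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d, n = 20, 3, 1000                      # ambient dim, latent dim, samples

# Synthetic data lying near a d-dimensional subspace of R^D.
basis = rng.normal(size=(d, D))
x = rng.normal(size=(n, d)) @ basis + 0.01 * rng.normal(size=(n, D))

# "Pre-training": PCA gives the optimal linear encoder/decoder under MSE.
mean = x.mean(axis=0)
_, _, vt = np.linalg.svd(x - mean, full_matrices=False)
W = vt[:d]                                 # (d, D) projection matrix

def encode(x):
    """Map data in R^D to latent codes in R^d."""
    return (x - mean) @ W.T

def decode(z):
    """Map latent codes in R^d back to R^D."""
    return z @ W + mean

z = encode(x)
mse = np.mean(np.sum((x - decode(z)) ** 2, axis=1))
print("latent shape:", z.shape, "reconstruction MSE:", mse)
```

Once such a codec is frozen, the bridge is trained entirely on the codes `z`, and `decode` is applied only at the end of sampling.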

3. Algorithms: Neural, Symbolic, and Hybrid Solvers

Latent SB models support several algorithmic paradigms for solving the entropic transport:

  • Neural Score-Based Diffusion: A neural network $\epsilon_\theta$ is trained to parameterize conditional noise in the Gaussian bridge, using score matching over paired endpoint latent codes and intermediate noisy latents (Kong et al., 29 Jun 2025). The training objective regresses the network output onto the noise that generated each intermediate latent, with the latent bridge posterior given in closed form.

  • Neural ODE Surrogates: The continuous-time bridge drift is parameterized as a neural ODE vector field,

$$\frac{dz_t}{dt} = v_\theta(z_t, t),$$

trained via iterative matching to bridge velocities derived from the SDE and endpoint interpolation strategies (Khilchuk et al., 14 Dec 2025). Both forward and backward ODEs are learned, offering superior control over sampling and computational efficiency.

  • Symbolic SINDy Flow Matching: For low-dimensional or nearly Gaussian latent spaces, the bridge dynamics can be represented by a sparse symbolic regression model,

$$\frac{dz_t}{dt} = \Theta(z_t)\,\Xi,$$

where $\Theta$ is a polynomial feature library and $\Xi$ is a sparse coefficient matrix fitted via $\ell_1$-regularized least squares. This reduction yields interpretable, efficient models with orders-of-magnitude fewer parameters and near-instantaneous inference (Khilchuk et al., 14 Dec 2025).
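The symbolic variant can be sketched as a sparse regression of velocity targets onto a polynomial library. The example below is a toy under stated assumptions. The velocity targets are synthesized from a known linear field rather than derived from a bridge, the latent is one-dimensional, and the solver is sequential thresholded least squares (the classic SINDy solver), used here as a stand-in for whichever regularized solver the cited work employs. The names `poly_library` and `stlsq` are hypothetical.

```python
import numpy as np

def poly_library(z):
    """Polynomial features [1, z, z^2, z^3] for a 1-D latent."""
    z = np.asarray(z, dtype=float)
    return np.stack([np.ones_like(z), z, z**2, z**3], axis=1)

def stlsq(theta, targets, threshold=0.05, n_iter=10):
    """Sequential thresholded least squares: zero small terms, then refit."""
    xi = np.linalg.lstsq(theta, targets, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        big = ~small
        if big.any():
            xi[big] = np.linalg.lstsq(theta[:, big], targets, rcond=None)[0]
    return xi

rng = np.random.default_rng(0)
z = rng.normal(size=500)                               # latent samples
velocity = 1.0 - 0.5 * z + 0.01 * rng.normal(size=500) # noisy synthetic targets

xi = stlsq(poly_library(z), velocity)
print("coefficients for [1, z, z^2, z^3]:", xi)
```

The fit keeps only the constant and linear terms, so the learned dynamics can be read directly as a short formula. This is the interpretability benefit the symbolic approach is aiming for.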

Comparison Table: Key Latent SB Algorithms

Algorithm        | Expressivity         | Sample Efficiency | Interpretability
-----------------|----------------------|-------------------|------------------
Neural Diffusion | Arbitrary            | Moderate          | Black-box
Neural ODE       | High (continuous)    | High              | Moderate
SINDy-FM         | Limited (polynomial) | Very high         | Explicit/Symbolic

4. Training Procedures and Architectures

Comprehensive recipes for latent SB training are available. The two-stage regime is common:

  • Stage I: Pre-train the latent autoencoder (VAE or VQ-VAE) on a large dataset, only using full data (e.g., complete 3D shapes), freezing the encoder and decoder afterward. For depth-enhanced 3D tasks, multi-view rendering with DINOv2 features and cross-attention fusion are used in the encoder (Kong et al., 29 Jun 2025).
  • Stage II: Train the bridge model (neural diffusion, ODE, or symbolic) in the latent space. Endpoint pairs $(z_0, z_T)$ are sampled (for conditional tasks, partial data is encoded), and the model is optimized using either score-matching regression or direct flow-matching.

For practical efficiency, BridgeShape applies Gaussian-based bridge posteriors, enabling sampling in three steps. This is a significant reduction in inference time compared to standard DDPM pipelines, which require hundreds of steps (Kong et al., 29 Jun 2025).
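To show what a closed-form Gaussian bridge posterior looks like, the sketch below samples an intermediate latent given both endpoints. It assumes a Brownian reference with constant diffusion `sigma`, which gives mean $(1 - t/T)\,z_0 + (t/T)\,z_T$ and variance $\sigma^2 t (T - t)/T$. The schedule used in the cited work may differ, and `sample_bridge_latent` is a hypothetical name.

```python
import numpy as np

def sample_bridge_latent(z0, zT, t, T=1.0, sigma=1.0, rng=None):
    """Draw z_t given endpoints (z0, zT) under a Brownian-bridge posterior."""
    rng = np.random.default_rng() if rng is None else rng
    z0 = np.asarray(z0, dtype=float)
    zT = np.asarray(zT, dtype=float)
    w = t / T
    mean = (1.0 - w) * z0 + w * zT             # linear interpolation of endpoints
    std = sigma * np.sqrt(t * (T - t) / T)     # vanishes at t = 0 and t = T
    return mean + std * rng.normal(size=z0.shape)

rng = np.random.default_rng(0)
z0 = np.zeros(20000)          # e.g. encoded complete data
zT = np.ones(20000)           # e.g. encoded partial data
zt = sample_bridge_latent(z0, zT, t=0.5, rng=rng)
print("mean:", zt.mean(), "variance:", zt.var())
```

Because any intermediate time can be sampled directly, Stage II training needs no simulation of the SDE. Each step draws a pair of endpoints, a time, and one posterior sample.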

5. Theoretical Guarantees and Convergence

A distinguishing feature of latent SB models is the end-to-end theoretical analysis for distributional approximation. The error between generated and target data distributions, measured in Wasserstein-2 distance, decomposes into separate contributions. Crucially, the dominant convergence rate scales only with the dimension of the latent space, not with the ambient data dimension. The remaining terms in the bound depend on the number of grid steps, the domain-shift error (data distribution mismatch), and the encoder–decoder error. This result demonstrates that latent SB models can avoid the curse of dimensionality inherent to data-space diffusion, provided the latent manifold is sufficiently compact (Jiao et al., 2024).
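For readers unfamiliar with the metric, the following sketch estimates the Wasserstein-2 distance between two one-dimensional samples of equal size, where the optimal coupling simply pairs sorted values. It illustrates the error measure only. It does not reproduce the bound from the cited analysis, and `w2_empirical_1d` is a hypothetical helper.

```python
import numpy as np

def w2_empirical_1d(x, y):
    """Wasserstein-2 distance between two equal-size 1-D samples."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("samples must have the same size")
    return np.sqrt(np.mean((x - y) ** 2))    # sorted values are optimally paired

rng = np.random.default_rng(0)
target = rng.normal(size=1000)
generated = target + 3.0                     # a pure shift of the target sample
print("W2 estimate:", w2_empirical_1d(generated, target))
```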

6. Empirical Performance and Practical Impact

BridgeShape and related methods demonstrate state-of-the-art results in 3D shape completion and generative translation tasks. On 3D-EPN and PatchComplete, BridgeShape significantly outperforms prior methods in L1/TUDF grid error, Chamfer Distance, and volumetric IoU, improving results on both known and unseen categories. Increasing resolution yields further accuracy gains, while latent-bridge sampling keeps inference efficient (three reverse steps, 0.04 s total, compared to 100+ steps in DDPM-based baselines) (Kong et al., 29 Jun 2025).

For latent translation on MNIST, SINDy-FM achieves FID and Inception scores similar to those of neural ODE surrogates, with dramatic reductions in parameter count and computation time (100× faster inference, 300× fewer parameters), while producing visually coherent samples. Neural ODE surrogates offer improved flexibility for more complex latent transport (Khilchuk et al., 14 Dec 2025).

7. Recommendations and Future Directions

Selection of the bridge solver should be matched to the geometry of the latent manifold. For nearly Gaussian latent spaces or when interpretability and low latency are paramount, symbolic surrogates (SINDy-FM) are optimal. For highly nonlinear or complex latent structures, neural ODE surrogates maintain expressivity with competitive efficiency. Pretraining the bridge drift on reference diffusion stabilizes learning in all scenarios. Hybrid (symbolic+neural) schemes can combine interpretability and expressive power. Further research is warranted on direct construction of latent encoders for arbitrary data modalities and further improving error analysis for non-Gaussian latent distributions (Khilchuk et al., 14 Dec 2025, Jiao et al., 2024).
