CSBM: Categorical Schrödinger Bridge Matching

Updated 25 April 2026
  • CSBM is a framework for generative modeling in discrete data spaces that constructs minimal-action stochastic flows between prescribed endpoint distributions.
  • It employs both discrete-time iterative Markovian fitting and continuous-time minimal-action learning to bridge and translate data in unpaired settings.
  • CSBM has been applied to image translation in VQ spaces, peptide sequence design, and molecular design, demonstrating its practical utility and robust performance.

Categorical Schrödinger Bridge Matching (CSBM) is a framework for generative modeling and unpaired domain translation in discrete (categorical) data spaces. It generalizes the Schrödinger Bridge (SB) methodology—originally developed for continuous spaces—to problems where each data point is a tuple or sequence of discrete tokens, as found in vector-quantized representations, peptides, and text. CSBM provides both a rigorous mathematical foundation and a practical algorithmic paradigm for constructing minimal-action stochastic flows (bridges) between prescribed endpoint distributions on finite alphabets, leveraging controlled Markov processes and variational inference. Its recent instantiations have targeted sequence generation, image translation in latent spaces of VQ models, and molecular design (2502.01416, Goel et al., 29 Jan 2026).

1. Mathematical Formulation in Discrete State Spaces

The core object of interest is the dynamic discrete Schrödinger Bridge: given a finite state space $\mathcal{X} = \mathbb{S}^D$ (e.g., with $\mathbb{S}$ a token vocabulary or codebook of size $S$), a time grid $0 = t_0 < t_1 < \ldots < t_{N+1} = 1$, and two marginals $p_0, p_1$ on $\mathcal{X}$, construct a path measure $q^*$ that solves

$$\min_{q \in \Pi_N(p_0, p_1)} \mathrm{KL}(q \,\|\, q^{\mathrm{ref}})$$

where $q^{\mathrm{ref}}$ is a reference Markov process on $\mathcal{X}$, and $\Pi_N(p_0, p_1)$ denotes path laws with prescribed endpoints $p_0, p_1$ (2502.01416). In the two-step case this is equivalent to entropic optimal transport with cost $c(x_0, x_1) = -\log q^{\mathrm{ref}}(x_1 \mid x_0)$. In continuous time, as exploited by MadSBM for peptide sequence design, the analogue employs a controlled continuous-time Markov chain (CTMC) on the "edit graph" defined by valid one-token edits, yielding a minimal-action stochastic flow between the prior and data (Goel et al., 29 Jan 2026).
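To make the two-marginal (static) problem concrete, the following is a minimal sketch, not taken from either paper, that solves it on a tiny state space by iterative proportional fitting (Sinkhorn scaling). The function name `static_bridge`, the example marginals, and the "lazy" reference kernel are illustrative assumptions. The sketch relies on the standard fact that the KL-closest coupling to the reference joint $p_0(x_0)\,q^{\mathrm{ref}}(x_1 \mid x_0)$ with fixed marginals is obtained by alternately rescaling rows and columns.

```python
def static_bridge(p0, p1, q_ref, n_iters=200):
    """Iterative proportional fitting (Sinkhorn) for the static bridge.

    p0, p1 : lists of strictly positive marginal probabilities.
    q_ref  : row-stochastic matrix, q_ref[i][j] = q_ref(x1=j | x0=i), all > 0.
    Returns a coupling pi[i][j] whose marginals are (p0, p1) and which is
    obtained by rescaling the reference joint p0[i] * q_ref[i][j].
    """
    n = len(p0)
    # Start from the reference joint distribution.
    pi = [[p0[i] * q_ref[i][j] for j in range(n)] for i in range(n)]
    for _ in range(n_iters):
        # Rescale columns so that the x1-marginal equals p1.
        col = [sum(pi[i][j] for i in range(n)) for j in range(n)]
        pi = [[pi[i][j] * p1[j] / col[j] for j in range(n)] for i in range(n)]
        # Rescale rows so that the x0-marginal equals p0.
        row = [sum(pi[i]) for i in range(n)]
        pi = [[pi[i][j] * p0[i] / row[i] for j in range(n)] for i in range(n)]
    return pi


# Illustrative example: three states and a reference that prefers to stay put.
p0 = [0.7, 0.2, 0.1]
p1 = [0.1, 0.3, 0.6]
q_ref = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
pi = static_bridge(p0, p1, q_ref)
```

This tabular approach only works when the state space can be enumerated; for $\mathcal{X} = \mathbb{S}^D$ with realistic $D$ it is infeasible, which is the motivation for the learned, iterative procedures described next.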

The process is determined by a time-inhomogeneous generator that combines a fixed (reference) generator with a time-dependent control field. The optimal control (the minimal-action control) is characterized via a discrete Hamilton–Jacobi–Bellman equation and amounts to a Doob $h$-transform of the reference process.
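For orientation, the Doob $h$-transform of a CTMC can be written out explicitly. The notation here is assumed for illustration and may differ from that of the cited papers: $Q^{\mathrm{ref}}$ is the reference generator, $h_t$ a strictly positive space-time harmonic function, and $Q^{h}_t$ the resulting controlled generator.

$$
Q^{h}_t(x, y) = Q^{\mathrm{ref}}(x, y)\,\frac{h_t(y)}{h_t(x)} \quad (y \neq x), \qquad Q^{h}_t(x, x) = -\sum_{y \neq x} Q^{h}_t(x, y),
$$

$$
\partial_t h_t(x) + \sum_{y} Q^{\mathrm{ref}}(x, y)\, h_t(y) = 0 .
$$

Off-diagonal jump rates are thus tilted by the ratio $h_t(y)/h_t(x)$, and the diagonal is set so that each row of the generator sums to zero. Substituting $h_t = e^{-V_t}$ turns the linear backward equation into a nonlinear equation of Hamilton–Jacobi–Bellman type for $V_t$. The terminal condition on $h$ is fixed by the endpoint constraint and is not spelled out here.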

2. Algorithmic Approaches: Iterative Markovian Fitting and Minimal-Action Learning

Two principal algorithmic regimes exist within the CSBM literature: discrete-time iterative Markovian fitting (D-IMF) and continuous-time minimal-action learning (as in MadSBM).

Discrete-time D-IMF

The D-IMF procedure (2502.01416) alternates two projections over the space of path measures:

  • Reciprocal projection: Imposes marginals on interior points according to the reciprocal family of the reference process.
  • Markov projection: Projects to the space of Markovian path measures, parametrized via neural Markov chains.

Each outer iteration alternates between fitting a parametric forward process and a parametric backward process through KL-based objectives. The main theoretical result (Theorem 3.1) establishes that, in the finite discrete setting, the Schrödinger bridge is uniquely determined as the intersection of the Markov and reciprocal families, and that D-IMF converges to this unique minimizer.
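The two projections can be illustrated on a toy problem small enough to store the whole path measure in a table. The sketch below is an assumption-laden illustration rather than the paper's algorithm: it uses three time points (one interior point), exact tabular projections instead of neural networks, a single time-homogeneous one-step reference kernel `P`, and only a forward Markov projection. All function names are hypothetical.

```python
def reciprocal_projection(q, P):
    """Keep the endpoint coupling q(x0, x2); refill x1 from the reference bridge."""
    n = len(P)
    out = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for k in range(n):
            q02 = sum(q[i][j][k] for j in range(n))        # endpoint coupling
            z = sum(P[i][j] * P[j][k] for j in range(n))   # two-step reference
            for j in range(n):
                # q_ref(x1=j | x0=i, x2=k) = P[i][j] * P[j][k] / z
                out[i][j][k] = q02 * P[i][j] * P[j][k] / z
    return out


def markov_projection(q):
    """Replace q by q(x0, x1) * q(x2 | x1), which keeps all one-time marginals."""
    n = len(q)
    q01 = [[sum(q[i][j]) for j in range(n)] for i in range(n)]
    q12 = [[sum(q[i][j][k] for i in range(n)) for k in range(n)] for j in range(n)]
    q1 = [sum(q01[i][j] for i in range(n)) for j in range(n)]
    return [[[q01[i][j] * q12[j][k] / q1[j] if q1[j] > 0 else 0.0
              for k in range(n)] for j in range(n)] for i in range(n)]


def d_imf(p0, p1, P, n_outer=50):
    """Alternate the two projections, starting from the independent coupling."""
    n = len(p0)
    q = [[[p0[i] * p1[k] / n for k in range(n)] for j in range(n)] for i in range(n)]
    for _ in range(n_outer):
        q = reciprocal_projection(q, P)
        q = markov_projection(q)
    return q


# Illustrative example with three states.
p0 = [0.7, 0.2, 0.1]
p1 = [0.1, 0.3, 0.6]
P = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
q_star = d_imf(p0, p1, P)
```

Both projections leave the endpoint marginals unchanged, so every iterate is a valid transport plan between `p0` and `p1`; the iterations only change how mass is routed through the interior time point.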

Minimal-Action Discrete Schrödinger Bridge

In the continuous-time case for sequences (Goel et al., 29 Jan 2026), "Minimal-action discrete Schrödinger Bridge Matching" (MadSBM) parameterizes the control field with a small Diffusion Transformer (DiT) network. Training proceeds via a cross-entropy loss on masked positions, which is consistent in population with the minimal-action variational objective.
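A cross-entropy loss restricted to masked positions can be sketched as follows. This is a generic illustration, not MadSBM's training code: the function name, the plain-list data layout, and the unweighted average over masked positions are assumptions, and any time-dependent weighting used in the paper is not reproduced.

```python
import math


def masked_cross_entropy(logits, targets, is_masked):
    """Average negative log-likelihood of the true tokens at masked positions.

    logits    : one row of vocabulary logits per sequence position.
    targets   : true token index per position.
    is_masked : True where the position was masked in the network input.
    """
    total, count = 0.0, 0
    for row, target, masked in zip(logits, targets, is_masked):
        if not masked:
            continue  # unmasked positions do not contribute to the loss
        m = max(row)
        # Numerically stable log-sum-exp over the vocabulary.
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[target]
        count += 1
    return total / max(count, 1)


# Illustrative call: 3 positions, vocabulary of 4 tokens, 2 positions masked.
logits = [[0.0, 0.0, 0.0, 0.0], [5.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
loss = masked_cross_entropy(logits, targets=[1, 0, 2], is_masked=[True, False, True])
```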

Sampling from the trained model involves forward simulation of a time-discretized controlled Markov process, iteratively updating token positions in the sequence using transition probabilities shaped by both the reference generator and the learned control.

3. Reference Processes and Data-Specific Construction

The choice of reference process (the discrete-time $q^{\mathrm{ref}}$ or, in continuous time, the reference generator) is critical. Common constructions include:

| Reference process type | Characteristics | Applications |
| --- | --- | --- |
| Uniform random-walk | Equal probability to any neighboring state | Proof-of-concept, general-purpose categorical tasks |
| Gaussian-like in code index | Local moves favored via exponential decay in jump | Image VQ-space translation |
| Biologically-informed (LM logits) | Reference rates from pretrained LLM | Protein/peptide design, chemistry |

In both D-IMF and MadSBM, the reference process is chosen to model "cheap" or high-likelihood transitions, thereby helping intermediate states stay within plausible regions of the state space and improving the realism of generated samples (2502.01416, Goel et al., 29 Jan 2026).
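The first two reference types can be sketched as one-step transition matrices over a single categorical variable. The parameterizations below (`alpha` as the total jump probability, `sigma` as a width in index units) are assumptions chosen for illustration and are not necessarily the exact forms used in the cited papers; in the uniform case every other state is treated as a neighbor.

```python
import math


def uniform_reference(S, alpha):
    """Stay with probability 1 - alpha, else jump uniformly to another state."""
    return [[1.0 - alpha if i == j else alpha / (S - 1) for j in range(S)]
            for i in range(S)]


def gaussian_like_reference(S, sigma):
    """Jump weight decays as exp(-(i - j)^2 / (2 sigma^2)); rows are normalized."""
    rows = []
    for i in range(S):
        w = [math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2)) for j in range(S)]
        z = sum(w)
        rows.append([v / z for v in w])
    return rows


# Illustrative kernels over a codebook of 8 entries.
P_uniform = uniform_reference(8, alpha=0.1)
P_gauss = gaussian_like_reference(8, sigma=1.5)
```

The Gaussian-like kernel is only meaningful when nearby indices correspond to similar items (for instance, an ordered intensity scale); for an unordered vocabulary the uniform kernel is the more neutral choice.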

4. Practical Parameterization and Sampling

In practice, the Markov kernels and control fields are parameterized by neural networks that take the current state (sequence or codebook indices) and time index as input, outputting transition probabilities or logits over the discrete vocabulary. For sequence generation, these outputs are structured as $L \times S$ tensors (for sequence length $L$ and vocabulary size $S$), leading to efficient vectorized computation.

In general, CSBM sampling draws an initial state from one endpoint distribution and then applies the learned transition kernel once per step of the time grid until the other endpoint is reached. For MadSBM, sampling involves a time loop from a fully masked to a fully revealed sequence, with per-token proposals and "nucleus" sampling using a filtered softmax over candidate substitutions (Goel et al., 29 Jan 2026).
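The sampling procedure can be sketched as a short loop over time steps with a factorized (per-position) categorical draw at each step. This is a hedged illustration only: `kernel_logits` stands in for a trained network and is hypothetical, as are `toy_kernel` and its parameters, and nucleus filtering is omitted.

```python
import math
import random


def sample_categorical(logits, rng):
    """Draw one index from softmax(logits)."""
    m = max(logits)
    weights = [math.exp(v - m) for v in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def sample_bridge(x0, kernel_logits, n_steps, rng):
    """Simulate the learned Markov chain forward from the initial state x0.

    kernel_logits(x, n) must return one row of vocabulary logits per position
    of x for time step n; positions are sampled independently given x.
    """
    x = list(x0)
    for n in range(n_steps):
        logits = kernel_logits(x, n)
        x = [sample_categorical(row, rng) for row in logits]
    return x


def toy_kernel(x, n, S=4, stay_bonus=2.0):
    """Stand-in for a trained network: mildly prefers keeping each token."""
    return [[stay_bonus if s == tok else 0.0 for s in range(S)] for tok in x]


rng = random.Random(0)
x1 = sample_bridge([0, 1, 2, 3, 0], toy_kernel, n_steps=5, rng=rng)
```

With a time grid of $N + 1$ transitions, `n_steps` would be set to $N + 1$. Sampling every position independently within a step is exactly the factorization assumption discussed among the limitations.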

5. Classifier Guidance and Control Objectives

MadSBM introduces, for the first time, a discrete classifier guidance mechanism for Schrödinger Bridge models (Goel et al., 29 Jan 2026). During sampling, at each time step, multiple candidate transitions are scored by an external classifier (e.g., for binding affinity) and reweighted according to a softmax of classifier scores. One candidate is then resampled proportionally, which tilts the generative process toward high-score regions according to the auxiliary objective, with no retraining required. This is functionally analogous to classifier guidance in continuous diffusion models but adapted to discrete, categorical flows.
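The reweight-and-resample step described above can be sketched in a few lines. The function name `guided_step`, the `temperature` parameter, and the toy scoring function are assumptions for illustration; in practice the candidates would be proposed by the trained model and scored by an external classifier.

```python
import math
import random


def guided_step(candidates, score_fn, rng, temperature=1.0):
    """Pick one candidate with probability proportional to softmax(score / T)."""
    scores = [score_fn(c) / temperature for c in candidates]
    m = max(scores)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(s - m) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def toy_score(seq):
    """Hypothetical classifier score: number of occurrences of token 3."""
    return float(seq.count(3))


rng = random.Random(0)
candidates = [[0, 1, 2], [3, 1, 2], [3, 3, 2]]
chosen = guided_step(candidates, toy_score, rng)
greedy = guided_step(candidates, toy_score, rng, temperature=1e-3)
```

A lower `temperature` concentrates the choice on the highest-scoring candidate, while a higher one stays closer to the model's unguided proposals; no model parameters are updated in either case.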

6. Empirical Evaluation and Applications

CSBM and its variants have been empirically validated in multiple discrete generative modeling scenarios:

  • Toy 2D distributions: Experiments demonstrate effective mass interpolation and show how the learned bridge depends on the stochasticity of the reference process (2502.01416).
  • Image generative modeling in VQ-space: CSBM outperforms continuous-space SB competitors on metrics such as FID and CMMD in unpaired domain translation tasks (e.g., colored MNIST, CelebA) (2502.01416).
  • Peptide sequence design: MadSBM can efficiently generate high-likelihood, chemically plausible peptide candidates by exploiting biologically informed reference dynamics and minimal-action bridges (Goel et al., 29 Jan 2026).

Empirical results confirm that staying within the support of plausible transitions (via discrete bridges) provides semantic consistency and high-quality samples. Limitations include the need for large $N$ (number of time steps) to match continuous models’ expressivity and simplifications in factorized transition parameterization that neglect within-step dependencies.

7. Limitations, Current Challenges, and Extensions

CSBM's primary limitations stem from the factorization assumption over the $D$ positions, which, while common in discrete diffusion methods, can restrict modeling power in high-dimensional or structured data. There is an explicit trade-off between the number of transport steps $N$ (which increases computational complexity) and the model’s capacity to preserve fidelity. Current research directions include:

  • Designing richer parameterizations (autoregressive, attention-based) for transition kernels.
  • Developing discrete Schrödinger bridges in continuous time to improve sample path realism.
  • Faster samplers for large categorical spaces.
  • Theoretical analysis of convergence rates in high dimensions.
  • Extensions to non-homogeneous alphabets or structured categorical products (e.g., text, molecules, graphs).

A plausible implication is that further advances in scalable, structured discrete bridge matching could provide a universal framework for generative modeling in non-continuous domains, unifying approaches across statistical physics, optimal transport, and stochastic control.
