CSBM: Categorical Schrödinger Bridge Matching
- CSBM is a framework for generative modeling in discrete data spaces that constructs minimal-action stochastic flows between prescribed endpoint distributions.
- It employs both discrete-time iterative Markovian fitting and continuous-time minimal-action learning to bridge and translate data in unpaired settings.
- CSBM has been applied to image translation in VQ spaces, peptide sequence design, and molecular design, demonstrating its practical utility and robust performance.
Categorical Schrödinger Bridge Matching (CSBM) is a framework for generative modeling and unpaired domain translation in discrete (categorical) data spaces. It generalizes the Schrödinger Bridge (SB) methodology—originally developed for continuous spaces—to problems where each data point is a tuple or sequence of discrete tokens, as found in vector-quantized representations, peptides, and text. CSBM provides both a rigorous mathematical foundation and a practical algorithmic paradigm for constructing minimal-action stochastic flows (bridges) between prescribed endpoint distributions on finite alphabets, leveraging controlled Markov processes and variational inference. Its recent instantiations have targeted sequence generation, image translation in latent spaces of VQ models, and molecular design (2502.01416, Goel et al., 29 Jan 2026).
1. Mathematical Formulation in Discrete State Spaces
The core object of interest is the dynamic discrete Schrödinger Bridge: given a finite state space $\mathcal{X}$ (e.g., a token vocabulary or codebook of size $S$), a time grid $0 = t_0 < t_1 < \dots < t_N = 1$, and two marginals $p_0, p_1$ on $\mathcal{X}$, construct a path measure $q^*$ that solves
$$q^* = \arg\min_{q \in \Pi(p_0, p_1)} \mathrm{KL}\left(q \,\|\, q^{\mathrm{ref}}\right),$$
where $q^{\mathrm{ref}}$ is a reference Markov process on $\mathcal{X}$, and $\Pi(p_0, p_1)$ denotes path laws with prescribed endpoints $p_0, p_1$ (2502.01416). This is equivalent in the two-step case to entropic optimal transport with cost $c(x_0, x_1) = -\log q^{\mathrm{ref}}(x_1 \mid x_0)$. In continuous time, as exploited by MadSBM for peptide sequence design, the analogue employs a controlled continuous-time Markov chain (CTMC) on the "edit graph" defined by valid one-token edits, yielding a minimal-action stochastic flow between the prior and data (Goel et al., 29 Jan 2026).
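The reduction to entropic optimal transport follows from the chain rule for the KL divergence. The derivation below is a standard sketch rather than a quotation from either paper; the symbols $q^{\mathrm{ref}}$ (reference process), $\Pi(p_0, p_1)$ (path laws with the prescribed endpoints), $\pi$ (endpoint coupling of $q$), and $H$ (Shannon entropy) are notation assumed here for illustration.

```latex
\begin{aligned}
\mathrm{KL}\big(q \,\|\, q^{\mathrm{ref}}\big)
  &= \mathrm{KL}\big(\pi \,\|\, q^{\mathrm{ref}}_{0,1}\big)
   + \mathbb{E}_{(x_0, x_1) \sim \pi}\,
     \mathrm{KL}\big(q(\cdot \mid x_0, x_1) \,\|\, q^{\mathrm{ref}}(\cdot \mid x_0, x_1)\big), \\
\mathrm{KL}\big(\pi \,\|\, q^{\mathrm{ref}}_{0,1}\big)
  &= \mathbb{E}_{\pi}\big[-\log q^{\mathrm{ref}}(x_1 \mid x_0)\big] - H(\pi) + \mathrm{const},
  \qquad \pi \in \Pi(p_0, p_1).
\end{aligned}
```

The second term of the first line vanishes when $q$ uses the reference bridge between its endpoints, so only the coupling $\pi$ remains to be optimized. The constant in the second line collects $-\mathbb{E}_{p_0}[\log q^{\mathrm{ref}}_0(x_0)]$, which does not depend on $\pi$ because its first marginal is fixed to $p_0$.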
The process is determined by a time-inhomogeneous generator $Q_t^{u}$, obtained by modulating a fixed (reference) generator $Q^{\mathrm{ref}}$ with a time-dependent control field $u_t$. The optimal control $u_t^*$ (the minimal-action control) is characterized via a discrete Hamilton–Jacobi–Bellman equation and amounts to a Doob $h$-transform of the reference process.
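To make the Doob $h$-transform concrete, here is a minimal sketch of its discrete-time analogue, assuming a small homogeneous reference chain and a point-mass endpoint constraint. It is not the MadSBM construction itself; the function name `doob_h_transform` and the matrix `P_ref` are hypothetical.

```python
def doob_h_transform(P, target, n_steps):
    """Tilt kernel P so that the chain ends at `target` after n_steps steps.

    P is a row-stochastic matrix (list of lists) with strictly positive entries.
    Returns one tilted kernel per step (illustrative helper, not a library API).
    """
    n = len(P)
    # h[k][x] = Pr(X_N = target | X_k = x) under the reference chain,
    # computed by backward recursion from the terminal condition.
    h = [[0.0] * n for _ in range(n_steps + 1)]
    h[n_steps][target] = 1.0
    for k in range(n_steps - 1, -1, -1):
        for x in range(n):
            h[k][x] = sum(P[x][y] * h[k + 1][y] for y in range(n))
    # Tilted kernel: P^h_k(x, y) = P(x, y) * h_{k+1}(y) / h_k(x).
    return [
        [[P[x][y] * h[k + 1][y] / h[k][x] for y in range(n)] for x in range(n)]
        for k in range(n_steps)
    ]


# Hypothetical 3-state "lazy" reference chain.
P_ref = [[0.8, 0.1, 0.1],
         [0.1, 0.8, 0.1],
         [0.1, 0.1, 0.8]]
kernels = doob_h_transform(P_ref, target=2, n_steps=3)
```

Each tilted kernel remains row-stochastic, and the last one sends every state to the target. In continuous time the same reweighting is applied to the off-diagonal jump rates instead of transition probabilities.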
2. Algorithmic Approaches: Iterative Markovian Fitting and Minimal-Action Learning
Two principal algorithmic regimes exist within the CSBM literature: discrete-time iterative Markovian fitting (D-IMF) and continuous-time minimal-action learning (as in MadSBM).
Discrete-time D-IMF
The D-IMF procedure (2502.01416) alternates two projections over the space of path measures:
- Reciprocal projection: Imposes marginals on interior points according to the reciprocal family of the reference process.
- Markov projection: Projects to the space of Markovian path measures, parametrized via neural Markov chains.
Each outer iteration alternates fitting parametric forward ($q_\theta$) and backward ($q_\eta$) processes through KL-based objectives. The theoretical result (Theorem 3.1) establishes that, for a finite state space $\mathcal{X}$, the Schrödinger bridge is uniquely determined as the intersection of the Markov and reciprocal families, with D-IMF converging to this unique minimizer.
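The alternation of the two projections can be illustrated with a tabular toy version, assuming a single intermediate time point and exact (non-neural) projections. This is a simplified sketch, not the parametric forward/backward procedure of the paper; all function names are hypothetical, and `P` is an assumed strictly positive one-step reference kernel.

```python
import math


def two_step(P):
    """Two-step reference kernel P @ P."""
    n = len(P)
    return [[sum(P[i][m] * P[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def reciprocal_projection(coupling, P):
    """Keep the endpoint coupling; fill the midpoint with the reference bridge."""
    n, P2 = len(P), two_step(P)
    return [[[coupling[i][j] * P[i][m] * P[m][j] / P2[i][j] for j in range(n)]
             for m in range(n)] for i in range(n)]


def markov_projection(q):
    """Endpoint coupling of the Markov chain that matches q's consecutive pairs."""
    n = len(q)
    q0m = [[sum(q[i][m]) for m in range(n)] for i in range(n)]        # law of (x0, xm)
    qm1 = [[sum(q[i][m][j] for i in range(n)) for j in range(n)]
           for m in range(n)]                                          # law of (xm, x1)
    qm = [sum(q0m[i][m] for i in range(n)) for m in range(n)]          # law of xm
    return [[sum(q0m[i][m] * qm1[m][j] / qm[m] for m in range(n))
             for j in range(n)] for i in range(n)]


def kl_to_reference(coupling, p0, P):
    """KL between the coupling and the reference coupling p0(x0) * P2(x1 | x0)."""
    P2 = two_step(P)
    return sum(c * math.log(c / (p0[i] * P2[i][j]))
               for i, row in enumerate(coupling) for j, c in enumerate(row) if c > 0)


def tabular_imf(p0, p1, P, n_iters):
    coupling = [[a * b for b in p1] for a in p0]  # start from the independent coupling
    history = [kl_to_reference(coupling, p0, P)]
    for _ in range(n_iters):
        coupling = markov_projection(reciprocal_projection(coupling, P))
        history.append(kl_to_reference(coupling, p0, P))
    return coupling, history
```

In this exact tabular setting both projections preserve the endpoint marginals, and the KL divergence to the reference coupling does not increase from one iteration to the next. The neural version replaces the exact Markov projection with learned forward and backward kernels.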
Minimal-Action Discrete Schrödinger Bridge
In the continuous-time case for sequences (Goel et al., 29 Jan 2026), "Minimal-action discrete Schrödinger Bridge Matching" (MadSBM) parameterizes the control field $u_\theta$ with a small Diffusion Transformer network (DiT). Training proceeds via a cross-entropy loss on masked positions, consistent in population with the minimal-action variational objective.
Sampling from the trained model involves forward simulation via a discretized (in time) controlled Markov process, iteratively updating token positions in the sequence using transition probabilities shaped by both the reference generator $Q^{\mathrm{ref}}$ and the learned control $u_\theta$.
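A time-discretized simulation of a controlled CTMC can be sketched with a first-order (Euler) scheme. The example below assumes a single scalar state instead of a full sequence, and `rate_fn` is a hypothetical stand-in for reference rates reweighted by a learned control. It illustrates the discretization only, not the MadSBM sampler.

```python
import random


def euler_ctmc_path(x0, rate_fn, n_steps, rng):
    """Simulate a CTMC on [0, 1] with an Euler time discretization.

    rate_fn(x, t) -> dict {next_state: jump rate}; hypothetical interface.
    """
    dt = 1.0 / n_steps
    x, path = x0, [x0]
    for k in range(n_steps):
        rates = rate_fn(x, k * dt)
        total = sum(rates.values())
        # Jump with probability ~ total * dt (clipped to 1), otherwise stay put.
        if rng.random() < min(1.0, total * dt):
            states = list(rates)
            x = rng.choices(states, weights=[rates[s] for s in states])[0]
        path.append(x)
    return path


def toy_rates(x, t, n_states=6, target=5):
    """Toy nearest-neighbor walk on {0..n_states-1}, tilted toward `target`."""
    rates = {}
    for y in (x - 1, x + 1):
        if 0 <= y < n_states:
            closer = abs(y - target) < abs(x - target)
            rates[y] = 4.0 if closer else 0.5  # assumed rates, for illustration only
    return rates


path = euler_ctmc_path(0, toy_rates, n_steps=200, rng=random.Random(0))
```

Smaller time steps reduce the discretization error of the Euler scheme at the cost of more network evaluations per sample.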
3. Reference Processes and Data-Specific Construction
The choice of reference process $q^{\mathrm{ref}}$ (discrete time) or $Q^{\mathrm{ref}}$ (continuous time) is critical. Common constructions include:
| Reference Process Type | Characteristics | Applications |
|---|---|---|
| Uniform random-walk | Equal probability to any neighboring state | Proof-of-concept, general-purpose categorical tasks |
| Gaussian-like in code index | Local moves favored via exponential decay in jump | Image VQ-space translation |
| Biologically-informed (LM logits) | Reference rates from a pretrained language model | Protein/peptide design, chemistry |
In both D-IMF and MadSBM, the reference process is chosen to model "cheap" or high-likelihood transitions, thereby helping intermediate states stay within plausible regions of the state space and improving the realism of generated samples (2502.01416, Goel et al., 29 Jan 2026).
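The first two reference constructions in the table can be sketched as explicit transition matrices over a single categorical variable. The parameterizations below (`alpha` as the total jump probability, `scale` as the width of the decay) are assumptions made for illustration and may differ from those used in the cited papers.

```python
import math


def uniform_reference(S, alpha):
    """Stay with probability 1 - alpha, else jump uniformly to another state."""
    return [[1.0 - alpha if i == j else alpha / (S - 1) for j in range(S)]
            for i in range(S)]


def gaussian_like_reference(S, scale):
    """Jump probabilities decay with squared distance between code indices."""
    rows = []
    for i in range(S):
        weights = [math.exp(-((i - j) ** 2) / (2.0 * scale ** 2)) for j in range(S)]
        z = sum(weights)
        rows.append([w / z for w in weights])  # normalize each row
    return rows


U = uniform_reference(5, alpha=0.2)
G = gaussian_like_reference(5, scale=1.0)
```

The Gaussian-like kernel only makes sense when nearby indices are semantically close, for example ordered pixel intensities. For an arbitrary codebook ordering the uniform kernel is the more neutral choice.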
4. Practical Parameterization and Sampling
In practice, the Markov kernels $q_\theta$ and control fields $u_\theta$ are parameterized by neural networks that take the current state (sequence or codebook indices) and time index as input, outputting transition probabilities or logits over the discrete vocabulary. For sequence generation, these are structured as $L \times S$ tensors (for sequence length $L$ and vocabulary size $S$), leading to efficient vectorized computation.
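Sampling from a trained CSBM model is ancestral: draw an initial state from the source marginal, then step through the time grid, sampling each next state from the learned Markov kernel conditioned on the current state and time index.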
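A minimal sketch of such a sampling loop follows, assuming a factorized kernel that returns one categorical distribution per position. The function `toy_kernel` is a hypothetical stand-in for the trained network, and the names and defaults are not taken from the paper.

```python
import random


def sample_factorized(x0, kernel, n_steps, rng):
    """Ancestral sampling with a per-position (factorized) transition kernel."""
    x = list(x0)
    for k in range(n_steps):
        probs = kernel(x, k)  # L rows, each a distribution over S tokens
        # Positions are sampled independently given the current state.
        x = [rng.choices(range(len(row)), weights=row)[0] for row in probs]
    return x


def toy_kernel(x, k, S=4, stay=0.7):
    """Hypothetical kernel: keep each token, or move it to the next index."""
    rows = []
    for token in x:
        row = [0.0] * S
        row[token] += stay
        row[(token + 1) % S] += 1.0 - stay
        rows.append(row)
    return rows


x1 = sample_factorized([0, 1, 2, 3, 0], toy_kernel, n_steps=10, rng=random.Random(1))
```

Because each position is drawn independently within a step, dependencies between positions can only enter through the conditioning on the previous state.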
For MadSBM, sampling involves a time loop from a fully masked to a fully revealed sequence, with per-token proposals and "nucleus" sampling using a filtered softmax over candidate substitutions (Goel et al., 29 Jan 2026).
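Nucleus (top-p) filtering of a softmax can be sketched as follows. This is the generic technique, assuming per-token logits as input; the threshold `top_p` and the function names are illustrative and not taken from the MadSBM code.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


def nucleus_filter(probs, top_p):
    """Keep the smallest set of top tokens whose mass reaches top_p, renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    z = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / z
    return filtered


def sample_token(logits, top_p, rng):
    probs = nucleus_filter(softmax(logits), top_p)
    return rng.choices(range(len(probs)), weights=probs)[0]
```

Lower values of `top_p` discard more of the low-probability tail, trading diversity for sample plausibility.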
5. Classifier Guidance and Control Objectives
MadSBM introduces, for the first time, a discrete classifier guidance mechanism for Schrödinger Bridge models (Goel et al., 29 Jan 2026). During sampling, at each time step, multiple candidate transitions are scored by an external classifier (e.g., for binding affinity) and reweighted according to a softmax of classifier scores. One candidate is then resampled proportionally, which tilts the generative process toward high-score regions according to the auxiliary objective, with no retraining required. This is functionally analogous to classifier guidance in continuous diffusion models but adapted to discrete, categorical flows.
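The reweight-and-resample step can be sketched as below, assuming a scalar-valued scoring function. The `temperature` parameter and the toy classifier are hypothetical additions for illustration, not part of the description above.

```python
import math
import random


def guidance_weights(scores, temperature=1.0):
    """Softmax over classifier scores (temperature is an assumed knob)."""
    m = max(scores)
    w = [math.exp((s - m) / temperature) for s in scores]
    z = sum(w)
    return [v / z for v in w]


def guided_resample(candidates, score_fn, rng, temperature=1.0):
    """Pick one candidate transition with probability proportional to its weight."""
    weights = guidance_weights([score_fn(c) for c in candidates], temperature)
    return rng.choices(candidates, weights=weights)[0]


def toy_classifier(seq):
    """Toy stand-in for an external property classifier: counts lysines (K)."""
    return float(seq.count("K"))


proposals = ["AKGA", "AKKA", "AGGA"]  # hypothetical candidate next states
chosen = guided_resample(proposals, toy_classifier, random.Random(0))
```

Candidates with higher classifier scores are selected more often, while the generative model itself is left unchanged.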
6. Empirical Evaluation and Applications
CSBM and its variants have been empirically validated in multiple discrete generative modeling scenarios:
- Toy 2D distributions: Demonstrates effective mass interpolation and dependence on reference process stochasticity (2502.01416).
- Image generative modeling in VQ-space: CSBM outperforms continuous-space SB competitors on metrics such as FID and CMMD in unpaired domain translation tasks (e.g., colored MNIST, CelebA) (2502.01416).
- Peptide sequence design: MadSBM can efficiently generate high-likelihood, chemically plausible peptide candidates by exploiting biologically informed reference dynamics and minimal-action bridges (Goel et al., 29 Jan 2026).
Empirical results confirm that staying within the support of plausible transitions (via discrete bridges) provides semantic consistency and high-quality samples. Limitations include the need for a large number of time steps $N$ to match continuous models' expressivity, and simplifications in the factorized transition parameterization that neglect within-step dependencies.
7. Limitations, Current Challenges, and Extensions
CSBM's primary limitations stem from the factorization assumption over the $L$ positions, which, while common in discrete diffusion methods, can restrict modeling power in high-dimensional or structured data. There is an explicit trade-off between the number of transport steps $N$ (which increases computational complexity) and the model's capacity to preserve fidelity. Current research directions include:
- Designing richer parameterizations (autoregressive, attention-based) for transition kernels.
- Developing discrete Schrödinger bridges in continuous time to improve sample path realism.
- Faster samplers for large categorical spaces.
- Theoretical analysis of convergence rates in high dimensions.
- Extensions to non-homogeneous alphabets or structured categorical products (e.g., text, molecules, graphs).
A plausible implication is that further advances in scalable, structured discrete bridge matching could provide a universal framework for generative modeling in non-continuous domains, unifying approaches across statistical physics, optimal transport, and stochastic control.
References
- "Categorical Schrödinger Bridge Matching" (2502.01416)
- "Minimal-Action Discrete Schrödinger Bridge Matching for Peptide Sequence Design" (Goel et al., 29 Jan 2026)