Categorical Flow Maps (CFMs)

Updated 15 February 2026
  • Categorical Flow Maps (CFMs) are continuous-time generative frameworks that transport probability distributions on the simplex to model high-dimensional categorical data efficiently.
  • They employ ODE-based flows and variational matching to enable few-step or one-step sampling with state-of-the-art performance across benchmarks.
  • CFMs integrate self-distillation and endpoint consistency objectives to stabilize training and support diverse applications in images, text, and molecular graphs.

Categorical Flow Maps (CFMs) are continuous-time generative modeling frameworks that enable fast, stable, and sample-efficient generation of high-dimensional categorical data. CFMs systematically transport probability distributions supported on the probability simplex to match empirical categorical distributions, leveraging a geometric, variational, and algorithmically tractable formulation that is compatible with modern self-distillation and endpoint consistency techniques. These models have demonstrated state-of-the-art performance in scenarios that demand few-step or single-step generation and are applicable to discrete domains such as images, molecular graphs, and text (Roos et al., 12 Feb 2026).

1. Mathematical Formulation

CFMs define a continuous relaxation of discrete categorical data by embedding each categorical variable as a one-hot vector in the probability simplex:

$$\Delta^{K-1} = \bigl\{p \in \mathbb{R}^{K} : p_k \ge 0,\ \textstyle\sum_{k=1}^{K} p_k = 1\bigr\}$$

Given $D$ variables, each data point is $x \in (\Delta^{K-1})^D$. Generation proceeds by transporting samples from a continuous prior $p_0$ (e.g., Gaussian, or uniform on the simplex) to a data distribution $p_1$ supported on one-hot vectors.

Sampling is formulated as an ODE parameterized by a time-dependent vector field. Linear stochastic interpolants define the intermediate marginals:

$$I_t(x_0, x_1) = (1-t)\,x_0 + t\,x_1, \qquad t \in [0,1]$$

This induces the probability-flow ODE:

$$\frac{dx_t}{dt} = b_t(x_t), \qquad b_t(x) = \frac{\mu_t(x) - x}{1-t}, \qquad \mu_t(x) = \mathbb{E}\bigl[x_1 \mid I_t = x\bigr]$$
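As a concrete illustration (a sketch, not the paper's code), the one-hot embedding, a uniform simplex prior, and the linear interpolant take only a few lines of NumPy; the names `onehot` and `interpolate` are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
K, D = 4, 3  # categories per variable, number of variables

def onehot(labels, K):
    """Embed integer labels as one-hot vertices of the simplex Delta^{K-1}."""
    return np.eye(K)[labels]

# x1: a categorical data point, embedded in (Delta^{K-1})^D
x1 = onehot(rng.integers(0, K, size=D), K)  # shape (D, K)
# x0: prior sample, uniform on the simplex via Dirichlet(1, ..., 1)
x0 = rng.dirichlet(np.ones(K), size=D)      # shape (D, K)

def interpolate(x0, x1, t):
    """Linear stochastic interpolant I_t(x0, x1) = (1 - t) x0 + t x1."""
    return (1.0 - t) * x0 + t * x1

xt = interpolate(x0, x1, 0.5)
# Convexity keeps I_t on the simplex: rows are non-negative and sum to 1.
assert np.allclose(xt.sum(axis=-1), 1.0) and (xt >= 0).all()
```

Because the interpolant is a convex combination, every intermediate state stays on the simplex, which is what makes the continuous relaxation well-posed.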

The variational flow matching approach parameterizes the conditional posterior $q_t^\theta(x_1 \mid x_t)$ as a product of categorical distributions with simplex-valued outputs:

$$q_t^\theta(x_1 \mid x_t) = \prod_{d=1}^{D} \mathrm{Cat}\bigl(x_1^{(d)} \mid \pi_t^\theta(x_t)^{(d)}\bigr)$$

The loss is the time-averaged negative log-likelihood under $q^\theta$:

$$\mathcal{L}_{\mathrm{inf}}(\theta) = -\mathbb{E}_{t, x_0, x_1}\bigl[\log q_t^\theta(x_1 \mid x_t)\bigr]$$

When the variational posterior matches the true conditional, the drift field recovers the true velocity:

$$b_t^\theta(x) = \frac{\pi_t^\theta(x) - x}{1-t}$$
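Under these definitions the endpoint loss is simply a cross-entropy between the one-hot endpoint and the predicted simplex point, and the drift follows in closed form. The sketch below uses a fixed array `pi` as a stand-in for a trained network output, so all names are illustrative:

```python
import numpy as np

def endpoint_nll(x1, pi, eps=1e-12):
    """-log q_t(x1 | xt): cross-entropy between one-hot x1 and prediction pi."""
    return -np.sum(x1 * np.log(pi + eps))

def drift(xt, pi, t):
    """b_t(x) = (pi_t(x) - x) / (1 - t): the variational velocity field."""
    return (pi - xt) / (1.0 - t)

x1 = np.array([[0.0, 1.0, 0.0]])   # one-hot endpoint (D=1, K=3)
xt = np.array([[0.5, 0.3, 0.2]])   # current interpolant state
pi = np.array([[0.1, 0.8, 0.1]])   # stand-in for pi_t^theta(xt)

loss = endpoint_nll(x1, pi)        # = -log(0.8)
b = drift(xt, pi, t=0.5)
# An Euler step along b moves xt toward the predicted endpoint pi.
```

Note that the loss only ever evaluates $\pi_t^\theta$ at the data endpoint, which is what makes training a plain classification-style objective.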

To enable accelerated (few-step) generation, CFMs also parameterize explicit endpoint-consistent flow maps over intervals $[s, t]$:

$$X_{s,t}(x_s) = x_s + \frac{t-s}{1-s}\bigl(\pi_{s,t}^\theta(x_s) - x_s\bigr)$$

This form is crucial for taking large time steps without loss of sample validity or diversity (Roos et al., 12 Feb 2026).
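A direct consequence of this parameterization is that setting $t = 1$ collapses sampling to a single jump onto the predicted endpoint, since the step coefficient becomes exactly one. A minimal sketch, with a fixed stand-in for the network output $\pi$:

```python
import numpy as np

def flow_map(xs, pi, s, t):
    """Endpoint-consistent flow map X_{s,t}(x_s) = x_s + (t-s)/(1-s) (pi - x_s)."""
    return xs + (t - s) / (1.0 - s) * (pi - xs)

xs = np.array([0.25, 0.25, 0.5])   # state at time s
pi = np.array([0.05, 0.9, 0.05])   # stand-in for pi_{s,1}^theta(xs)

# With t = 1 the coefficient (t-s)/(1-s) equals 1, so X_{s,1}(x_s) = pi:
assert np.allclose(flow_map(xs, pi, s=0.3, t=1.0), pi)
```

Intermediate steps ($t < 1$) are convex combinations of $x_s$ and $\pi$, so the iterate never leaves the simplex.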

2. Self-Distillation and Consistency Objectives

CFMs integrate self-distillation objectives, such as Lagrangian self-distillation and endpoint consistency, to stabilize and accelerate training:

  • Lagrangian Self-Distillation ($\mathcal{L}_{\mathrm{CSD}}$): encourages consistency between the time derivative of the learned two-time flow map and the instantaneous velocity:

$$\mathcal{L}_{\mathrm{CSD}}(\theta) = \mathbb{E}_{s<t,\, x_s}\bigl\|\partial_t X_{s,t}^{\theta}(x_s) - v_{t,t}^\theta\bigl(X_{s,t}^{\theta}(x_s)\bigr)\bigr\|^2$$

  • Endpoint-Consistency Distillation ($\mathcal{L}_{\mathrm{ECLD}}$): combines a cross-entropy endpoint-consistency term with a temporal-drift regularizer:

$$\mathcal{L}_{\mathrm{ECLD}}(\theta) = 4\,\mathcal{L}_{\mathrm{CE\text{-}EC}} + 2\,\mathcal{L}_{\mathrm{TD}}$$

where $\mathcal{L}_{\mathrm{CE\text{-}EC}}$ is a cross-entropy between teacher and student endpoint predictions, and $\mathcal{L}_{\mathrm{TD}}$ regularizes the temporal drift.

The full loss is a weighted sum of endpoint-inference and distillation/self-consistency terms.
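How these terms combine can be sketched abstractly. Everything below is a stand-in: the weight `lam` is a hypothetical hyperparameter, the scalar loss values are placeholders, and in practice the teacher prediction would come from a stop-gradient (e.g., EMA) copy of the network:

```python
import numpy as np

def cross_entropy(p_teacher, p_student, eps=1e-12):
    """CE between teacher and student endpoint predictions (one position)."""
    return -np.sum(p_teacher * np.log(p_student + eps))

# Stand-ins for network outputs on a single position (K = 3 classes).
pi_teacher = np.array([0.2, 0.7, 0.1])    # frozen / EMA teacher endpoint
pi_student = np.array([0.25, 0.6, 0.15])  # current student endpoint
l_inf = 0.9                               # endpoint-inference NLL (placeholder)
l_td = 0.05                               # temporal-drift regularizer (placeholder)

l_ce_ec = cross_entropy(pi_teacher, pi_student)
l_ecld = 4.0 * l_ce_ec + 2.0 * l_td       # coefficients from the ECLD objective
lam = 0.5                                 # hypothetical distillation weight
total = l_inf + lam * l_ecld              # weighted sum of the two loss families
```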

3. Algorithmic Implementation

CFMs support high-throughput, few-step, and one-step sampling:

  • Training: Batches alternate between classical variational endpoint-inference steps (cross-entropy on data points reached at $t=1$) and self-distillation batches over $(s,t)$ intervals. The network is conditioned on $(s,t)$, and all simplex constraints are enforced via softmax activations.
  • Sampling: At inference, the flow map $X_{t_i, t_{i+1}}$ is evaluated over a small predefined schedule $\{t_i\}$, updating the sample as:

$$x_{i+1} = x_i + \frac{t_{i+1} - t_i}{1 - t_i}\bigl(\pi_{t_i, t_{i+1}}^\theta(x_i) - x_i\bigr)$$

One-hot vectors are recovered via $\arg\max$ at the final step.

  • Conditional/Guided Sampling: Arbitrary differentiable reward functions $r(x)$ can be imposed at test time by augmenting the drift field with reward gradients, enabling flexible downstream control compatible with Sequential Monte Carlo or straight-through estimation.
  • Network Design: CFMs employ architectures adapted to domain modalities, such as U-Nets for images, graph transformers for molecular graphs, and DiT-style transformers for text (Roos et al., 12 Feb 2026).
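The sampling procedure above can be sketched end-to-end. A softmax over random logits stands in for the trained endpoint predictor $\pi_{s,t}^\theta$, so the output is meaningless, but the mechanics (schedule, simplex-preserving updates, final $\arg\max$) are the ones described:

```python
import numpy as np

rng = np.random.default_rng(1)
K, D = 5, 8  # categories per variable, number of variables

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def pi_theta(x, s, t):
    """Stand-in for the learned endpoint predictor pi_{s,t}^theta."""
    return softmax(rng.normal(size=x.shape))

# Few-step schedule; the last interval lands exactly at t = 1.
schedule = [0.0, 0.5, 0.9, 1.0]
x = rng.dirichlet(np.ones(K), size=D)      # prior sample on the simplex

for s, t in zip(schedule[:-1], schedule[1:]):
    pi = pi_theta(x, s, t)
    x = x + (t - s) / (1.0 - s) * (pi - x)  # flow-map update X_{s,t}

sample = x.argmax(axis=-1)                  # recover discrete categories
# Every update is a convex combination, so x stays on the simplex throughout.
assert np.allclose(x.sum(axis=-1), 1.0) and (x >= -1e-9).all()
```

Since the final interval has coefficient $(1 - t_i)/(1 - t_i) = 1$, the last update jumps directly to the predicted endpoint before the $\arg\max$.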

4. Geometry and Theoretical Guarantees

CFMs are geometrically grounded in the structure of the simplex:

  • The simplex geometry is respected either by direct parametrization (simplex-valued outputs at every stage) or via geometric transforms (e.g., isometric log-ratio (ILR) or centered stick-breaking) from the simplex to $\mathbb{R}^{D}$, which ensure isometry (preservation of Aitchison inner products), smooth invertibility, and numerically stable flows (Williams et al., 31 Oct 2025).
  • Dequantization via Dirichlet interpolation allows boundary (one-hot) observations to be incorporated as interior simplex points, while still permitting exact recovery of discrete samples.
  • Alternative geometric approaches, e.g., Statistical Flow Matching, leverage the Fisher information metric and Riemannian geodesic flows, which further connect CFM methodology to natural gradient flows and optimal transport machinery (Cheng et al., 2024).
  • Empirical and theoretical results confirm that these geometric mappings guarantee recovery of discrete categorical samples, maintain bounded total-variation error, and permit exact density computation under an appropriate change of variables (Williams et al., 31 Oct 2025, Cheng et al., 2024).
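As one concrete example of such a geometric transform (a sketch, not the cited implementation), the isometric log-ratio (ILR) map sends the simplex interior to Euclidean space by composing the centered log-ratio with an orthonormal basis of the zero-sum hyperplane, and it is smoothly and exactly invertible:

```python
import numpy as np

K = 4
# Orthonormal basis of the zero-sum hyperplane in R^K, taken from the
# rank-(K-1) eigenspace of the centering projection I - J/K.
V = np.linalg.svd(np.eye(K) - np.ones((K, K)) / K)[0][:, : K - 1]

def ilr(p):
    """Isometric log-ratio: simplex interior -> R^{K-1}."""
    clr = np.log(p) - np.log(p).mean(axis=-1, keepdims=True)
    return clr @ V

def ilr_inv(z):
    """Inverse ILR: softmax of the reconstructed centered log-ratio."""
    clr = z @ V.T
    e = np.exp(clr - clr.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

p = np.array([0.1, 0.2, 0.3, 0.4])
assert np.allclose(ilr_inv(ilr(p)), p)  # smooth, exact round-trip
```

Running flows in the unconstrained ILR coordinates sidesteps explicit simplex constraints, while the exact inverse recovers valid probability vectors.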

5. Connections and Variants

CFMs form a unifying framework for a diverse landscape of discrete generative techniques:

  • Variational Flow Matching (VFM): The mean-field VFM formulation casts flow matching as variational inference over discrete endpoints and underlies CatFlow, which reduces the matching objective to a categorical cross-entropy loss over predicted endpoints (Eijkelboom et al., 2024).
  • SimplexFlow: Embeds categorical variables in the simplex and constrains ODE dynamics to the affine hyperplane, but is sensitive to the choice of prior; in practice, unconstrained (Gaussian) embeddings can achieve equal or better molecular validity (Dunn et al., 2024).
  • Discrete Diffusions and Statistical FM: CFMs relate to score-based and diffusion models for categorical data, but achieve computational and statistical efficiency via ODE-based, continuous, simplex-respecting flows rather than stochastic or combinatorial trajectories (Cheng et al., 2024, Roos et al., 12 Feb 2026).
  • Category-Theoretic CFMs: In a distinct lineage, categorical flow maps are also defined as natural transformations between stock-flow diagrams in systems modeling, equipped with algebraic properties (symmetric monoidal structure, limits/colimits) for modular model composition (Baez et al., 2022).

6. Empirical Performance and Applications

CFMs deliver state-of-the-art results for few-step categorical generation in diverse benchmarks:

| Benchmark | Metric | CFM Result |
| --- | --- | --- |
| QM9 (molecular graphs) | 1-step validity | 95.8% |
| ZINC (molecular graphs) | 1-step validity | 93.5% |
| Binary MNIST | 1-step FID | 10.1 |
| Text8 (text generation) | 1-step NLL (measured with GPT-J-6B) | 5.33 |
| Binarized MNIST (continuous) | NLL (SB–CFM) | 0.0341 ± 0.0006 |
| DNA promoter sequences | SP-MSE (SB–CFM) | 0.0214 |
  • Few-step (typically 1–4 steps) generation matches or surpasses prior methods requiring orders of magnitude more function evaluations (Roos et al., 12 Feb 2026, Eijkelboom et al., 2024).
  • Experimental ablations confirm that geometry-aware endpoint parametrizations are essential for robust one-step sampling without mode collapse or sample invalidity.
  • Applications include molecular graph and sequence generation, image modeling, accelerated discrete text completion, and combinatorial structure synthesis (e.g., graphs, code) (Roos et al., 12 Feb 2026, Williams et al., 31 Oct 2025, Eijkelboom et al., 2024).
  • Flexibility at inference enables conditional or reward-driven sampling for property-guided generation in molecular design or controlled autoregressive decoding in LLMs.

7. Limitations and Extensions

The endpoint two-time parametrization increases model complexity by doubling temporal conditioning; further, temporal-drift and entropy regularizers require tuning for optimal stability. Prior specification and simplex constraint enforcement remain critical to maximizing coverage and sample quality, especially on datasets with high discrete arity. Practitioners must also choose between simplex-respecting embedding approaches and alternative Euclidean relaxations, as empirical coverage may depend on data geometry (Dunn et al., 2024).

Potential future directions include:

  • Extension to general constrained domains through geometry-aware parametrization.
  • Design of equivariant architectures for structured data.
  • Application to tabular, set-based, and combinatorial optimization problems.

CFMs thus provide a theoretically principled, empirically validated, and algorithmically versatile approach for categorical generative modeling in both artificial intelligence and scientific applications (Roos et al., 12 Feb 2026, Williams et al., 31 Oct 2025, Eijkelboom et al., 2024, Dunn et al., 2024, Cheng et al., 2024).
