Discrete Classifier-Based Guidance (D-CBG)

Updated 1 April 2026

D-CBG is a family of algorithms for conditional sampling in discrete diffusion and flow models that combines an unconditional denoiser with classifier guidance.
It employs Bayes-inspired reweighting of discrete Markov chain transitions using likelihood ratios from a classifier and a temperature hyperparameter to modulate control strength.
D-CBG enables efficient, state-of-the-art conditional generation across domains such as nucleotide sequences, molecular design, and discretized images without retraining the base model.

Discrete Classifier-Based Guidance (D-CBG) is a family of algorithms for conditional sampling in discrete diffusion and flow models, enabling controllable generative modeling of categorical-valued data such as nucleotide sequences, molecules, and discretized images. D-CBG generalizes the continuous-state classifier guidance principle to discrete state spaces, combining an unconditional discrete denoiser or flow with a (possibly time-dependent) classifier to direct generation toward desired attributes or labels. The core mechanism adapts Bayes-inspired guidance to discrete Markov chains, employing reweighting of transition probabilities or rates by the likelihood ratio of the target property under a classifier, with a temperature hyperparameter modulating control strength. D-CBG is computationally efficient, requiring no retraining of the base generator, and achieves state-of-the-art conditional control across diverse discrete domains (Schiff et al., 2024, Nisonoff et al., 2024, Ma et al., 2023).

1. Discrete Diffusion and Flow: Model Setup

In D-CBG, data consists of sequences $x \in V^L$ of length $L$ over a vocabulary $V$ of size $N$, with each token represented in a one-hot encoding. Discrete diffusion models are constructed as forward Markov chains, where each time-step transition is defined by a categorical distribution parameterized by a transition matrix $Q_{t|t-1}$: $q(x_t \mid x_{t-1}) = \mathrm{Cat}(x_t; Q_{t|t-1} x_{t-1})$. A common “interpolating” schedule linearly mixes the input state with the uniform distribution, controlled by a time-varying parameter $\alpha_t$.
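The forward step can be illustrated with a minimal sketch. It assumes the interpolating matrix has the explicit form $Q_{t|t-1} = \alpha_t I + (1 - \alpha_t)\,\mathbf{1}\mathbf{1}^\top / N$; that form and the function names `interpolating_matrix` and `forward_step` are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

def interpolating_matrix(alpha_t: float, vocab_size: int) -> np.ndarray:
    """Assumed form: Q = alpha_t * I + (1 - alpha_t) * uniform (columns sum to 1)."""
    uniform = np.full((vocab_size, vocab_size), 1.0 / vocab_size)
    return alpha_t * np.eye(vocab_size) + (1.0 - alpha_t) * uniform

def forward_step(x_prev: np.ndarray, alpha_t: float, vocab_size: int,
                 rng: np.random.Generator) -> np.ndarray:
    """Sample x_t ~ Cat(Q x_{t-1}) independently at each position.

    x_prev holds integer token ids with shape (L,).
    """
    Q = interpolating_matrix(alpha_t, vocab_size)
    one_hot = np.eye(vocab_size)[x_prev]   # (L, N) one-hot rows
    probs = one_hot @ Q.T                  # row l equals Q @ onehot(x_prev[l])
    return np.array([rng.choice(vocab_size, p=p) for p in probs])
```

With $\alpha_t = 1$ the sequence is left unchanged, and with $\alpha_t = 0$ every token is resampled uniformly, which matches the "mix with uniform" description above.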

The goal of generative modeling is to learn a reverse process approximating the true posterior $q(x_{t-1} \mid x_t)$, which, due to intractable dependence on the original uncorrupted data, is parameterized by a neural denoiser $p_\theta(x_{t-1} \mid x_t)$ trained via a variational objective over the trajectory. For continuous-time settings, models define a time-dependent rate/generator matrix $R_t$ for the continuous-time Markov chain (CTMC) on the discrete state space (Nisonoff et al., 2024).

2. Principle and Mathematical Formulation of D-CBG

D-CBG augments unconditional sampling to generate samples from a conditional distribution (e.g., with specified class $y$ or property $c$). The guidance is implemented by combining the base model with a separately trained classifier. Let $p_\phi(y \mid x)$ be the classifier output for attribute $y$: $p^{\gamma}(x_{t-1} \mid x_t, y) \propto p_\theta(x_{t-1} \mid x_t)\, p_\phi(y \mid x_{t-1})^{\gamma}$. Here, the guidance strength $\gamma$ amplifies conditioning. In continuous-time settings, transition rates in the CTMC are reweighted for each possible state transition $x \to \tilde{x}$ as: $R_t^{\gamma}(x, \tilde{x} \mid y) = R_t(x, \tilde{x}) \left[ \frac{p_\phi(y \mid \tilde{x}, t)}{p_\phi(y \mid x, t)} \right]^{\gamma}$. This guided process preserves ergodicity and ensures that, in the limit, samples are biased according to the specified attribute (Nisonoff et al., 2024, Schiff et al., 2024).
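As a hedged illustration of the two reweighting rules, the sketch below writes the guidance strength as `gamma` and takes classifier probabilities as plain arrays. The function names and the array-based interface are assumptions made for this example; a real implementation would obtain these quantities from trained networks.

```python
import numpy as np

def guided_transition_probs(base_probs, clf_probs, gamma):
    """Discrete-time rule: p_guided(v) ∝ base_probs[v] * clf_probs[v] ** gamma.

    base_probs[v]: unconditional denoiser probability of candidate v
    clf_probs[v]:  classifier probability of the target attribute given candidate v
    Inputs are assumed strictly positive; the product is formed in log space.
    """
    logits = np.log(base_probs) + gamma * np.log(clf_probs)
    logits = logits - logits.max()          # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def guided_rates(base_rates, clf_prob_targets, clf_prob_current, gamma):
    """Continuous-time rule: R_guided(x -> x~) = R(x -> x~) * (p(y|x~) / p(y|x)) ** gamma."""
    ratio = np.asarray(clf_prob_targets, dtype=float) / clf_prob_current
    return np.asarray(base_rates, dtype=float) * ratio ** gamma
```

Setting `gamma = 0` recovers the unconditional model in both cases, while larger values concentrate mass on candidates the classifier favors.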

3. Efficient Implementation: Tokenwise and Taylor Approximations

Naïvely, evaluating the classifier for all candidate state replacements scales as $O(L \cdot N)$ for sequences (or $O(D \cdot S)$ for general discrete objects with $D$ dimensions of $S$ states each), quickly becoming prohibitive for long sequences or extensive vocabularies. D-CBG exploits a first-order Taylor expansion, treating the classifier log-probability as a smooth function of the input: $\log p_\phi(y \mid \tilde{x}) \approx \log p_\phi(y \mid x_t) + (\tilde{x} - x_t)^\top \nabla_{x} \log p_\phi(y \mid x)\big|_{x = x_t}$. Since $\tilde{x}$ and $x_t$ differ in only one token, the entire batch of guided probabilities for all candidate mutations can be computed with a single forward and backward pass through the classifier. This enables practical application to high-dimensional biosequence and molecule spaces (Schiff et al., 2024, Nisonoff et al., 2024).
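To make the amortization concrete, the following sketch compares the first-order estimate with a brute-force evaluation. It uses a hypothetical toy classifier, $\log p(y{=}1 \mid x) = \log \sigma(\langle W, x \rangle + b)$, whose gradient is available in closed form; in practice the gradient would come from automatic differentiation of a trained network. All names here are illustrative.

```python
import numpy as np

def toy_log_prob(x_onehot, W, b):
    """Hypothetical classifier: log p(y=1 | x) = log sigmoid(<W, x> + b)."""
    s = np.sum(W * x_onehot) + b
    return -np.logaddexp(0.0, -s)

def toy_grad(x_onehot, W, b):
    """Analytic gradient of toy_log_prob w.r.t. the one-hot input, shape (L, N)."""
    s = np.sum(W * x_onehot) + b
    return (1.0 - 1.0 / (1.0 + np.exp(-s))) * W

def taylor_log_ratios(x_onehot, grad):
    """First-order estimate of log p(y|x~) - log p(y|x) for every single-token swap.

    Entry [l, v] corresponds to replacing position l with token v:
    (e_v - x^(l))^T grad^(l) = grad[l, v] - <grad[l], x[l]>.
    """
    current = np.sum(grad * x_onehot, axis=1, keepdims=True)  # (L, 1)
    return grad - current

def exact_log_ratios(x_onehot, W, b):
    """Brute-force reference: one classifier call per candidate swap (L * N calls)."""
    L, N = x_onehot.shape
    base = toy_log_prob(x_onehot, W, b)
    out = np.zeros((L, N))
    for l in range(L):
        for v in range(N):
            x_new = x_onehot.copy()
            x_new[l] = np.eye(N)[v]
            out[l, v] = toy_log_prob(x_new, W, b) - base
    return out
```

`taylor_log_ratios` needs one gradient evaluation, whereas `exact_log_ratios` loops over every position and token. The two agree closely only when the classifier is nearly linear around the current input, which is the assumption the Taylor shortcut relies on.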

A standard pseudocode loop for the discrete-time case consists of, for each step and each token: computing denoiser logits, one classifier forward-backward pass for the current $x_t$, constructing the guided logits, renormalizing with softmax, and sampling the next token (see Section 4 below).

4. Algorithmic Workflow and Pseudocode

The D-CBG sampling loop is realized as follows (Schiff et al., 2024, Nisonoff et al., 2024):

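Unconditional Prediction: For each position $\ell$, compute denoiser logits $\log p_\theta(x_{t-1}^{(\ell)} \mid x_t)$.
Classifier Pass: Compute $\log p_\phi(y \mid x_t)$ and its gradient w.r.t. the one-hot input $x_t$.
Guided Logits Construction (per token):

$\mathrm{logit}^{(\ell)}_{v} = \log p_\theta(x_{t-1}^{(\ell)} = v \mid x_t) + \gamma \, (e_v - x_t^{(\ell)})^\top g^{(\ell)}$

where $g^{(\ell)} = \nabla_{x_t^{(\ell)}} \log p_\phi(y \mid x_t)$ and $e_v$ is the one-hot vector for token $v$.
Softmax Renormalization: Apply softmax across candidate indices to yield new categorical probabilities.
Sampling: Draw samples for $x_{t-1}^{(\ell)}$ at each position.
Iterate: Set $t \leftarrow t-1$ and repeat.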
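The steps above can be sketched as a short sampling loop. This is a simplified illustration under stated assumptions: `denoiser_logits_fn` and `clf_grad_fn` are hypothetical placeholders for a trained denoiser and for the gradient of a trained classifier's log-probability, the guidance strength is written `gamma`, and all positions are resampled independently at each step.

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dcbg_step(x_t, t, denoiser_logits_fn, clf_grad_fn, gamma, rng):
    """One guided reverse step for token ids x_t of shape (L,).

    denoiser_logits_fn(x_t, t)  -> (L, N) unconditional logits per position
    clf_grad_fn(x_onehot, t)    -> (L, N) gradient of log p(y | x_t) w.r.t. the one-hot input
    """
    base_logits = denoiser_logits_fn(x_t, t)                        # unconditional prediction
    L, N = base_logits.shape
    x_onehot = np.eye(N)[x_t]
    grad = clf_grad_fn(x_onehot, t)                                 # classifier pass
    delta = grad - np.sum(grad * x_onehot, axis=1, keepdims=True)   # (e_v - x)^T g per token
    probs = softmax(base_logits + gamma * delta, axis=1)            # guided logits + softmax
    return np.array([rng.choice(N, p=p) for p in probs])            # sampling

def dcbg_sample(x_T, num_steps, denoiser_logits_fn, clf_grad_fn, gamma, rng):
    """Iterate the guided step from t = num_steps down to t = 1."""
    x = x_T
    for t in range(num_steps, 0, -1):
        x = dcbg_step(x, t, denoiser_logits_fn, clf_grad_fn, gamma, rng)
    return x
```

Each step costs one denoiser call and one classifier forward-backward pass, independent of the vocabulary size.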

In the continuous-time CTMC setting, one constructs the guided generator via reweighting as above and samples trajectories with standard CTMC samplers (Euler, Gillespie, or τ-leaping), with the Taylor expansion similarly amortizing predictor evaluations (Nisonoff et al., 2024).
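For the continuous-time case, a minimal sketch of two of the samplers mentioned is given below. It takes a vector of already-guided outgoing rates for the current state and treats them as constant over the step, which is only an approximation when the rate matrix varies in time. The function names are illustrative assumptions.

```python
import numpy as np

def gillespie_jump(rates, rng):
    """One Gillespie-style jump given outgoing rates frozen at the current state.

    rates[k] >= 0 is the (guided) rate of jumping to candidate state k.
    Returns (holding_time, chosen_index); (inf, None) if no jump is possible.
    """
    rates = np.asarray(rates, dtype=float)
    total = rates.sum()
    if total <= 0.0:
        return np.inf, None
    holding_time = rng.exponential(1.0 / total)       # waiting time ~ Exp(rate=total)
    index = rng.choice(len(rates), p=rates / total)   # destination ∝ rate
    return holding_time, index

def euler_jump_probs(rates, dt):
    """Euler discretization: P(jump to k within dt) ≈ rates[k] * dt.

    If the step is too coarse (total > 1), probabilities are rescaled to stay valid.
    Returns (jump_probs, stay_prob).
    """
    jump = np.asarray(rates, dtype=float) * dt
    total = jump.sum()
    if total > 1.0:
        jump = jump / total
        total = 1.0
    return jump, 1.0 - total
```

The Euler variant trades exactness for a fixed step count, while the Gillespie variant simulates individual jumps and therefore takes a data-dependent number of steps.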

5. Practical Considerations and Guidance Strength

The classifier for D-CBG is typically trained under the same noise schedule as the diffusion model, predicting the label from noised latents $x_t$ using cross-entropy loss (Schiff et al., 2024). The temperature parameter $\gamma$ moderates the tradeoff between sample fidelity and conditional control: a comparatively low $\gamma$ works best for small-vocabulary tasks, while higher $\gamma$ (up to 10) is effective in molecular domains. Fewer diffusion steps suffice in uniform noise schemes because tokens can flip at any step. CTMC simulation schemes ($\tau$-leaping, Gillespie) can improve accuracy in continuous-time D-CBG (Schiff et al., 2024, Nisonoff et al., 2024).
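Training the classifier on noised inputs can be sketched as follows. The example assumes a uniform-noise marginal in which each token is kept with probability $\bar{\alpha}_t$ and otherwise resampled uniformly; `alpha_bar_fn` and `clf_logits_fn` are hypothetical stand-ins for the model's noise schedule and the classifier network, and only the loss computation is shown, not the optimizer.

```python
import numpy as np

def corrupt(x0, alpha_bar_t, vocab_size, rng):
    """Keep each token with prob. alpha_bar_t, else resample it uniformly."""
    keep = rng.random(x0.shape) < alpha_bar_t
    noise = rng.integers(0, vocab_size, size=x0.shape)
    return np.where(keep, x0, noise)

def cross_entropy(logits, labels):
    """Mean negative log-likelihood of integer labels under softmax(logits)."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def noisy_classifier_loss(clf_logits_fn, x0_batch, y_batch, alpha_bar_fn,
                          vocab_size, rng):
    """Sample t ~ U(0, 1) per example, noise the clean sequences, score on (x_t, t)."""
    t = rng.random(len(x0_batch))
    x_t = np.stack([corrupt(x0, alpha_bar_fn(ti), vocab_size, rng)
                    for x0, ti in zip(x0_batch, t)])
    return cross_entropy(clf_logits_fn(x_t, t), y_batch)
```

Conditioning the classifier on `t` lets a single network score inputs at every noise level encountered during sampling.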

Resource requirements are dominated by backpropagation through the classifier for the gradient; a single GPU is adequate for typical biological sequence or molecular tasks with vocabulary sizes up to 32 (Nisonoff et al., 2024). The first-order Taylor approximation enables all guided probabilities to be computed in $O(1)$ classifier calls per time step.

6. Empirical Results Across Modalities

D-CBG achieves robust, high-performance controllable generation on a range of discrete domains:

Domain	Quality/Controllability Metrics	D-CBG Outcome
Genomic Sequences (Species10)	3-mer JS, Class F1	3-mer JS ≈ 0.11, F1 improved from 0.81 to 0.94
Molecular Design (QM9)	Validity, Novelty, QED, Rings	QED maximization: 99.5% valid, 63.8% novel, mean QED 0.61
Discretized CIFAR-10	IS, FID, Class-conditional F1	IS 6.74→9.02, FID 33.8→15.6, F1 0.63→0.99
Small Molecule SMILES	Rings purity, LogP targets	Targeted property shifts (90–93% purity), LogP control within MAE ≈ 0.7–0.9
Discrete Images (CTMC, CIFAR-10)	IS, FID	IS increases, FID decreases with stronger guidance (e.g., IS 9.09, FID ≈ 9.04)
Cell-type DNA enhancer design	Fréchet Biological Distance, Target Prob.	Outperforms Dirichlet FM classifier-guidance across control strengths
Protein inverse-folding	Success rate (RMSD-based), Diversity	Success rate up to ∼40–93% (vs. 0–10%) with preserved diversity

These results demonstrate that D-CBG preserves sample diversity while sharply increasing attribute controllability; the approach consistently outperforms autoregressive+FUDGE baselines and previous classifier-free methods (Schiff et al., 2024, Nisonoff et al., 2024).

D-CBG adapts classical classifier-based guidance for continuous diffusion processes (Ma et al., 2023) to discrete data, overcoming the absence of gradients in discrete spaces via likelihood-ratio modulation and smooth Taylor approximations. D-CBG enables the use of both bespoke and off-the-shelf classifiers, e.g., pretrained ResNet or CLIP for discretized images and text-guided generation, with further improvements obtained through calibration (Softplus activation, temperature scaling) and input pre-conditioning (Ma et al., 2023).

In contrast to classifier-free and universal guidance, D-CBG requires a trainable or pretrained classifier, but does not necessitate retraining of the base generative model. In all applications, D-CBG achieves conditional control with minimal computational overhead, enabling state-of-the-art results for discrete data (Schiff et al., 2024, Nisonoff et al., 2024, Ma et al., 2023).

A plausible implication is that D-CBG establishes a general recipe for conditional sampling in any probabilistic discrete generative flow or diffusion model, with temperature scheduling and efficient gradient-based amortization as key enablers for scaling to long sequences and large vocabularies.