
Discrete Classifier-Based Guidance (D-CBG)

Updated 1 April 2026
  • D-CBG is a family of algorithms for conditional sampling in discrete diffusion and flow models that combines an unconditional denoiser with classifier guidance.
  • It employs Bayes-inspired reweighting of discrete Markov chain transitions using likelihood ratios from a classifier and a temperature hyperparameter to modulate control strength.
  • D-CBG enables efficient, state-of-the-art conditional generation across domains such as nucleotide sequences, molecule design, and discretized images without retraining the base model.

Discrete Classifier-Based Guidance (D-CBG) is a family of algorithms for conditional sampling in discrete diffusion and flow models, enabling controllable generative modeling of categorical-valued data such as nucleotide sequences, molecules, and discretized images. D-CBG generalizes the continuous-state classifier guidance principle to discrete state spaces, combining an unconditional discrete denoiser or flow with a (possibly time-dependent) classifier to direct generation toward desired attributes or labels. The core mechanism adapts Bayes-inspired guidance to discrete Markov chains, employing reweighting of transition probabilities or rates by the likelihood ratio of the target property under a classifier, with a temperature hyperparameter modulating control strength. D-CBG is computationally efficient, requiring no retraining of the base generator, and achieves state-of-the-art conditional control across diverse discrete domains (Schiff et al., 2024, Nisonoff et al., 2024, Ma et al., 2023).

1. Discrete Diffusion and Flow: Model Setup

In D-CBG, data consists of sequences $x \in V^L$ over a vocabulary $V$ of size $N$, with each token represented in a one-hot encoding. Discrete diffusion models are constructed as forward Markov chains, where each time-step transition is defined by a categorical distribution parameterized by a transition matrix $Q_{t|t-1}$:

$$q(x_t \mid x_{t-1}) = \mathrm{Cat}(x_t; Q_{t|t-1} x_{t-1})$$

A common "interpolating" schedule linearly mixes the input state with the uniform distribution, controlled by a time-varying parameter $\alpha_t$.
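As a concrete illustration of the interpolating schedule, the sketch below samples $x_t \sim \mathrm{Cat}(\alpha_t x_0 + (1 - \alpha_t)\mathbf{1}/N)$ independently per token. It also builds the one-step matrix $Q_{t|t-1} = \alpha_{t|t-1} I + (1 - \alpha_{t|t-1})\mathbf{1}\mathbf{1}^{\top}/N$. This is a minimal NumPy sketch, and it rests on two assumptions. First, tokens are stored as integer indices rather than explicit one-hot vectors. Second, the function names `forward_noise` and `transition_matrix` are hypothetical and do not come from the cited papers.

```python
import numpy as np

def transition_matrix(alpha_step: float, vocab_size: int) -> np.ndarray:
    """One-step matrix Q_{t|t-1} = a*I + (1 - a) * 11^T / N (column-stochastic).

    With one-hot column vectors, q(x_t | x_{t-1}) = Cat(x_t; Q @ x_{t-1}).
    """
    return alpha_step * np.eye(vocab_size) + (1.0 - alpha_step) / vocab_size

def forward_noise(x0_idx: np.ndarray, alpha_t: float, vocab_size: int,
                  rng: np.random.Generator) -> np.ndarray:
    """Sample x_t ~ Cat(alpha_t * onehot(x_0) + (1 - alpha_t) / N) for each token."""
    length = x0_idx.shape[0]
    probs = np.full((length, vocab_size), (1.0 - alpha_t) / vocab_size)
    probs[np.arange(length), x0_idx] += alpha_t
    # Inverse-CDF sampling: one uniform draw per position.
    u = rng.random((length, 1))
    idx = (probs.cumsum(axis=1) <= u).sum(axis=1)
    return np.minimum(idx, vocab_size - 1)  # guard against float round-off

rng = np.random.default_rng(0)
x0 = np.array([0, 3, 1, 2])
print(forward_noise(x0, 0.5, 4, rng))  # hypothetical 4-token vocabulary
```

Because the marginal $q(x_t \mid x_0)$ has this closed form, a noised sample at any $t$ can be drawn directly without simulating the intermediate steps.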

The goal of generative modeling is to learn a reverse process approximating the true posterior $q(x_{t-1} \mid x_t)$. Because this posterior depends intractably on the original uncorrupted data, it is parameterized by a neural denoiser $p_\theta(x_{t-1} \mid x_t)$ trained via a variational objective over the trajectory. For continuous-time settings, models define a time-dependent rate (generator) matrix $R_t$ for the continuous-time Markov chain (CTMC) on the discrete state space (Nisonoff et al., 2024).
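For the uniform schedule, one common way to realize $p_\theta(x_{t-1} \mid x_t)$ is to let the network predict a distribution $\hat{x}_0$ over clean tokens. That prediction is then plugged into the forward posterior, $p_\theta(x_{t-1} \mid x_t) \propto q(x_t \mid x_{t-1})\, q(x_{t-1} \mid \hat{x}_0)$. The sketch below makes three assumptions:

- the $x_0$-prediction parameterization described above;
- marginals $q(x_t \mid x_0) = \mathrm{Cat}(\alpha_t x_0 + (1-\alpha_t)\mathbf{1}/N)$;
- a one-step keep probability of $\alpha_t / \alpha_{t-1}$.

The name `reverse_probs` is hypothetical, and the cited papers may parameterize their denoisers differently.

```python
import numpy as np

def reverse_probs(xt_idx: np.ndarray, x0_probs: np.ndarray,
                  alpha_t: float, alpha_prev: float) -> np.ndarray:
    """Per-token p(x_{t-1} = k | x_t) ∝ q(x_t | x_{t-1} = k) * q(x_{t-1} = k | x0_hat).

    xt_idx:   (L,) current tokens.
    x0_probs: (L, N) predicted clean-token distribution (stand-in for a network output).
    alpha_t <= alpha_prev are the marginal signal levels at steps t and t-1.
    """
    length, vocab_size = x0_probs.shape
    keep = alpha_t / alpha_prev  # one-step keep probability alpha_{t|t-1}
    # Likelihood of the observed x_t for every candidate previous token k.
    lik = np.full((length, vocab_size), (1.0 - keep) / vocab_size)
    lik[np.arange(length), xt_idx] += keep
    # q(x_{t-1} | x_0) is linear in x_0, so averaging over x0_hat equals plugging in its mean.
    prior = alpha_prev * x0_probs + (1.0 - alpha_prev) / vocab_size
    unnorm = lik * prior
    return unnorm / unnorm.sum(axis=1, keepdims=True)
```

When $\alpha_{t-1} = 1$ and the prediction is one-hot, the step returns the predicted clean tokens exactly. When the prediction is uninformative, mass stays concentrated on the current token $x_t$.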

2. Principle and Mathematical Formulation of D-CBG

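D-CBG augments unconditional sampling to generate samples from a conditional distribution (e.g., with a specified class $y$ or property $c$). The guidance is implemented by combining the base model with a separately trained classifier. Let $p_\phi(y \mid x_t)$ be the classifier output for attribute $y$. The guided reverse transition is then:

$$p^{\gamma}(x_{t-1} \mid x_t, y) \propto p_\theta(x_{t-1} \mid x_t)\, p_\phi(y \mid x_{t-1})^{\gamma}$$

Here, the guidance strength $\gamma$ amplifies conditioning. For $\gamma = 1$ this is Bayes' rule applied to the Markov reverse step, and $\gamma > 1$ sharpens it. In continuous-time settings, the CTMC transition rates are reweighted for each possible state transition $x \to \tilde{x}$ with $\tilde{x} \neq x$ as:

$$R_t^{y}(x, \tilde{x}) = R_t(x, \tilde{x}) \left[ \frac{p_\phi(y \mid \tilde{x}, t)}{p_\phi(y \mid x, t)} \right]^{\gamma}$$

This guided process preserves ergodicity and ensures that, in the limit, samples are biased according to the specified attribute (Nisonoff et al., 2024, Schiff et al., 2024).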
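Both guidance rules can be checked on a toy state set. The first is the discrete-time rule $p^{\gamma} \propto p_\theta\, p_\phi^{\gamma}$. The second is the continuous-time rate reweighting $R^{y}(x, \tilde{x}) = R(x, \tilde{x})\,[p_\phi(y \mid \tilde{x}) / p_\phi(y \mid x)]^{\gamma}$. The NumPy sketch below assumes the candidate states are enumerated explicitly, which is feasible only for a handful of states. It also assumes the classifier log-probabilities are given as an array. `guided_probs` and `guided_rates` are illustrative names, not an API from the cited papers.

```python
import numpy as np

def log_softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def guided_probs(denoiser_logits, classifier_logp, gamma):
    """Discrete time: p^gamma(k | x_t, y) ∝ p_theta(k | x_t) * p_phi(y | k)^gamma.

    denoiser_logits: (K,) unnormalized log p_theta(x_{t-1} = k | x_t).
    classifier_logp: (K,) log p_phi(y | x_{t-1} = k) for each candidate k.
    """
    return np.exp(log_softmax(denoiser_logits + gamma * classifier_logp))

def guided_rates(rates, classifier_logp, current, gamma):
    """Continuous time: R^y(x, k) = R(x, k) * [p_phi(y | k) / p_phi(y | x)]^gamma for k != x.

    rates: (K,) unguided rates out of the current state x = `current`.
    The diagonal entry is reset so that the row sums to zero, as for any generator.
    """
    ratio = np.exp(gamma * (classifier_logp - classifier_logp[current]))
    out = np.asarray(rates, dtype=float) * ratio
    out[current] = 0.0
    out[current] = -out.sum()
    return out
```

With $\gamma = 0$ the base model is recovered. With $\gamma = 1$ the discrete-time rule multiplies by the classifier probability and renormalizes. Larger $\gamma$ concentrates mass on states the classifier favors.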

3. Efficient Implementation: Tokenwise and Taylor Approximations

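Naïvely, evaluating the classifier for all candidate state replacements requires one classifier call per candidate. For sequences of length $L$ this means $O(L \cdot N)$ calls per step. For general discrete objects, it means as many calls as there are reachable neighbouring states. Either way, the cost quickly becomes prohibitive for long sequences or large vocabularies. D-CBG avoids this with a first-order Taylor expansion, treating the classifier log-probability as a smooth function of the one-hot input:

$$\log p_\phi(y \mid \tilde{x}) \approx \log p_\phi(y \mid x) + (\tilde{x} - x)^{\top} \nabla_{x} \log p_\phi(y \mid x)$$

Suppose $x$ and $\tilde{x}$ differ in only one token, with position $i$ changing from $x^i$ to $k$. The correction term then reduces to $[\nabla_x \log p_\phi(y \mid x)]_{i,k} - [\nabla_x \log p_\phi(y \mid x)]_{i,x^i}$. As a result, the guided probabilities for all candidate mutations can be computed with a single forward and backward pass through the classifier. This enables practical application to high-dimensional biosequence and molecule spaces (Schiff et al., 2024, Nisonoff et al., 2024).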
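The single-pass trick rests on one identity. For an edit of position $i$ to token $k$, $(\tilde{x} - x)^{\top}\nabla = [\nabla]_{i,k} - [\nabla]_{i,x^i}$. The sketch below checks this numerically using a hypothetical logistic classifier, $\log p_\phi(y{=}1 \mid x) = \log\sigma\big(\sum_i w_{i,x^i} + b\big)$, whose one-hot gradient is available in closed form. It compares the one-gradient Taylor estimate against the naive $L \cdot N$ classifier calls. The names `toy_classifier`, `taylor_log_ratios`, and `exact_log_ratios` are illustrative assumptions. A real implementation would obtain the gradient from automatic differentiation.

```python
import numpy as np

def log_sigmoid(s):
    return -np.logaddexp(0.0, -s)

def toy_classifier(x_idx, w, b):
    """Hypothetical classifier log p(y=1 | x) = log sigmoid(sum_i w[i, x_i] + b) and its gradient.

    s is linear in the one-hot input, so d log p / d onehot(x)[i, k] = (1 - sigmoid(s)) * w[i, k].
    """
    s = w[np.arange(len(x_idx)), x_idx].sum() + b
    logp = log_sigmoid(s)
    return logp, (1.0 - np.exp(logp)) * w

def taylor_log_ratios(grad, x_idx):
    """First-order estimate of log p(y | x~) - log p(y | x) for every single-token edit (i, k).

    (x~ - x)^T grad = grad[i, k] - grad[i, x_i], so one gradient covers all L*N edits.
    """
    return grad - grad[np.arange(grad.shape[0]), x_idx][:, None]

def exact_log_ratios(x_idx, w, b):
    """Naive baseline: one classifier call per candidate edit, L*N calls in total."""
    length, vocab_size = w.shape
    base, _ = toy_classifier(x_idx, w, b)
    out = np.zeros((length, vocab_size))
    for i in range(length):
        for k in range(vocab_size):
            edited = x_idx.copy()
            edited[i] = k
            out[i, k] = toy_classifier(edited, w, b)[0] - base
    return out

rng = np.random.default_rng(0)
w, b = 0.05 * rng.standard_normal((6, 4)), 0.1
x = rng.integers(0, 4, size=6)
_, grad = toy_classifier(x, w, b)
approx, exact = taylor_log_ratios(grad, x), exact_log_ratios(x, w, b)
print(np.abs(approx - exact).max())  # the gap stays small for these small weights
```

The approximation error grows with the curvature of the classifier along the edit direction. The estimate is therefore a heuristic amortization, not an exact substitute for the naive evaluation.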

A standard loop for the discrete-time case performs the following at each step (see Section 4 below):

- compute denoiser logits for every position;
- run one classifier forward-backward pass for the current $x_t$;
- construct guided logits for each token;
- renormalize with a softmax;
- sample the next token at each position.

4. Algorithmic Workflow and Pseudocode

The D-CBG sampling loop is realized as follows (Schiff et al., 2024, Nisonoff et al., 2024):

  1. Unconditional Prediction: For each position $i$, compute denoiser logits $\log p_\theta(x_{t-1}^{i} \mid x_t)$.
  2. Classifier Pass: Compute $\log p_\phi(y \mid x_t)$ and its gradient with respect to the one-hot input $x_t$.
  3. Guided Logits Construction (per token):

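    $$\log p^{\gamma}(x_{t-1}^{i} = k \mid x_t, y) = \log p_\theta(x_{t-1}^{i} = k \mid x_t) + \gamma\, \Delta_{i,k} + \text{const}$$

    where $\Delta_{i,k} = [\nabla_{x_t} \log p_\phi(y \mid x_t)]_{i,k} - [\nabla_{x_t} \log p_\phi(y \mid x_t)]_{i,x_t^{i}}$.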

  4. Softmax Renormalization: Apply softmax across candidate indices to yield new categorical probabilities.
  5. Sampling: Draw $x_{t-1}^{i}$ from the guided categorical distribution, independently for each position $i$.
  6. Iterate: Set $t \leftarrow t - 1$ and repeat until $t = 0$.
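The six steps can be assembled into a compact loop. This is a minimal NumPy sketch under explicit assumptions:

- `toy_denoiser_logits` is a hypothetical stand-in for a trained network that simply favors keeping the current token;
- the classifier is a hypothetical logistic model with a closed-form gradient;
- the prior $x_T$ is uniform;
- $\Delta_{i,k}$ is computed with the Taylor rule.

It is not the reference implementation of either cited paper.

```python
import numpy as np

def log_sigmoid(s):
    return -np.logaddexp(0.0, -s)

def classifier_logp_and_grad(x_idx, w, b):
    """Hypothetical classifier log p_phi(y=1 | x_t) = log sigmoid(sum_i w[i, x_i] + b) and its one-hot gradient."""
    s = w[np.arange(len(x_idx)), x_idx].sum() + b
    logp = log_sigmoid(s)
    return logp, (1.0 - np.exp(logp)) * w

def toy_denoiser_logits(x_idx, t, vocab_size):
    """Hypothetical stand-in for log p_theta(x_{t-1}^i = k | x_t): prefers keeping the current token."""
    logits = np.zeros((len(x_idx), vocab_size))
    logits[np.arange(len(x_idx)), x_idx] = 1.0 + 0.1 * t
    return logits

def sample_rows(logits, rng):
    """Softmax each row, then draw one index per row by inverse-CDF sampling."""
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    u = rng.random((p.shape[0], 1))
    return np.minimum((p.cumsum(axis=1) <= u).sum(axis=1), p.shape[1] - 1)

def dcbg_sample(length, vocab_size, num_steps, gamma, w, b, rng):
    x = rng.integers(0, vocab_size, size=length)  # x_T drawn from a uniform prior
    rows = np.arange(length)
    for t in range(num_steps, 0, -1):              # 6. iterate from t = T down to 1
        logits = toy_denoiser_logits(x, t, vocab_size)   # 1. unconditional logits
        _, grad = classifier_logp_and_grad(x, w, b)      # 2. one classifier forward + backward
        delta = grad - grad[rows, x][:, None]            # 3. Taylor log-ratios Delta_{i,k}
        x = sample_rows(logits + gamma * delta, rng)     # 4-5. softmax and sample
    return x

rng = np.random.default_rng(0)
w = np.zeros((8, 4))
w[:, 0] = 1.0  # hypothetical classifier that rewards token 0 at every position
print(dcbg_sample(8, 4, 20, gamma=5.0, w=w, b=-4.0, rng=rng))
```

With $\gamma = 0$ the loop reduces to unconditional sampling. Raising $\gamma$ shifts samples toward sequences the classifier scores highly, at the cost of one classifier forward-backward pass per step.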

In the continuous-time CTMC setting, one constructs the guided generator via reweighting as above and samples trajectories with standard CTMC samplers (Euler, Gillespie, or τ-leaping), with the Taylor expansion similarly amortizing predictor evaluations (Nisonoff et al., 2024).
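A single Euler step with guided rates could look as follows. The sketch makes these assumptions:

- rates are factorized, where `rates[i, k]` is the unguided rate of setting position $i$ to token $k$;
- `delta` holds log-ratio estimates, such as those produced by the Taylor rule;
- all positions are updated independently within the step;
- the step size `dt` is small enough that the stay probability remains non-negative (the code clips it at zero otherwise).

`guided_euler_step` is a hypothetical name.

```python
import numpy as np

def guided_euler_step(x_idx, rates, delta, gamma, dt, rng):
    """One Euler step of a factorized CTMC with classifier-guided rates.

    rates: (L, N) unguided rates R_t for setting position i to token k; entries at the
           current tokens are ignored.
    delta: (L, N) estimates of log p(y | x~) - log p(y | x), e.g. from the Taylor rule.
    Returns the new tokens and the per-position transition probabilities used.
    """
    length, vocab_size = rates.shape
    rows = np.arange(length)
    guided = np.asarray(rates, dtype=float) * np.exp(gamma * delta)  # R^y = R * ratio^gamma
    guided[rows, x_idx] = 0.0
    probs = dt * guided                                   # jump probabilities ≈ dt * rate
    stay = np.clip(1.0 - probs.sum(axis=1), 0.0, None)    # clipped if dt is too large
    probs[rows, x_idx] = stay
    probs /= probs.sum(axis=1, keepdims=True)             # renormalize after clipping
    u = rng.random((length, 1))                           # each position updated independently
    new_x = np.minimum((probs.cumsum(axis=1) <= u).sum(axis=1), vocab_size - 1)
    return new_x, probs
```

For $\gamma = 0$ this reduces to a plain Euler step of the unguided chain. Exact simulators such as Gillespie's algorithm avoid the step-size constraint at the cost of one event per jump.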

5. Practical Considerations and Guidance Strength

The classifier for D-CBG is typically trained under the same noise schedule as the diffusion model, predicting the label from noised inputs $x_t$ with a cross-entropy loss (Schiff et al., 2024). The guidance strength $\gamma$ moderates the tradeoff between sample fidelity and conditional control. Lower values are preferable for small-vocabulary tasks, while higher $\gamma$ (up to 10) is effective in molecular domains. Fewer diffusion steps suffice in uniform-noise schemes because any token can flip at any step. In continuous-time D-CBG, CTMC simulators (τ-leaping, Gillespie) can improve accuracy (Schiff et al., 2024, Nisonoff et al., 2024).
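To make the training recipe concrete, the sketch below fits a hypothetical logistic classifier with binary cross-entropy on inputs corrupted by the uniform interpolating noise. The following assumptions are not taken from the cited papers:

- a linear schedule $\alpha_t = 1 - t$ with $t \sim \mathcal{U}(0, 1)$;
- a classifier that ignores $t$ (time-dependent classifiers are also possible);
- plain gradient descent;
- a made-up label.

`train_step` is an illustrative name.

```python
import numpy as np

def train_step(w, b, x0_batch, y_batch, lr, rng):
    """One gradient step of binary cross-entropy for a hypothetical noisy logistic classifier.

    w: (L, N) per-position token weights; b: scalar bias.
    x0_batch: (B, L) clean token indices; y_batch: (B,) labels in {0, 1}.
    """
    batch, length = x0_batch.shape
    vocab_size = w.shape[1]
    # Corrupt with the uniform interpolating schedule, assuming alpha_t = 1 - t, t ~ U(0, 1).
    alpha = 1.0 - rng.random(batch)
    keep = rng.random((batch, length)) < alpha[:, None]
    noise = rng.integers(0, vocab_size, size=(batch, length))
    xt = np.where(keep, x0_batch, noise)  # marginally Cat(alpha_t x_0 + (1 - alpha_t) / N)
    pos = np.tile(np.arange(length), (batch, 1))
    s = w[pos, xt].sum(axis=1) + b        # classifier logits, shape (B,)
    p = 1.0 / (1.0 + np.exp(-s))
    eps = 1e-12
    loss = -np.mean(y_batch * np.log(p + eps) + (1.0 - y_batch) * np.log(1.0 - p + eps))
    # d(mean BCE)/ds = (p - y) / B; each sample's gradient lands on its selected one-hot entries.
    g = (p - y_batch) / batch
    grad_w = np.zeros_like(w)
    np.add.at(grad_w, (pos, xt), np.repeat(g[:, None], length, axis=1))
    return w - lr * grad_w, b - lr * g.sum(), loss

rng = np.random.default_rng(0)
w, b = np.zeros((5, 4)), 0.0
for step in range(300):
    x0 = rng.integers(0, 4, size=(128, 5))
    y = (x0[:, 0] == 0).astype(float)  # hypothetical label: token 0 at position 0
    w, b, loss = train_step(w, b, x0, y, 0.5, rng)
print(round(float(loss), 3))
```

Training on a fresh noise level per example exposes the classifier to every corruption level it will see during guided sampling.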

Resource requirements are dominated by backpropagation through the classifier for the gradient. A single GPU is adequate for typical biological sequence or molecular tasks with vocabulary sizes up to 32 (Nisonoff et al., 2024). The Taylor-linear approximation computes all guided probabilities with $O(1)$ classifier calls (one forward and one backward pass) per time step, instead of $O(L \cdot N)$.

6. Empirical Results Across Modalities

D-CBG achieves robust, high-performance controllable generation on a range of discrete domains:

| Domain | Quality/Controllability Metrics | D-CBG Outcome |
|---|---|---|
| Genomic sequences (Species10) | 3-mer JS, class F1 | 3-mer JS ≈ 0.11; F1 improved from 0.81 to 0.94 |
| Molecular design (QM9) | Validity, novelty, QED, rings | QED maximization: 99.5% valid, 63.8% novel, mean QED 0.61 |
| Discretized CIFAR-10 | IS, FID, class-conditional F1 | IS 6.74→9.02, FID 33.8→15.6, F1 0.63→0.99 with guidance |
| Small-molecule SMILES | Rings (purity), LogP targets | Targeted property shifts (90–93% purity); LogP control within MAE ≈ 0.7–0.9 |
| Discrete images (CTMC, CIFAR-10) | IS, FID | IS increases and FID decreases with stronger guidance (IS 9.09, FID ≈ 9.04) |
| Cell-type DNA enhancer design | Fréchet Biological Distance, target probability | Outperforms Dirichlet FM with classifier guidance across control strengths |
| Protein inverse folding | Success rate (RMSD-based), diversity | Success rate up to ∼40–93% (vs. 0–10%) with preserved diversity |

These results indicate that D-CBG preserves sample diversity while sharply increasing attribute controllability. The approach outperforms autoregressive baselines guided with FUDGE as well as previous classifier-free methods (Schiff et al., 2024, Nisonoff et al., 2024).

D-CBG adapts classical classifier-based guidance for continuous diffusion processes (Ma et al., 2023) to discrete data. It overcomes the absence of gradients in discrete spaces via likelihood-ratio modulation and smooth Taylor approximations. D-CBG can use both bespoke and off-the-shelf classifiers, e.g., pretrained ResNet or CLIP models for discretized images and text-guided generation. Further improvements come from calibration (Softplus activation, temperature scaling) and input pre-conditioning (Ma et al., 2023).

In contrast to classifier-free and universal guidance, D-CBG requires a trained classifier (bespoke or pretrained), but it does not require retraining the base generative model. In the reported applications, D-CBG achieves conditional control with minimal computational overhead and obtains state-of-the-art results for discrete data (Schiff et al., 2024, Nisonoff et al., 2024, Ma et al., 2023).

A plausible implication is that D-CBG establishes a general recipe for conditional sampling in any probabilistic discrete generative flow or diffusion model, with temperature scheduling and efficient gradient-based amortization as key enablers for scaling to long sequences and large vocabularies.
