Discrete Classifier-Based Guidance (D-CBG)
- D-CBG is a family of algorithms for conditional sampling in discrete diffusion and flow models that combines an unconditional denoiser with classifier guidance.
- It employs Bayes-inspired reweighting of discrete Markov chain transitions using likelihood ratios from a classifier and a temperature hyperparameter to modulate control strength.
- D-CBG enables efficient, state-of-the-art conditional generation across domains such as nucleotide sequences, molecule design, and discretized images without retraining the base model.
Discrete Classifier-Based Guidance (D-CBG) is a family of algorithms for conditional sampling in discrete diffusion and flow models, enabling controllable generative modeling of categorical-valued data such as nucleotide sequences, molecules, and discretized images. D-CBG generalizes the continuous-state classifier guidance principle to discrete state spaces, combining an unconditional discrete denoiser or flow with a (possibly time-dependent) classifier to direct generation toward desired attributes or labels. The core mechanism adapts Bayes-inspired guidance to discrete Markov chains, employing reweighting of transition probabilities or rates by the likelihood ratio of the target property under a classifier, with a temperature hyperparameter modulating control strength. D-CBG is computationally efficient, requiring no retraining of the base generator, and achieves state-of-the-art conditional control across diverse discrete domains (Schiff et al., 2024, Nisonoff et al., 2024, Ma et al., 2023).
1. Discrete Diffusion and Flow: Model Setup
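In D-CBG, data consists of sequences $\mathbf{x} = (\mathbf{x}^1, \dots, \mathbf{x}^L)$ of length $L$ over a vocabulary of size $N$, with each token represented as a one-hot vector $\mathbf{x}^i \in \{0,1\}^N$. Discrete diffusion models are constructed as forward Markov chains, where each time-step transition is a categorical distribution parameterized by a transition matrix $Q_t$:
$$q(\mathbf{z}_t \mid \mathbf{z}_{t-1}) = \mathrm{Cat}\big(\mathbf{z}_t;\, Q_t\,\mathbf{z}_{t-1}\big).$$
A common “interpolating” schedule linearly mixes the input state with the uniform distribution $\boldsymbol{\pi} = \mathbf{1}/N$, controlled by a time-varying parameter $\alpha_t$ that decreases from 1 (clean data) to 0 (pure noise):
$$q(\mathbf{z}_t \mid \mathbf{x}) = \mathrm{Cat}\big(\mathbf{z}_t;\, \alpha_t\,\mathbf{x} + (1 - \alpha_t)\,\boldsymbol{\pi}\big).$$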
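The interpolating forward process can be sketched in plain Python (standard library only). This is a minimal illustration under stated assumptions, not any paper's implementation: tokens are stored as integer indices rather than explicit one-hot vectors, the helper names are hypothetical, and the linear schedule $\alpha_t = 1 - t$ in the usage lines is only an example.

```python
import random

def interpolating_marginal(x_token: int, alpha_t: float, vocab_size: int) -> list[float]:
    """Probabilities of q(z_t | x) = Cat(alpha_t * x + (1 - alpha_t) * pi), with uniform pi = 1/N."""
    probs = [(1.0 - alpha_t) / vocab_size] * vocab_size  # mass spread uniformly by the noise
    probs[x_token] += alpha_t                              # mass kept on the clean token
    return probs

def noise_sequence(x: list[int], alpha_t: float, vocab_size: int, rng: random.Random) -> list[int]:
    """Corrupt every position independently under the interpolating (uniform) schedule."""
    return [
        rng.choices(range(vocab_size), weights=interpolating_marginal(tok, alpha_t, vocab_size), k=1)[0]
        for tok in x
    ]

# Usage: a hypothetical linear schedule alpha_t = 1 - t, evaluated at t = 0.5
rng = random.Random(0)
x = [0, 1, 2, 3, 0, 1]            # token indices; the one-hot vectors are implicit
z_t = noise_sequence(x, alpha_t=1 - 0.5, vocab_size=4, rng=rng)
```

With `alpha_t = 1.0` the sequence is returned unchanged; with `alpha_t = 0.0` every token is resampled uniformly.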
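The goal of generative modeling is to learn a reverse process $p_\theta(\mathbf{z}_s \mid \mathbf{z}_t)$, $s < t$, approximating the true posterior $q(\mathbf{z}_s \mid \mathbf{z}_t, \mathbf{x})$. Because this posterior depends on the unknown clean data $\mathbf{x}$, the reverse process is parameterized by a neural denoiser trained via a variational objective over the trajectory. In continuous-time settings, models instead define a time-dependent rate (generator) matrix $R_t$ for a continuous-time Markov chain (CTMC) on the discrete state space, where $R_t(x, \tilde{x}) \ge 0$ is the rate of jumping from state $x$ to state $\tilde{x} \neq x$ (Nisonoff et al., 2024).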
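For the uniform interpolating schedule, the posterior $q(\mathbf{z}_s \mid \mathbf{z}_t, \mathbf{x})$ follows from Bayes' rule, with $q(\mathbf{z}_t \mid \mathbf{z}_s)$ itself an interpolation with coefficient $\alpha_{t|s} = \alpha_t / \alpha_s$. One common way to parameterize the reverse step, assumed in this sketch, is to plug the denoiser's predicted clean-token distribution $\hat{\mathbf{x}}_\theta(\mathbf{z}_t)$ in place of $\mathbf{x}$. The function name and the `x_hat` input are hypothetical stand-ins for a trained model's output.

```python
def reverse_step_probs(z_t_token: int, x_hat: list[float], alpha_s: float, alpha_t: float) -> list[float]:
    """Sketch of p_theta(z_s | z_t) = q(z_s | z_t, x = x_hat) for one token under the uniform schedule.

    x_hat: predicted distribution over the clean token (hypothetical denoiser output).
    Requires alpha_s > 0 and alpha_t <= alpha_s (since s < t).
    """
    n = len(x_hat)
    alpha_ts = alpha_t / alpha_s  # alpha_{t|s} for the forward transition s -> t
    unnorm = []
    for j in range(n):
        # forward likelihood q(z_t | z_s = j): keep j with prob alpha_ts, else resample uniformly
        lik = alpha_ts * (1.0 if j == z_t_token else 0.0) + (1.0 - alpha_ts) / n
        # marginal q(z_s = j | x_hat); it is linear in x, so a probability vector can be plugged in
        prior = alpha_s * x_hat[j] + (1.0 - alpha_s) / n
        unnorm.append(lik * prior)
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Usage: current noisy token 1, denoiser fairly confident the clean token is 0
probs = reverse_step_probs(1, [0.7, 0.1, 0.1, 0.1], alpha_s=0.8, alpha_t=0.5)
```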
2. Principle and Mathematical Formulation of D-CBG
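D-CBG augments unconditional sampling to generate samples from a conditional distribution $p(\mathbf{x} \mid y)$ (e.g., with a specified class or property $y$). The guidance is implemented by combining the base model with a separately trained classifier. Let $p_\phi(y \mid \mathbf{z}_s)$ be the classifier output for attribute $y$; each reverse step then samples from
$$p^{\gamma}(\mathbf{z}_s \mid \mathbf{z}_t, y) \propto p_\phi(y \mid \mathbf{z}_s)^{\gamma}\, p_\theta(\mathbf{z}_s \mid \mathbf{z}_t).$$
Here, the guidance strength $\gamma$ amplifies conditioning: $\gamma = 0$ recovers unconditional sampling, $\gamma = 1$ corresponds to Bayes' rule, and $\gamma > 1$ sharpens the conditioning. In continuous-time settings, the off-diagonal transition rates of the CTMC are reweighted for each possible state transition $x \to \tilde{x}$ ($\tilde{x} \neq x$) as:
$$R_t^{\gamma}(x, \tilde{x} \mid y) = \left(\frac{p_\phi(y \mid \tilde{x}, t)}{p_\phi(y \mid x, t)}\right)^{\gamma} R_t(x, \tilde{x}),$$
with the diagonal entries reset so that each row sums to zero. Because the reweighting only rescales positive rates by positive factors, the set of allowed transitions, and hence ergodicity, is preserved, while the terminal samples are biased toward the specified attribute (Nisonoff et al., 2024, Schiff et al., 2024).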
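Both reweighting rules can be written out directly when every candidate can be scored. The sketch below is an assumption-laden illustration, not a reference implementation: `guided_token_probs` applies the discrete-time rule to one position by enumerating the classifier probability of each candidate token (the exact but expensive route), and `guided_rates` applies the CTMC rule to one row of a rate matrix stored as a dictionary. Both names and the data layout are hypothetical.

```python
import math

def guided_token_probs(base_probs, classifier_probs, gamma):
    """Exact discrete-time guidance for one token position.

    base_probs[j]       = p_theta(z_s^i = j | z_t), the unconditional reverse step
    classifier_probs[j] = p_phi(y | z_s with position i set to j), assumed > 0
    Returns p^gamma(j), proportional to p_phi(y | j)^gamma * p_theta(j).
    """
    logits = [math.log(p) + gamma * math.log(c) if p > 0 else -math.inf
              for p, c in zip(base_probs, classifier_probs)]
    m = max(logits)  # log-sum-exp shift for numerical stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

def guided_rates(rates_from_x, x, p_y, gamma):
    """Reweight the CTMC row R_t(x, .) by (p(y | x~) / p(y | x))^gamma for every x~ != x.

    rates_from_x: dict mapping each reachable state x~ to its unconditional rate R_t(x, x~)
    p_y:          callable returning the (time-dependent) classifier probability p(y | state)
    The diagonal entry is reset so the guided row still sums to zero.
    """
    guided = {x_new: rate * (p_y(x_new) / p_y(x)) ** gamma
              for x_new, rate in rates_from_x.items() if x_new != x}
    guided[x] = -sum(guided.values())
    return guided
```

For example, with a uniform base `[0.5, 0.5]`, classifier probabilities `[0.9, 0.1]`, and `gamma=1.0`, the guided distribution equals the classifier's preference.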
3. Efficient Implementation: Tokenwise and Taylor Approximations
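Naïvely, evaluating the classifier for every candidate single-token replacement requires $\mathcal{O}(L\,N)$ classifier calls per step for sequences of length $L$ over a vocabulary of size $N$ (or $\mathcal{O}(D\,S)$ for general discrete objects with $D$ dimensions of $S$ states each), which quickly becomes prohibitive for long sequences or large vocabularies. D-CBG instead uses a first-order Taylor expansion, treating the classifier log-probability as a smooth function of the one-hot input:
$$\log p_\phi(y \mid \tilde{\mathbf{x}}) \approx \log p_\phi(y \mid \mathbf{x}) + (\tilde{\mathbf{x}} - \mathbf{x})^{\top} \nabla_{\mathbf{x}} \log p_\phi(y \mid \mathbf{x}).$$
Since $\tilde{\mathbf{x}}$ and $\mathbf{x}$ differ in only one token, say position $i$ changing from $x^i$ to $j$, the inner product reduces to a difference of two gradient entries, $[\nabla_{\mathbf{x}} \log p_\phi(y \mid \mathbf{x})]_{i,j} - [\nabla_{\mathbf{x}} \log p_\phi(y \mid \mathbf{x})]_{i,x^i}$. The guided probabilities for all candidate mutations can therefore be computed from a single forward and backward pass through the classifier. This enables practical application to high-dimensional biological-sequence and molecule spaces (Schiff et al., 2024, Nisonoff et al., 2024).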
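To see why the Taylor shortcut works, the sketch below compares exact single-token log-likelihood ratios (one classifier call per candidate) with their first-order estimates (one gradient) for a hypothetical toy classifier whose logit is linear in the one-hot input. Its gradient is written analytically here in place of an autodiff backward pass. Because $\log \sigma(\cdot)$ is concave, for this toy classifier the Taylor estimate upper-bounds the exact ratio while preserving its sign and the best candidate per position; real classifiers carry no such guarantee.

```python
import math

def log_sigmoid(s: float) -> float:
    """Numerically stable log(sigmoid(s))."""
    return -math.log1p(math.exp(-s)) if s >= 0 else s - math.log1p(math.exp(s))

def toy_logit(x, W):
    """Hypothetical classifier logit, linear in the one-hot input: sum_i W[i][x_i]."""
    return sum(W[i][tok] for i, tok in enumerate(x))

def exact_log_ratios(x, W):
    """log p(y | x with x_i := j) - log p(y | x) for every (i, j): L * N classifier calls."""
    base = log_sigmoid(toy_logit(x, W))
    ratios = []
    for i in range(len(x)):
        row = []
        for j in range(len(W[i])):
            mutated = list(x)
            mutated[i] = j
            row.append(log_sigmoid(toy_logit(mutated, W)) - base)
        ratios.append(row)
    return ratios

def taylor_log_ratios(x, W):
    """First-order estimate from a single gradient: grad[i][j] - grad[i][x_i]."""
    s = toy_logit(x, W)
    one_minus_sig = 1.0 - math.exp(log_sigmoid(s))
    # d log sigmoid(s) / d x^i_j = (1 - sigmoid(s)) * W[i][j]; autodiff would supply this in practice
    grad = [[one_minus_sig * w for w in row] for row in W]
    return [[grad[i][j] - grad[i][x[i]] for j in range(len(W[i]))] for i in range(len(x))]

W = [[0.2, -0.1, 0.5, 0.0],
     [0.3, 0.1, -0.4, 0.2],
     [-0.2, 0.6, 0.0, 0.1]]  # toy weights: L = 3 positions, N = 4 tokens
x = [2, 1, 1]
exact = exact_log_ratios(x, W)
approx = taylor_log_ratios(x, W)
```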
In the discrete-time case, a standard sampling loop performs, at each step: one denoiser pass to obtain logits for every position, one classifier forward-backward pass for the current $\mathbf{z}_t$, construction of guided logits for every token, softmax renormalization, and sampling of the next tokens (see Section 4 below).
4. Algorithmic Workflow and Pseudocode
The D-CBG sampling loop is realized as follows (Schiff et al., 2024, Nisonoff et al., 2024):
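- Unconditional Prediction: For each position $i$, compute denoiser logits $\log p_\theta(\mathbf{z}_s^i = j \mid \mathbf{z}_t)$ for all tokens $j \in \{1, \dots, N\}$.
- Classifier Pass: Compute $\log p_\phi(y \mid \mathbf{z}_t)$ and its gradient $\nabla_{\mathbf{z}_t} \log p_\phi(y \mid \mathbf{z}_t)$ with respect to the one-hot input $\mathbf{z}_t$.
- Guided Logits Construction (per token):
  $$\ell^i_j = \log p_\theta(\mathbf{z}_s^i = j \mid \mathbf{z}_t) + \gamma\, \Delta^i_j,$$
  where $\Delta^i_j = [\nabla_{\mathbf{z}_t} \log p_\phi(y \mid \mathbf{z}_t)]_{i,j} - [\nabla_{\mathbf{z}_t} \log p_\phi(y \mid \mathbf{z}_t)]_{i,z_t^i}$ is the Taylor-approximated log-likelihood ratio for switching position $i$ to token $j$.
- Softmax Renormalization: Apply softmax over the candidate indices $j$ to yield new categorical probabilities.
- Sampling: Draw $\mathbf{z}_s^i \sim \mathrm{Cat}\big(\mathrm{softmax}(\ell^i)\big)$ independently for $i = 1, \dots, L$.
- Iterate: Set $\mathbf{z}_t \leftarrow \mathbf{z}_s$ and repeat until the final time step.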
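The loop above can be sketched end to end as follows. This is a minimal, hypothetical illustration rather than the implementation of Schiff et al. (2024) or Nisonoff et al. (2024): `denoiser_logits` and `classifier_grad` are assumed callables standing in for the trained denoiser and the classifier's backward pass, they ignore the time index for brevity, and the toy models at the bottom exist only to make the sketch runnable.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def guided_step(z_t, denoiser_logits, classifier_grad, gamma, rng):
    """One discrete-time D-CBG step with Taylor-approximated guidance (sketch)."""
    base = denoiser_logits(z_t)  # L x N unconditional reverse-step logits (hypothetical model)
    grad = classifier_grad(z_t)  # L x N gradient of log p_phi(y | z_t) w.r.t. the one-hot z_t
    z_s = []
    for i, cur in enumerate(z_t):
        # Delta^i_j = grad[i][j] - grad[i][cur]; the second term is constant in j and cancels
        # in the softmax, but it is kept so the guidance term reads as a log-likelihood ratio.
        guided = [base[i][j] + gamma * (grad[i][j] - grad[i][cur]) for j in range(len(base[i]))]
        probs = softmax(guided)
        z_s.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return z_s

def sample(z_init, num_steps, denoiser_logits, classifier_grad, gamma, seed=0):
    rng = random.Random(seed)
    z = list(z_init)
    for _ in range(num_steps):
        z = guided_step(z, denoiser_logits, classifier_grad, gamma, rng)  # z_t <- z_s
    return z

# Toy stand-ins (assumptions): a uniform "denoiser" and a log-linear classifier whose
# log-probability rises by 5 for every position holding token 3.
N = 4

def uniform_denoiser(z):
    return [[0.0] * N for _ in z]

def prefers_token_3(z):
    return [[5.0 if j == 3 else 0.0 for j in range(N)] for _ in z]

z0 = [0, 1, 2, 0, 1, 2, 0, 1]
unguided = sample(z0, 4, uniform_denoiser, prefers_token_3, gamma=0.0)
guided = sample(z0, 4, uniform_denoiser, prefers_token_3, gamma=10.0)
```

Setting `gamma=0.0` recovers unconditional sampling from the toy denoiser, while `gamma=10.0` drives essentially every position to the token the toy classifier favors.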
In the continuous-time CTMC setting, one constructs the guided generator via reweighting as above and samples trajectories with standard CTMC samplers (Euler, Gillespie, or τ-leaping), with the Taylor expansion similarly amortizing predictor evaluations (Nisonoff et al., 2024).
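An Euler step for the guided CTMC can be sketched in the same style. The block assumes (hypothetically) a factorized chain in which each position jumps independently within a step of length `dt`, unconditional rates given as an $L \times N$ table, and the Taylor estimate $\exp\big(\gamma\,(g_{i,j} - g_{i,x^i})\big)$, with $g = \nabla_x \log p(y \mid x)$, in place of the exact ratio $\big(p(y \mid \tilde{x}) / p(y \mid x)\big)^{\gamma}$. The renormalization when `dt` is too large is a crude safeguard of this sketch, not part of any published sampler.

```python
import math
import random

def guided_euler_step(x, base_rates, grad, gamma, dt, rng):
    """One Euler step of a factorized CTMC with Taylor-approximated guidance (sketch).

    base_rates[i][j]: unconditional rate for changing position i to token j (j != x[i]).
    grad[i][j]:       gradient of log p(y | x) w.r.t. one-hot entry (i, j) at the current x.
    Positions are updated independently within the step (an assumption of this sketch).
    """
    x_next = list(x)
    for i, cur in enumerate(x):
        # guided rate R_t(x, x^{i->j}) * exp(gamma * (grad[i][j] - grad[i][cur])) for j != cur
        rates = [0.0 if j == cur else base_rates[i][j] * math.exp(gamma * (grad[i][j] - grad[i][cur]))
                 for j in range(len(base_rates[i]))]
        probs = [dt * r for r in rates]
        stay = 1.0 - sum(probs)
        if stay < 0.0:
            # dt too large for these rates: crude safeguard that renormalizes the jump probabilities
            total = sum(probs)
            probs = [p / total for p in probs]
            stay = 0.0
        probs[cur] = stay
        x_next[i] = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return x_next
```

With `dt=0.0` the state never moves; a large `gamma` combined with a classifier gradient favoring one token makes nearly every jump land on that token.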
5. Practical Considerations and Guidance Strength
The classifier for D-CBG is typically trained under the same noise schedule as the diffusion model, predicting the label $y$ from noised latents $\mathbf{z}_t \sim q(\mathbf{z}_t \mid \mathbf{x})$ with a cross-entropy loss (Schiff et al., 2024). The guidance strength (temperature) $\gamma$ moderates the tradeoff between sample fidelity and conditional control: lower values of $\gamma$ work best for small-vocabulary tasks, while higher values (up to $\gamma = 10$) are effective in molecular domains. Relatively few diffusion steps suffice in uniform-noise schemes, because any token can still be flipped at any step. Simulating the CTMC with τ-leaping or Gillespie's algorithm can improve sampling accuracy in continuous-time D-CBG (Schiff et al., 2024, Nisonoff et al., 2024).
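The fidelity/control tradeoff governed by $\gamma$ can be illustrated on a single token. In this sketch the base and classifier probabilities are made-up toy numbers, and the KL divergence from the unconditional distribution serves as a rough proxy for lost fidelity (an assumption of this illustration, not a metric from the cited works). As $\gamma$ grows, the probability of the classifier-preferred token rises, and so does the divergence from the base model.

```python
import math

def tilt(base_probs, classifier_probs, gamma):
    """Guided distribution proportional to base * classifier^gamma (one token, exact)."""
    logits = [math.log(p) + gamma * math.log(c) for p, c in zip(base_probs, classifier_probs)]
    m = max(logits)
    w = [math.exp(l - m) for l in logits]
    z = sum(w)
    return [v / z for v in w]

def kl(p, q):
    """KL(p || q) for discrete distributions with q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

base = [0.4, 0.3, 0.2, 0.1]  # hypothetical unconditional denoiser probabilities
clf = [0.1, 0.2, 0.3, 0.9]   # hypothetical p_phi(y | candidate token); token 3 is preferred
gammas = [0.0, 1.0, 3.0, 10.0]
for gamma in gammas:
    g = tilt(base, clf, gamma)
    print(f"gamma={gamma:>4}: p(target)={g[3]:.3f}  KL(guided||base)={kl(g, base):.3f}")
```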
Resource requirements are dominated by backpropagation through the classifier for the gradient; a single GPU is adequate for typical biological-sequence or molecular tasks with vocabulary sizes up to 32 and the sequence lengths studied in that work (Nisonoff et al., 2024). The Taylor-linear approximation enables all guided probabilities to be computed with $\mathcal{O}(1)$ classifier calls (one forward and one backward pass) per time step.
6. Empirical Results Across Modalities
D-CBG achieves robust, high-performance controllable generation on a range of discrete domains:
| Domain | Quality/Controllability Metrics | D-CBG Outcome |
|---|---|---|
| Genomic Sequences (Species10) | 3-mer JS divergence, class F1 | 3-mer JS ≈ 0.11; F1 improved from 0.81 to 0.94 |
| Molecular Design (QM9) | Validity, novelty, QED, rings | QED maximization: 99.5% valid, 63.8% novel, mean QED 0.61 |
| Discretized CIFAR-10 | IS, FID, class-conditional F1 | IS 6.74 → 9.02, FID 33.8 → 15.6, F1 0.63 → 0.99 |
| Small Molecule SMILES | Rings (purity), LogP targets | Targeted property shifts (90–93% purity); LogP control within MAE ≈ 0.7–0.9 |
| Discrete Images (CTMC, CIFAR-10) | IS, FID | IS increases and FID decreases with stronger guidance (IS 9.09, FID ≈ 9.04 at the reported guidance strength) |
| Cell-type DNA enhancer design | Fréchet Biological Distance, target probability | Outperforms Dirichlet FM classifier guidance across control strengths |
| Protein inverse-folding | Success rate (RMSD-based), diversity | Success rate up to ∼40–93% (vs. 0–10%) with preserved diversity |
These results demonstrate that D-CBG preserves sample diversity while sharply increasing attribute controllability; the approach consistently outperforms autoregressive+FUDGE baselines and previous classifier-free methods (Schiff et al., 2024, Nisonoff et al., 2024).
7. Extensions, Related Work, and Limitations
D-CBG adapts classical classifier-based guidance for continuous diffusion processes (Ma et al., 2023) to discrete data, overcoming the absence of gradients in discrete spaces via likelihood-ratio modulation and smooth Taylor approximations. D-CBG enables the use of both bespoke and off-the-shelf classifiers, e.g., a pretrained ResNet or CLIP for discretized images and text-guided generation, with further improvements obtained through calibration (Softplus activation, temperature scaling) and input pre-conditioning (Ma et al., 2023).
In contrast to classifier-free and universal guidance, D-CBG requires a trainable or pretrained classifier, but does not necessitate retraining of the base generative model. In the reported applications, D-CBG achieves conditional control with minimal computational overhead, enabling state-of-the-art results for discrete data (Schiff et al., 2024, Nisonoff et al., 2024, Ma et al., 2023).
A plausible implication is that D-CBG establishes a general recipe for conditional sampling in any probabilistic discrete generative flow or diffusion model, with temperature scheduling and efficient gradient-based amortization as key enablers for scaling to long sequences and large vocabularies.