
Conditional Diffusion and Guidance

Updated 16 May 2026
  • Conditional diffusion is a framework that steers reverse diffusion processes using conditional signals for controllable generative modeling.
  • Various guidance strategies—including classifier-based, classifier-free, and adaptive methods—balance fidelity and diversity in sampling.
  • Recent advances address issues like manifold drift, expectation bias, and computational efficiency to enhance applications in images, graphs, and molecules.

Conditional diffusion and guidance comprise a principled and versatile framework for controllable generative modeling, providing mechanisms to steer the evolution of diffusion processes towards designated conditional targets. Contemporary approaches include both model-internal (classifier-free and classifier-guided) and model-external (oracle, zero-order, control-based) strategies, enabling a broad array of conditional tasks in high-dimensional domains including images, text, molecules, and graphs. This article systematically reviews the mathematical foundations, algorithmic strategies, theoretical analyses, and recent innovations in conditional diffusion and guidance.

1. Mathematical Foundations of Conditional Diffusion

The denoising diffusion probabilistic model (DDPM) defines a forward Markov chain $q(x_0, \ldots, x_T)$ that progressively adds noise to a data sample $x_0 \sim p_{\text{data}}(x)$, with transitions

$$q(x_t \mid x_{t-1}) = \mathcal{N}\bigl(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t I\bigr)$$

for a noise schedule $\{\beta_t\}$. The reverse generative process parameterizes the distribution

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\bigl(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_t\bigr)$$

and is trained to match the true reversal via maximum likelihood or score matching, often reformulated in terms of noise prediction $\epsilon_\theta(x_t, t)$. For conditional generation (e.g., on a class label $c$ or text $y$), the target is $p(x_0 \mid c)$, typically implemented via conditional network inputs or a modification of the reverse process (Ho et al., 2022, Fu et al., 2024).
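
As a minimal sketch of the forward chain described above, the snippet below iterates the per-step transition and compares it with the standard closed form $q(x_t \mid x_0) = \mathcal{N}(\sqrt{\bar\alpha_t}\, x_0, (1-\bar\alpha_t) I)$, where $\bar\alpha_t = \prod_{s \le t} (1-\beta_s)$. The linear schedule, the helper names, and the toy data are assumptions made for illustration and do not come from any of the cited works.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule {beta_t}; the values are illustrative."""
    return np.linspace(beta_start, beta_end, num_steps)

def forward_step(x_prev, beta_t, rng):
    """One transition q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) x_{t-1}, beta_t I)."""
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise

def forward_closed_form(x0, betas, t, rng):
    """Sample x_t directly from q(x_t | x_0) via alpha_bar_t = prod_{s<=t}(1 - beta_s)."""
    alpha_bar = np.prod(1.0 - betas[: t + 1])
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise
    return x_t, noise  # `noise` is the regression target for epsilon_theta

rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
x0 = np.full((20000,), 3.0)  # toy "dataset": every sample equals 3

# Iterate the Markov chain through all steps.
x_chain = x0.copy()
for beta_t in betas:
    x_chain = forward_step(x_chain, beta_t, rng)

# Jump to the final step in one shot with the closed form.
x_direct, eps = forward_closed_form(x0, betas, len(betas) - 1, rng)
alpha_bar_T = np.prod(1.0 - betas)
```

Both routes should give samples with approximately the same mean and variance; the closed form is what makes training on randomly chosen timesteps cheap.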

Conditional guidance augments the score function used in sampling to bias the trajectory toward the conditional target. Classic techniques include classifier guidance [Dhariwal & Nichol, 2021], which injects an external classifier gradient, and classifier-free guidance (CFG) [Ho & Salimans, 2022], which forms an affine combination of conditional and unconditional scores within a single trained model:

$$\hat{s}(x_t; c, w) = (1-w)\, s_\theta(x_t, t) + w\, s_\theta(x_t, t \mid c)$$

with $w > 1$ amplifying the conditional influence (Sadat et al., 2024, Jin et al., 26 Sep 2025, Fu et al., 2024).
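
The affine combination can be rearranged to make the role of the guidance weight $w$ explicit. The short derivation below uses only Bayes' rule, $\nabla_{x_t} \log p_t(x_t \mid c) = \nabla_{x_t} \log p_t(x_t) + \nabla_{x_t} \log p_t(c \mid x_t)$, and assumes the learned scores match the true ones; it is a sketch under that idealization, not a statement about any particular trained model.

```latex
\begin{aligned}
\hat{s}(x_t; c, w)
  &= (1-w)\, s_\theta(x_t, t) + w\, s_\theta(x_t, t \mid c) \\
  &= s_\theta(x_t, t) + w \,\bigl[\, s_\theta(x_t, t \mid c) - s_\theta(x_t, t) \,\bigr] \\
  &\approx \nabla_{x_t} \log p_t(x_t) + w \, \nabla_{x_t} \log p_t(c \mid x_t).
\end{aligned}
```

In this form, $w = 0$ recovers the unconditional score, $w = 1$ the conditional score, and larger $w$ extrapolates past it, which parallels classifier guidance with an implicit classifier.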

2. Classifier and Classifier-Free Guidance

Classifier guidance employs a separately trained classifier $p_\phi(c \mid x_t)$ that is robust to the diffusion noise schedule. At each step, the mean of the reverse kernel is modified as

$$\tilde{\mu}_\theta(x_t, t) = \mu_\theta(x_t, t) + \gamma\, \Sigma_t\, \nabla_{x_t} \log p_\phi(c \mid x_t)$$

where $\gamma$ is a user-defined scale. The stability and informativeness of $\nabla_{x_t} \log p_\phi(c \mid x_t)$ are critical: classifiers not exposed to noise produce gradients with nearly random orientation, leading to poor conditional fidelity (FID > 100, accuracy around 15%) (Vaeth et al., 2024). Training the classifier on noised inputs $x_t$ drawn from the DDPM noise schedule restores meaningful gradients with stable alignment and sharp improvements in FID and accuracy.
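
To illustrate the mean shift used in classifier guidance, here is a small sketch in which the guiding classifier is a toy linear-softmax model whose gradient is available in closed form. The classifier weights `W`, `b`, the placeholder reverse mean, and the diagonal covariance are all hypothetical stand-ins; a real system would use a noise-aware network and automatic differentiation.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D array of logits."""
    shifted = logits - logits.max()
    return shifted - np.log(np.exp(shifted).sum())

def classifier_log_prob(x_t, W, b, c):
    """log p(c | x_t) for a toy linear-softmax classifier (hypothetical)."""
    return log_softmax(W @ x_t + b)[c]

def classifier_log_prob_grad(x_t, W, b, c):
    """Analytic gradient of log p(c | x_t) with respect to x_t."""
    probs = np.exp(log_softmax(W @ x_t + b))
    return W[c] - probs @ W  # d/dx [logit_c - logsumexp(logits)]

def guided_mean(mu, sigma_diag, grad_log_p, scale):
    """Shifted reverse-kernel mean: mu + scale * Sigma * grad log p(c | x_t)."""
    return mu + scale * sigma_diag * grad_log_p

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 2))   # 3 classes, 2-D inputs
b = np.zeros(3)
x_t = np.array([0.5, -1.0])
mu = 0.9 * x_t                    # placeholder for mu_theta(x_t, t)
sigma_diag = np.full(2, 0.1)      # diagonal of Sigma_t

target_class = 1
grad = classifier_log_prob_grad(x_t, W, b, target_class)
mu_guided = guided_mean(mu, sigma_diag, grad, scale=2.0)
```

A finite-difference comparison against `classifier_log_prob` is a cheap sanity check on the gradient before wiring a real classifier into the sampler.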

Classifier-free guidance (CFG) synthesizes unconditional and conditional score estimates:

$$\hat{s}(x_t; c, w) = (1-w)\, s_\theta(x_t, t) + w\, s_\theta(x_t, t \mid c)$$

This requires randomly "dropping" the conditioning during training so that a single network provides both branches (Ho et al., 2022, Fu et al., 2024). CFG trades sample diversity for conditional fidelity as $w$ increases, with empirically tuned sweet spots for $w$ (Gao et al., 31 Jan 2025, Jin et al., 26 Sep 2025).
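
The two ingredients of CFG, condition dropout at training time and the affine combination at sampling time, can be sketched as follows. The null-token id, the dropout probability of 0.1, and the function names are assumptions chosen for illustration; the scores are plain arrays standing in for network outputs.

```python
import numpy as np

NULL_TOKEN = -1  # hypothetical id standing for "no condition"

def drop_condition(labels, p_uncond, rng):
    """Randomly replace labels by NULL_TOKEN so one network learns both branches."""
    mask = rng.random(labels.shape) < p_uncond
    return np.where(mask, NULL_TOKEN, labels)

def cfg_combine(score_uncond, score_cond, w):
    """Affine CFG combination (1 - w) * s(x_t, t) + w * s(x_t, t | c)."""
    return (1.0 - w) * score_uncond + w * score_cond

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=10000)
train_labels = drop_condition(labels, p_uncond=0.1, rng=rng)

s_u = np.array([0.0, 1.0])   # stand-in unconditional score
s_c = np.array([1.0, 1.0])   # stand-in conditional score
guided = cfg_combine(s_u, s_c, w=3.0)   # extrapolates past the conditional score
```

With these stand-in values the guided score moves three times as far from `s_u` as `s_c` does along the direction in which the two branches disagree, and is unchanged where they agree.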

Advances include:

  • Independent Condition Guidance (ICG): Replaces the null input of CFG with a random independent condition at inference, providing an unconditional proxy applicable to any conditional checkpoint absent explicit null training (Sadat et al., 2024).
  • Time-Step Guidance (TSG): Perturbs the time-step embedding, producing guidance directions by comparing model outputs at nearby time indices. TSG achieves quality improvements even in unconditional generative models.
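
The ICG idea described above can be sketched with a toy conditional score in which each class is an isotropic Gaussian. The score function, class means, and function names are hypothetical; this follows the description in the text and is not the authors' implementation.

```python
import numpy as np

def toy_conditional_score(x_t, c, class_means, var=1.0):
    """Hypothetical conditional score: class c is modeled as N(class_means[c], var * I)."""
    return -(x_t - class_means[c]) / var

def icg_score(x_t, c, w, class_means, rng):
    """CFG-style combination in which the unconditional branch is replaced by the
    conditional model evaluated at a condition drawn independently of x_t."""
    c_rand = rng.integers(0, len(class_means))
    s_proxy = toy_conditional_score(x_t, c_rand, class_means)
    s_cond = toy_conditional_score(x_t, c, class_means)
    return (1.0 - w) * s_proxy + w * s_cond

rng = np.random.default_rng(0)
class_means = np.array([[-2.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
x_t = np.array([0.3, -0.4])
guided = icg_score(x_t, c=1, w=2.0, class_means=class_means, rng=rng)
```

Because the proxy condition is resampled independently of the sample, no null-condition training is needed, which is what makes the approach applicable to checkpoints trained without condition dropout.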

3. Gradient, Manifold, and Theoretical Analyses

Recent works have rigorously investigated the theoretical underpinnings and limitations of standard guidance techniques:

  • Manifold drift: Standard (Euclidean) extrapolation in high guidance regimes drives the sample off the data manifold, producing artifacts or mode collapse. This is addressed by geometry-aware corrections. For example, Manifold-Optimal Guidance (MOG) introduces a local Riemannian metric, yielding a closed-form update that penalizes off-manifold drift (Jia et al., 12 Mar 2026). Adaptive scaling (Auto-MOG) tunes step size automatically to balance guidance energy.
  • Score correction: Classical CFG does not correspond to the score of any valid diffusion process for the "tilted" conditional distribution $\propto p(x_0)^{1-w}\, p(x_0 \mid c)^{w}$. The correct score contains an additional Rényi-divergence repulsion term vanishing only as the noise approaches zero. Omitting this term leads to distributional bias and diminished diversity (Moufad et al., 27 May 2025).
  • Rectification: Rectified Gradient Guidance (REG) (Gao et al., 31 Jan 2025) and Rectified CFG (ReCFG) (Xia et al., 2024) introduce lightweight correction terms (e.g., chain rule approximations or non-uniform weighting) to restore theoretical consistency and remove the "expectation shift" endemic to standard CFG update rules. ReCFG provides per-condition, per-timestep coefficient adjustment via a precomputed lookup table.
  • Sample complexity: The statistical theory developed in (Fu et al., 2024) shows that classifier-free conditional diffusion achieves minimax-optimal rates (in total variation distance) for distribution estimation under appropriate conditions, with guidance strength $w$ trading off statistical bias vs. coverage.
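
The score-correction point above can be checked by hand in one dimension. The sketch below takes the tilted target to be proportional to $p(x)^{1-w}\, p(x \mid c)^{w}$, the form implied by the affine CFG combination (an assumption of this example), uses Gaussians for both densities, and compares the precision implied by the CFG score at a noisy time with the precision of the tilted law after the same noising. It does not reproduce the Rényi-divergence term itself; it only shows that the two agree at zero noise and disagree otherwise. All numbers are illustrative.

```python
import numpy as np

def cfg_precision(w, var_uncond, var_cond, a2, s2):
    """Precision implied by the CFG score when x_t = sqrt(a2) * x_0 + sqrt(s2) * noise
    and both p(x_0) and p(x_0 | c) are 1-D Gaussians."""
    v_u = a2 * var_uncond + s2   # variance of the noised unconditional law
    v_c = a2 * var_cond + s2     # variance of the noised conditional law
    return (1.0 - w) / v_u + w / v_c

def noised_tilted_precision(w, var_uncond, var_cond, a2, s2):
    """Precision of the tilted law p(x)^(1-w) p(x|c)^w after the same noising."""
    tilted_var = 1.0 / ((1.0 - w) / var_uncond + w / var_cond)
    return 1.0 / (a2 * tilted_var + s2)

w, var_u, var_c = 2.0, 2.0, 1.0

# No noise: CFG score and tilted target agree.
clean = (cfg_precision(w, var_u, var_c, 1.0, 0.0),
         noised_tilted_precision(w, var_u, var_c, 1.0, 0.0))

# Intermediate noise level: they no longer agree.
noisy = (cfg_precision(w, var_u, var_c, 0.5, 0.5),
         noised_tilted_precision(w, var_u, var_c, 0.5, 0.5))
```

Here `clean` evaluates to matching precisions of 1.5, while `noisy` gives roughly 1.333 for CFG against 1.2 for the noised tilted law, so the CFG score is over-concentrated at that noise level in this toy setting.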

4. Alternative and Adaptive Guidance Strategies

Several methodologies have been proposed to ameliorate the limitations of static and dense guidance, improve computational efficiency, or enable new types of conditional control:

  • Sparse and Compressed Guidance: Omitting guidance at a large fraction of timesteps, or adaptively reusing previously computed gradients, substantially mitigates model-fitting pathologies (overfitting samples to the guiding classifier) and improves both quality and diversity. Compress Guidance (CompG) retains performance with fewer gradient evaluations (Dinh et al., 2024). Adaptive Guidance (AG) omits CFG updates once the conditional and unconditional scores are sufficiently aligned, reducing computation without perceptible loss (Castillo et al., 2023).
  • Feedback/State-Dependent Guidance: Feedback Guidance (FBG) uses closed-loop adjustment of guidance strength, modulated by an online estimate of the conditional posterior $p(c \mid x_t)$. Guidance is applied only as needed, matching human intuition and outperforming fixed or interval-limited schedules on standard metrics (Koulischer et al., 6 Jun 2025).
  • Self-Guidance (SG): SG leverages the model's own density estimation: at each step, the velocity is steered using the difference between scores at the current and a slightly noisier time. SG is fully plug-and-play and requires no additional training or conditioning networks. Its approximation (SG-prev) achieves similar fidelity at half the network cost (Li et al., 2024).
  • Constraint-Based Guidance: When enforcing hard (probability one) constraints, Doob's $h$-transform is employed to modify the reverse SDE by an explicit drift correction based on the logarithmic gradient of the event probability $h(x_t, t)$, with non-asymptotic TV and Wasserstein error bounds (Guo et al., 5 Feb 2026).
  • Spherical Gaussian Constraint (DSG): DSG constrains the guidance step to a high-density shell determined by posterior concentration, preventing manifold deviation and enabling larger, more stable guidance steps (Yang et al., 2024).
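
The alignment test behind adaptive guidance, as described in the list above, might look like the following sketch. Measuring alignment by cosine similarity and the threshold of 0.99 are assumptions made for illustration; the cited work may use a different criterion.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-12):
    """Cosine of the angle between two flattened predictions."""
    return float(np.dot(a.ravel(), b.ravel()) /
                 (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def adaptive_cfg(pred_uncond, pred_cond, w, threshold=0.99):
    """Return (prediction, used_guidance).

    Hypothetical rule: once the two predictions are nearly parallel, skip the
    CFG extrapolation and return the conditional prediction alone.
    """
    if cosine_similarity(pred_uncond, pred_cond) >= threshold:
        return pred_cond, False
    return (1.0 - w) * pred_uncond + w * pred_cond, True

# Disagreeing branches: guidance is applied.
out_far, used_far = adaptive_cfg(np.array([1.0, 0.0]), np.array([0.0, 1.0]), w=2.0)

# Nearly identical branches: guidance is skipped.
out_near, used_near = adaptive_cfg(np.array([1.0, 0.0]), np.array([1.0, 0.001]), w=2.0)
```

In a full sampler, the saving would come from no longer evaluating the unconditional branch at all once the test first passes; this function only shows the decision rule.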

5. Extensions: Graphs, Molecules, Reward and Oracle Guidance

The conditional diffusion and guidance paradigm generalizes beyond classical vision domains:

  • Graph Conditional Generation: In the presence of combinatorial, discrete, or non-differentiable rewards, Graph Guided Diffusion (GGDiff) interprets conditional generation as a stochastic control problem. It unifies gradient-based, forward-evaluation, and zero-order (black-box) strategies, enabling plug-in guidance under arbitrary rewards and constraints—with applications in motif, fairness, and link prediction tasks (Tenorio et al., 26 May 2025). Loop guidance in hierarchical diffusion architectures (Twigs) allows structured exchange between a "trunk" and multiple "stem" flows, with state-of-the-art results in conditional and multi-property tasks (Mercatali et al., 2024).
  • Oracle/Physics-Based Guidance: For problems such as molecular optimization, guidance can be provided by a non-differentiable oracle (e.g., quantum chemistry code). Gradients are estimated by stochastic perturbation (SPSA/zero-order), enabling exact property optimization in molecular design and stability, fully compatible with explicit and implicit neural guidance (Shen et al., 2024).
  • Representation/Feature Guidance: Optimized Inference with Guidance (OIG) adds CLIP-based semantic and structural losses to the reverse process for text-driven image editing, enforcing semantic fidelity while preserving background and geometry (Lee et al., 2024).
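
For the oracle-guided setting above, a zero-order gradient can be estimated from function evaluations alone. The sketch below implements a basic SPSA estimator with Rademacher perturbations; the quadratic `black_box_property` is a stand-in for a real non-differentiable oracle, and the perturbation size and sample count are illustrative assumptions.

```python
import numpy as np

def spsa_gradient(f, x, c=1e-2, num_samples=1, rng=None):
    """Zero-order SPSA estimate of grad f(x) using only function evaluations."""
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(x, dtype=float)
    for _ in range(num_samples):
        delta = rng.choice([-1.0, 1.0], size=x.shape)  # Rademacher perturbation
        diff = f(x + c * delta) - f(x - c * delta)     # two oracle calls per sample
        grad += diff / (2.0 * c) * delta               # 1 / delta_i equals delta_i for +-1 entries
    return grad / num_samples

def black_box_property(x):
    """Stand-in for a non-differentiable oracle; here simply a quadratic."""
    return float(np.sum(x ** 2))

x = np.array([1.0, -2.0, 0.5])
g_est = spsa_gradient(black_box_property, x, num_samples=4000,
                      rng=np.random.default_rng(0))
```

Each sample costs two oracle calls regardless of dimension, which is the main appeal when the oracle is an expensive simulation; the estimate is noisy, so averaging or small step sizes are typically needed.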

6. Stage-Wise and Time-Varying Guidance Policies

The dynamical impact of guidance during diffusion sampling is now understood to be multi-phased:

  • Stage-wise theory (Jin et al., 26 Sep 2025) divides CFG-guided sampling into

    1. Direction Shift: Early over-concentration on dominant modes (initialization bias).
    2. Mode Separation: Weaker conditional modes are indirectly suppressed.
    3. Concentration: Late-stage over-contraction reduces fine-grained diversity.
  • Based on this framework, time-varying guidance schedules (e.g., symmetric triangular profiles) are optimal, ramping up guidance in the middle while reducing it at the ends to balance semantic fidelity and diversity. Empirical evidence substantiates substantial gains over static high-guidance schedules.
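
A symmetric triangular schedule of the kind mentioned above can be written in a few lines. The end value `w_min`, the peak value `w_max`, and the number of steps are illustrative assumptions, not values taken from the cited work.

```python
import numpy as np

def triangular_guidance(t, num_steps, w_min=1.0, w_max=5.0):
    """Symmetric triangular schedule: w_min at both ends, w_max at the midpoint.

    `t` is the step index in [0, num_steps - 1]; all values are illustrative.
    """
    u = t / (num_steps - 1)              # normalized time in [0, 1]
    return w_min + (w_max - w_min) * (1.0 - abs(2.0 * u - 1.0))

schedule = np.array([triangular_guidance(t, 51) for t in range(51)])
```

The resulting array can be indexed by the sampler's step counter in place of a constant guidance weight.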

7. Practical Recommendations and Implementation

Key practices established across studies include:

  • Always train classifiers or property networks on the same noise schedule as seen during diffusion sampling for stable and meaningful gradients (Vaeth et al., 2024).
  • When guidance must be applied externally or in a sparse/compressed fashion, prefer early application and schedule placement according to gradient magnitude and sample roughness (Dinh et al., 2024, Castillo et al., 2023).
  • Dynamically tune the guidance scale $w$ to trade off FID (fidelity) and conditional accuracy, with the sweet spot for CFG variants determined empirically.
  • Avoid excessive stacking of stabilization or rectification strategies, especially with robust base guidance; overregularization can suppress class- or property-specific variation (Vaeth et al., 2024, Gao et al., 31 Jan 2025).
  • For hard constraints or rare-event simulation, estimate the drift correction via martingale/statistical learning using samples from the pretrained model, yielding rigorous guarantees (Guo et al., 5 Feb 2026).
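
The drift correction used for hard constraints can be written out for a generic SDE. The notation below ($b$, $\sigma$, the constraint set $A$, and time running in the generative direction up to a terminal time $T$) is assumed for illustration and may differ from the cited paper; the form of the correction is the standard Doob $h$-transform.

```latex
\begin{aligned}
\mathrm{d}X_t &= b(X_t, t)\,\mathrm{d}t + \sigma(t)\,\mathrm{d}W_t
  && \text{(pretrained generative SDE)} \\
h(x, t) &= \mathbb{P}\bigl(X_T \in A \,\big|\, X_t = x\bigr)
  && \text{(probability of meeting the constraint)} \\
\mathrm{d}X_t^{h} &= \bigl[\, b(X_t^{h}, t)
  + \sigma(t)\sigma(t)^{\top} \nabla_x \log h(X_t^{h}, t) \,\bigr]\mathrm{d}t
  + \sigma(t)\,\mathrm{d}W_t
  && \text{(conditioned SDE)}
\end{aligned}
```

Since $h(x, t) = \mathbb{E}[\mathbf{1}_A(X_T) \mid X_t = x]$, the process $h(X_t, t)$ is a martingale under the unconditioned dynamics, which is why $h$ can in principle be fit by regression on trajectories sampled from the pretrained model.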

8. Summary Table: Major Guidance Variants

| Method | Core Approach | Notable Features / Limitations |
|---|---|---|
| Classifier Guidance | External classifier gradient | Requires robust classifier; nonlinear gradients at high noise |
| Classifier-Free Guidance | Linear blend of unconditional and conditional scores with weights $(1-w)$ and $w$ | Bias-variance trade-off tunable by $w$; expectation shift |
| Rectified Guidance | Chain-rule or per-step coefficient correction | Corrects expectation bias; lookup table or Jacobian computation |
| Feedback Guidance | State-adaptive guidance scale | Closed-loop; empirically optimal for prompt-specific complexity |
| Sparse/Compress Guidance | Temporal subsampling or reuse of gradients | Reduces computation, mitigates overfitting; needs schedule tuning |
| Self-Guidance | Score difference across noise levels | Training-free, architecture-agnostic; increased per-step cost |
| Manifold-Optimal | Riemannian metric-based extrapolation | Eliminates off-manifold drift; adaptive scaling possible |
| Graph/Oracle Guidance | Zero-order, control-based, or oracle-driven | Suited for non-differentiable rewards, discrete structures |
| Hard Constraint (Doob) | Drift from conditional probability function $h$ | Satisfies hard constraints with martingale/statistical estimation |

Conditional diffusion guidance now encompasses a spectrum of mathematically grounded, empirically validated, and extensible techniques. Together, these enable domain-general, highly controllable, and efficient conditional generation under both soft and hard constraints, covering modern applications from large-scale image synthesis to property-driven graph and molecule generation.
