
Novel Sampling for Diffusion Models

Updated 29 October 2025
  • The paper introduces Zigzag Diffusion Sampling which alternates strong and weak guidance to exploit the guidance gap, incrementally enhancing semantic alignment in generated samples.
  • The dilation path method computes closed-form scores in annealed Langevin dynamics, preserving mode coverage and reducing computational bottlenecks in high-dimensional settings.
  • The work also explores parallel and accelerated sampling via fixed-point iterations and Anderson acceleration, significantly speeding up the iterative process without sacrificing sample quality.

A novel sampling method for diffusion models is any algorithmic innovation that modifies the trajectory, update rule, iteration schedule, guidance mechanism, or architectural interface of the stochastic (SDE) or deterministic (ODE) sampling process to improve control, efficiency, quality, coverage, or robustness of samples. Recent research has produced multiple new sampling schemes exploiting forward/reverse process symmetries, geometric trajectories, explicit score computation, parallelization, adaptive region updates, and optimization-theoretic perspectives. These methods expand the practical applicability, controllability, and reliability of diffusion-based generative models.

1. Self-Reflection and Guidance Gap: Zigzag Diffusion Sampling

Diffusion models typically use high guidance scales to align outputs with conditional prompts, which can degrade fidelity or diversity. Zigzag Diffusion Sampling (Z-Sampling) (Bai et al., 14 Dec 2024) introduces a self-reflection mechanism that alternates strong-guidance denoising steps with weakly guided inversion steps. This leverages the guidance gap—the difference between the conditional guidance scales of forward (generation) and reverse (inversion) steps—as a semantic injection channel.

Formally, for denoising operation $\Phi^t$ with guidance scale $\gamma_1$ and inversion operation $\Psi^t$ with guidance scale $\gamma_2$ at diffusion step $t$:

$$x_{t-1} = \Phi^t\big( \Psi^t(x_{t-1} \mid c, \gamma_2) \mid c, \gamma_1 \big)$$

The key point is that semantic information accumulated via the guidance gap is not canceled out but builds up incrementally as the iterated denoising and inversion steps "zigzag" the latent through the diffusion trajectory. The theoretical analysis identifies that the guidance gap $\delta_\gamma = \gamma_1 - \gamma_2$ between denoising and inversion linearly controls the semantic gain, and the cumulative effect over $T$ steps is captured as

$$\delta_{\text{Z-Sampling}} = \sum_{t=1}^T \alpha_t h_t^2 \big[ \delta_\gamma \big( u_\theta(x_t, c, t) - u_\theta(x_t, \varnothing, t) \big) \big]^2$$

Empirical results across models (Stable Diffusion, DiT, DreamShaper, AnimateDiff) and datasets show that Z-Sampling outperforms both standard sampling and resampling, with HPS v2 winning rates of 88–94% on challenging prompt-image alignment tasks. Z-Sampling is training-free, requires a deterministic scheduler (such as DDIM), and can be combined with orthogonal guidance methods.
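A minimal sketch of one zigzag step under classifier-free guidance is given below. The epsilon-prediction interface `model(x, t, cond)`, the helper names, and the default guidance scales are illustrative assumptions, not the paper's API:

```python
import torch

def cfg_eps(model, x, t, cond, uncond, guidance):
    # Classifier-free-guided noise prediction.
    eps_c, eps_u = model(x, t, cond), model(x, t, uncond)
    return eps_u + guidance * (eps_c - eps_u)

def ddim_denoise(x_t, eps, a_t, a_prev):
    # Deterministic DDIM step t -> t-1 (eta = 0).
    x0 = (x_t - (1 - a_t) ** 0.5 * eps) / a_t ** 0.5
    return a_prev ** 0.5 * x0 + (1 - a_prev) ** 0.5 * eps

def ddim_invert(x_prev, eps, a_t, a_prev):
    # Deterministic DDIM inversion t-1 -> t (mirror of the step above).
    x0 = (x_prev - (1 - a_prev) ** 0.5 * eps) / a_prev ** 0.5
    return a_t ** 0.5 * x0 + (1 - a_t) ** 0.5 * eps

def zigzag_step(model, x_t: torch.Tensor, t, cond, uncond,
                a_t, a_prev, gamma1=7.5, gamma2=1.0):
    # Denoise x_t -> x_{t-1} with strong guidance gamma1.
    x_prev = ddim_denoise(x_t, cfg_eps(model, x_t, t, cond, uncond, gamma1),
                          a_t, a_prev)
    # Invert x_{t-1} -> x_t with weak guidance gamma2; because gamma1 != gamma2
    # the round trip does not cancel, injecting semantics via the gap.
    x_t = ddim_invert(x_prev, cfg_eps(model, x_prev, t, cond, uncond, gamma2),
                      a_t, a_prev)
    # Denoise again with strong guidance and continue to the next step.
    return ddim_denoise(x_t, cfg_eps(model, x_t, t, cond, uncond, gamma1),
                        a_t, a_prev)
```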

2. Dilation Path: Closed-Form Score Computation in Annealed Langevin Dynamics

Standard score-based diffusion sampling interpolates between a tractable initial (e.g., Gaussian) and the target distribution via a convolutional path, but requires intractable or MC-estimated score gradients for intermediates. The dilation path (Chehab et al., 20 Jun 2024) proposes a tractable alternative: interpolate by rescaling the target distribution itself,

$$\mu_t(x) = \frac{1}{\sqrt{\lambda_t}} \, \pi\left( \frac{x}{\sqrt{\lambda_t}} \right)$$

with explicit closed-form score

$$\nabla_x \log \mu_t(x) = \frac{1}{\sqrt{\lambda_t}} \, \nabla \log \pi\left( \frac{x}{\sqrt{\lambda_t}} \right)$$

This structure ensures that means and covariances are interpolated via scaling, without changing mixture weights—thus, mode coverage is preserved. This allows the use of annealed Langevin dynamics,

$$x_{k+1} = x_k + h_k \nabla \log \mu_k(x_k) + \sqrt{2 h_k} \, \varepsilon_k$$

with step-size adaptation $h_k(x_k) \propto 1/\|\nabla \log \mu_k(x_k)\|$ to ensure stability at small $\lambda_t$. Empirical evaluation demonstrates superior mode coverage and convergence on multimodal and high-dimensional distributions versus classical Langevin or geometric paths, and the method's simplicity removes a key computational bottleneck in classical annealed sampling.
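The sketch below runs annealed Langevin dynamics along the dilation path for a 1D Gaussian-mixture target; the dilation schedule, step-size constant, and example target are illustrative assumptions:

```python
import numpy as np

def dilation_langevin(grad_log_pi, x0, lambdas, base_step=1e-2, rng=None):
    """Annealed Langevin dynamics along the dilation path.

    grad_log_pi: target score, nabla log pi(x).
    lambdas: decreasing dilation schedule lambda_t -> 1 (illustrative).
    """
    rng = rng or np.random.default_rng(0)
    x = x0.copy()
    for lam in lambdas:
        s = np.sqrt(lam)
        score = grad_log_pi(x / s) / s                   # closed-form dilated score
        h = base_step / (np.linalg.norm(score) + 1e-8)   # adaptive step size
        x = x + h * score + np.sqrt(2 * h) * rng.standard_normal(x.shape)
    return x

# Example target: an even two-mode Gaussian mixture 0.5*N(-2,1) + 0.5*N(2,1).
def grad_log_pi(x):
    w = 1.0 / (1.0 + np.exp(-4.0 * x))   # responsibility of the +2 mode
    return -(x + 2.0) * (1.0 - w) - (x - 2.0) * w

sample = dilation_langevin(grad_log_pi, np.zeros(1), np.geomspace(100.0, 1.0, 500))
```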

3. Parallel and Accelerated Sampling: Fixed-Point and Iterative Refinement

The autoregressive nature of standard diffusion sampling creates an $O(N)$ sequential bottleneck. Several orthogonal approaches eliminate this:

  • Picard Iteration and ParaDiGMS (Shih et al., 2023): Each ODE/SDE time step is treated as a coupled update in a global fixed-point iteration, with all steps updated in parallel per iteration:

$$x^{k+1}_t = x^k_0 + \frac{1}{T} \sum_{i=0}^{t-1} s(x^k_i, i/T)$$

This parallelizes all $T$ steps, yielding a 2–4x wall-clock speedup without sacrificing sample quality or requiring retraining; a minimal sketch of the fixed-point update appears after this list.

  • Triangular Anderson Acceleration (ParaTAA) (Tang et al., 15 Feb 2024): The entire diffusion sampling computation is reframed as solving a system of triangular nonlinear equations via Anderson acceleration, using block-triangular update rules that preserve the dependency graph. ParaTAA reduces 100-step DDIM/Stable Diffusion sampling to as few as 7–10 parallel iterations, with FID and image metrics matching sequential output.
  • Region-Adaptive Sampling (RAS) (Liu et al., 14 Feb 2025): For transformer-based diffusion models, RAS only updates regions ("tokens") of an image to which the DiT model currently attends, caching the noise prediction for other regions. The region update is guided by token-level predicted noise statistics:

$$R_t = \text{mean}_{\text{patch}}\big(\text{std}(\hat{N}_t)\big) \cdot \exp(k \cdot D_{\text{patch}})$$

RAS achieves up to 2.5x throughput increases in SD3 and Lumina-Next-T2I, with negligible degradation in FID and user-perceived sample quality.
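As referenced above, here is a minimal sketch of the parallel Picard update from ParaDiGMS; the `score(x, t)` drift interface, the iteration cap, and the convergence tolerance are assumptions for illustration:

```python
import numpy as np

def picard_parallel_sample(score, x0, T, n_iters=20, tol=1e-4):
    """Parallel Picard iteration over a whole sampling trajectory.

    score: drift s(x, t) of the probability-flow ODE (assumed interface).
    All T states are refined jointly, so the wall-clock depth is the number
    of Picard iterations rather than T.
    """
    xs = np.tile(x0, (T + 1, 1))   # initial guess: a constant trajectory
    for _ in range(n_iters):
        # One batched model call in practice; a vectorized loop here.
        drifts = np.stack([score(xs[i], i / T) for i in range(T)])
        new_xs = xs.copy()
        # Picard update: x_t = x_0 + (1/T) * sum_{i<t} s(x_i, i/T).
        new_xs[1:] = x0 + np.cumsum(drifts, axis=0) / T
        converged = np.max(np.linalg.norm(new_xs - xs, axis=1)) < tol
        xs = new_xs
        if converged:
            break
    return xs[-1]
```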

4. Geometric and Optimization-Based Acceleration

A geometric framework considers diffusion sampling as a trajectory in high-dimensional space:

  • Geometric Perspective and ODE-Jump (Chen et al., 2023): The sampling trajectory for VE-SDEs is quasi-linear, with the denoising trajectory converging more rapidly to the data distribution. The mean-shift connection establishes the denoiser as a kernel-weighted data mean, and finite differences along the denoising trajectory underpin accelerated samplers such as DPM-Solver, DEIS, or the ODE-Jump method, which allows early stopping with reduced FID.
  • Momentum-Augmented Solvers (Wizadwongsa et al., 2023): Integrating Polyak's Heavy Ball (HB) or Generalized Heavy Ball (GHVB) momentum into Euler or Adams-Bashforth solvers dynamically expands the stability region, markedly reducing divergence artifacts in aggressive low-step regimes, while GHVB smoothly trades off stability against accuracy; a heavy-ball sketch follows this list.
  • Exponential Integrator Samplers (DEIS) (Zhang et al., 2022): The probability-flow ODE of diffusion models has semilinear structure, enabling exponential integrators to integrate the stiff linear term exactly while approximating the nonlinear neural-network term via Adams-Bashforth polynomial extrapolation over prior steps. DEIS achieves state-of-the-art sample quality in $\leq 10$ steps, outperforming other solvers at very low function-evaluation counts.
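As a concrete illustration of momentum augmentation, here is a generic heavy-ball Euler step for a probability-flow ODE; the drift interface and momentum coefficient are assumptions, and the paper's HB/GHVB coefficient schemes differ in detail:

```python
import numpy as np

def heavy_ball_euler(drift, x0, t_grid, beta=0.9):
    """Euler ODE solver with heavy-ball momentum on the drift.

    drift: f(x, t) giving dx/dt (assumed interface). beta = 0 recovers plain
    Euler; the momentum-averaged drift enlarges the stability region at
    aggressive step sizes.
    """
    x, v = x0.copy(), np.zeros_like(x0)
    for t, t_next in zip(t_grid[:-1], t_grid[1:]):
        v = beta * v + (1.0 - beta) * drift(x, t)   # damped drift average
        x = x + (t_next - t) * v                    # step along the average
    return x
```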

5. Controllability and Constraint: Linear Input–Output Perturbation, Fairness, and Diversity

  • CCS (Controllable and Constrained Sampling) (Song et al., 7 Feb 2025) establishes and exploits a highly linear dependence of the final sample $x_0$ on the scale of perturbations to the initial noise $x_T$. This enables precise control of sample statistics (e.g., the mean) across batches, with diversity set by spherical interpolation of the initial latent. Empirically, $R^2 > 0.98$ confirms the linearity, and CCS outperforms prior approaches on controllability, PSNR, and CLIP-IQA.
  • Attribute Switching for Fair Sampling (Choi et al., 6 Jan 2024): By switching the conditioning on a sensitive attribute (e.g., gender) at an optimal time $\tau$ during the reverse process (sketched in code after this list), one can destroy undesirable attribute correlations and achieve distributional fairness without retraining or classifiers. The fairness-optimal $\tau$ balances score differences via:

$$\int_0^\tau D(t)\,dt = \int_\tau^T D(t)\,dt, \quad \text{where } D(t) = g^2(t)\big\{\nabla_x \log p_t(\bar{X}_t \mid s_0) - \nabla_x \log p_t(\bar{X}_t \mid s_1)\big\}$$

PCA and balanced error rate analyses show effective obfuscation of sensitive information, with utility preserved.

  • Guided Sampling for Minority/Low-density Regions (Um et al., 2023, Sehwag et al., 2022): Custom metrics (Tweedie residuals, "minority score", "hardness score") enable classifier-guided diffusion toward low-likelihood, rare, or unique samples, substantially expanding coverage of underrepresented regions. Dual guidance terms ensure sample fidelity even when driving off the model’s dominant manifold.
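A minimal sketch of attribute switching, as referenced above; the `model(x, t, attr)` and `scheduler_step(x, eps, t)` interfaces are assumed for illustration:

```python
def fair_sample(model, scheduler_step, x_T, timesteps, s0, s1, tau):
    """Reverse diffusion with attribute switching at time tau.

    Conditions on sensitive attribute s0 while t > tau, then switches to s1,
    breaking attribute correlations without retraining. `model(x, t, attr)`
    and `scheduler_step(x, eps, t)` are assumed interfaces.
    """
    x = x_T
    for t in timesteps:                  # timesteps run from T down to 1
        attr = s0 if t > tau else s1     # switch conditioning at tau
        eps = model(x, t, attr)          # attribute-conditional noise prediction
        x = scheduler_step(x, eps, t)    # one reverse-diffusion update
    return x
```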

6. Robustness and Hallucination Reduction

  • RODS (Robust Optimization-inspired Diffusion Sampler) (Tian et al., 16 Jul 2025): RODS recasts each sampling step as a local robust optimization problem, computing geometric cues (curvature indices) at each step to detect the risk of off-manifold updates. If the local score curvature increases abruptly, RODS adaptively perturbs the step direction via a min-max search (sharpness- or curvature-aware). Benchmarks on AFHQv2, FFHQ, and 11k-hands show detection of over 70% of hallucinated outputs and correction of more than 25%, without introducing new artifacts and with minimal added computation (a crude curvature-guard sketch follows).
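A crude curvature guard loosely inspired by this idea (not the paper's min-max procedure) might look as follows; the curvature index, threshold `kappa`, and damping rule are assumptions:

```python
import numpy as np

def curvature_guarded_step(score, x, t, t_next, prev_score=None, kappa=5.0):
    """One Euler step with a crude curvature guard.

    If the score direction changes abruptly between steps (a cheap curvature
    index), the step is damped to reduce off-manifold updates. kappa and the
    damping factor are illustrative choices.
    """
    h = t - t_next
    s = score(x, t)
    if prev_score is not None:
        # Relative change of the score as a finite-difference curvature cue.
        curv = np.linalg.norm(s - prev_score) / (np.linalg.norm(prev_score) + 1e-8)
        if curv > kappa * h:
            h *= 0.5                     # damp the update on abrupt curvature
    return x + h * s, s                  # return new state and score for reuse
```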

7. Posterior and Inverse Problem Sampling

  • Generalized Posterior Sampling (DPS) (Chung et al., 2022, Stevens et al., 9 Sep 2024): For efficient Bayesian posterior sampling, especially in nonlinear or noisy inverse problems, sampling steps use Tweedie-style posterior-mean approximations rather than hard projections (see the sketch below). For sequential inverse problems such as real-time ultrasound (Stevens et al., 9 Sep 2024), reusing the previous frame's posterior diffusion estimate, or initializing at a lower noise scale via a ViViT transition model, enables a 25-fold reduction in sampling steps without sacrificing PSNR or quality.
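A minimal DPS-style posterior step under these approximations might look as follows; `scheduler_step`, `forward_op` (the measurement operator), and the guidance weight `zeta` are assumed interfaces rather than the papers' exact code:

```python
import torch

def dps_step(model, scheduler_step, x_t, t, y, forward_op, alpha_bar_t, zeta=1.0):
    """One DPS-style posterior sampling update for y ~ A(x0) + noise.

    model(x, t) predicts noise; alpha_bar_t is the scalar noise-schedule
    tensor at step t; zeta is an illustrative guidance weight.
    """
    x_t = x_t.detach().requires_grad_(True)
    eps = model(x_t, t)
    # Tweedie-style posterior-mean estimate of the clean sample.
    x0_hat = (x_t - torch.sqrt(1 - alpha_bar_t) * eps) / torch.sqrt(alpha_bar_t)
    # Data-fidelity gradient through the denoiser (no hard projection).
    residual = torch.linalg.vector_norm(y - forward_op(x0_hat))
    grad = torch.autograd.grad(residual, x_t)[0]
    # Unconditional reverse step, then a measurement-consistency correction.
    x_prev = scheduler_step(x_t.detach(), eps.detach(), t)
    return x_prev - zeta * grad
```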

In summary, novel sampling methods for diffusion models target key barriers in conditional alignment, speed, diversity, controllability, fairness, and robustness. These innovations exploit analytic properties (e.g., closed-form scores on dilation paths), geometric insight (quasi-linear and denoising trajectories), input–output linearity, optimization analogies (robust/proximal updates), parallelizable formulations (fixed-point, Anderson acceleration), and architectural advances (region-adaptive sampling via transformers). By integrating these advances, diffusion model samplers now provide improved prompt fidelity, reduced sampling latency, controlled diversity, fairer outputs, enhanced low-density coverage, and artifact-minimized trajectories—broadening practical deployment across domains from image synthesis to real-time medical imaging.
