
Consistent Annealed Sampling (CAS)

Updated 16 May 2026
  • Consistent Annealed Sampling is a method for iterative sampling that rigorously enforces an annealing schedule to mitigate noise drift and mode collapse in high-dimensional distributions.
  • It systematically combines score-driven updates with scheduled noise injection, blending unconditional and conditional generative approaches to preserve variance and improve convergence.
  • CAS provides provable guarantees and enhanced empirical performance in diffusion models, posterior sampling, and particle methods, leading to superior mode coverage and sample fidelity.

Consistent Annealed Sampling (CAS) is a rigorous framework for iterative sampling from high-dimensional distributions, particularly in the context of score-based generative models, posterior inference with diffusion models, and kernel particle methods. The central objective of CAS is to enforce consistency with a prescribed “annealing” (noise or temperature) schedule during discretized sampling, thereby addressing issues that arise from drift in marginal noise (in Langevin/diffusion methods) or poor mode coverage (in particle optimization methods) when using imperfect or limited samplers. CAS achieves provable guarantees and improved empirical performance by systematically combining score-driven updates with carefully tuned injection of noise, blending unconditional and conditional generative frameworks, and, in some cases, kernelized repulsion.

1. Formal Definition and Mathematical Foundations

CAS aims to draw samples $x$ from either a target distribution $p(x)$ or a conditional posterior $p(x\mid y)$, typically when only a learned approximation of the score $\nabla_x \log p(x)$ is available and computational constraints limit the number of iterative steps or the accuracy of the approximation.

In score-based generative models, CAS is used to sample from smoothed versions $p_\sigma(x) = (p*\mathcal N(0,\sigma^2 I))(x)$, descending a schedule of noise levels $\{\sigma_i\}$ geometrically from a large $\sigma_1$ (covering the data support) to a small $\sigma_N$ (close to the data manifold). The CAS update at each step is:

$$x_i = x_{i-1} + \eta \, \sigma_i^2 s_\theta(x_{i-1}, \sigma_i) + \beta\, \sigma_{i+1} z_i$$

where $z_i\sim\mathcal N(0,I)$, $s_\theta$ approximates the score, and the weights $\eta$ and $\beta$ are set so that the noise level prescribed by the schedule is preserved at every step. This ensures that, with a finite number of steps and a possibly imperfect $s_\theta$, the sample sequence adheres exactly to the intended marginal schedule, a property not maintained by standard Annealed Langevin Sampling (ALS) when the number of steps is moderate or the score estimate is inexact (Serrà et al., 2021).
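The update above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the reference implementation: the function name `cas_sample`, the point-mass toy target, and the specific choice $\beta = \sqrt{1 - (\sigma_i/\sigma_{i+1})^2 (1-\eta)^2}$ are assumptions made here so that the per-step noise level can be checked numerically.

```python
import numpy as np

def cas_sample(score_fn, x0, sigmas, eta, rng):
    """Apply the CAS update along a descending noise schedule.

    sigmas has length N + 1: step i uses the score at sigmas[i] and
    re-injects noise at the next level sigmas[i + 1].
    """
    x = x0
    for sigma_cur, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        ratio = sigma_cur / sigma_next
        # Assumed variance-preserving weight:
        # (1 - eta)^2 * sigma_cur^2 + beta^2 * sigma_next^2 = sigma_next^2
        beta_sq = 1.0 - (ratio * (1.0 - eta)) ** 2
        if beta_sq < 0.0:
            raise ValueError("eta is too small for this schedule ratio")
        z = rng.standard_normal(x.shape)
        x = x + eta * sigma_cur**2 * score_fn(x, sigma_cur) + np.sqrt(beta_sq) * sigma_next * z
    return x

# Toy check: the data distribution is a point mass at mu, so the smoothed
# density is N(mu, sigma^2 I) and its score is known in closed form.
rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])
sigmas = np.geomspace(10.0, 0.1, 21)  # geometric schedule, illustrative endpoints
point_mass_score = lambda x, sigma: (mu - x) / sigma**2
x0 = mu + sigmas[0] * rng.standard_normal((20000, 2))
samples = cas_sample(point_mass_score, x0, sigmas, eta=0.5, rng=rng)
residual_std = (samples - mu).std()  # should sit close to sigmas[-1]
```

In this toy setting the residual spread of the samples tracks the final noise level, which is the consistency property the method is named for. With a learned score the same loop applies unchanged, but the match is only as good as the score estimate.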

In posterior sampling, CAS combines an unconditional diffusion sampler, which initializes the chain, with an annealed chain of Langevin refinements that target a sequence of posteriors along a descending noise schedule. The algorithm avoids error amplification by ensuring the chain never deviates far from regions of reliable score estimation, and carefully controls the accumulation of estimator and discretization errors (Xun et al., 30 Oct 2025).

For particle approximations, CAS is realized as annealed Stein Variational Gradient Descent (annealed SVGD), which follows a temperature schedule and uses temperature-weighted gradients to move incrementally from exploration to exploitation (d'Angelo et al., 2021).

2. Noise and Temperature Scheduling

CAS strictly enforces a geometric noise/temperature schedule, in which consecutive levels differ by a constant ratio, and parameterizes the update weights $\eta$ and $\beta$ accordingly. This reparameterization guarantees that, irrespective of the finite step budget, the updates remain within stability/variance-preservation bounds (Serrà et al., 2021). For temperature-based schemes, inverse-temperature schedules (linear, tanh, or cyclical) are used to generate tempered targets, allowing annealed SVGD to interpolate between exploratory (low inverse temperature) and exploitative (high inverse temperature) regimes (d'Angelo et al., 2021).
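The variance-preservation condition behind these bounds can be worked out explicitly under two idealizing assumptions that are introduced here and are not taken from the source: the score is exact for a single data point $x^\ast$, and the current iterate is $x_{i-1} = x^\ast + \sigma_i \varepsilon$ with $\varepsilon \sim \mathcal N(0, I)$. The symbol $r_i = \sigma_i/\sigma_{i+1}$ is likewise notation chosen for this sketch.

```latex
\begin{aligned}
s(x_{i-1},\sigma_i) = -\frac{x_{i-1}-x^\ast}{\sigma_i^2}
&\;\Longrightarrow\;
x_i = x^\ast + (1-\eta)\,\sigma_i\,\varepsilon + \beta\,\sigma_{i+1}\, z_i, \\
\operatorname{Var}(x_i - x^\ast)
= \big[(1-\eta)^2\sigma_i^2 + \beta^2\sigma_{i+1}^2\big]\, I = \sigma_{i+1}^2\, I
&\;\Longleftrightarrow\;
\beta = \sqrt{1 - r_i^2\,(1-\eta)^2}, \\
\beta \in \mathbb{R}
&\;\Longleftrightarrow\;
1 - \frac{1}{r_i} \;\le\; \eta \;\le\; 1 + \frac{1}{r_i}.
\end{aligned}
```

Under these assumptions, a geometric schedule makes $r_i$ constant, so a single pair $(\eta, \beta)$ satisfies the condition at every step.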

3. Algorithmic Structure and Theoretical Guarantees

CAS algorithms are characterized by alternating steps of drift (using the score approximation) and controlled stochasticity (noise injection), with explicit correction for the discretization-induced deviation from the intended schedule. In diffusion/posterior applications (Xun et al., 30 Oct 2025), the procedure is:

  • Draw an initial sample using unconditional diffusion.
  • For each noise level in the descending schedule:
    • Define the posterior target at that noise level.
    • Use the correspondingly updated score.
    • Apply a short burst of Langevin steps with a fixed duration and step size.
  • Return the final iterate as an approximate sample from the posterior $p(x\mid y)$.
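The steps above can be sketched on a toy problem where the posterior is known in closed form. Everything specific here is an assumption made for illustration: the prior is $\mathcal N(0, I)$, the observation is $y = x + \sigma_y\,\varepsilon$, exact prior draws stand in for the unconditional diffusion warm start, and the annealed targets are obtained by inflating the observation variance by an extra level $t^2$ that decreases to zero. The function name `annealed_langevin_posterior` is hypothetical.

```python
import numpy as np

def annealed_langevin_posterior(y, sigma_y, levels, n_chains, steps_per_level, step_size, rng):
    """Warm start from the prior, then run short Langevin bursts on annealed posteriors."""
    d = y.shape[0]
    # Warm start: exact N(0, I) draws stand in for an unconditional diffusion sampler.
    x = rng.standard_normal((n_chains, d))
    for t in levels:  # descending extra noise; the last level should be 0
        obs_var = sigma_y**2 + t**2  # annealed target: likelihood with inflated variance
        for _ in range(steps_per_level):
            score = -x + (y - x) / obs_var  # prior score plus likelihood score
            noise = rng.standard_normal(x.shape)
            x = x + step_size * score + np.sqrt(2.0 * step_size) * noise
    return x

rng = np.random.default_rng(0)
y = np.array([1.0, -2.0])
sigma_y = 0.5
levels = np.append(np.geomspace(2.0, 0.05, 8), 0.0)
samples = annealed_langevin_posterior(y, sigma_y, levels, 5000, 100, 0.01, rng)

# Closed-form Gaussian posterior for comparison.
post_mean = y / (1.0 + sigma_y**2)
post_var = sigma_y**2 / (1.0 + sigma_y**2)
```

Comparing `samples.mean(axis=0)` and `samples.var(axis=0)` against `post_mean` and `post_var` gives a quick sanity check; the unadjusted Langevin step introduces a small variance bias that shrinks with the step size.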

Under assumptions of strong log-concavity, Lipschitzness, and a bounded error on the score approximation, CAS guarantees polynomial mixing time and samples with bounded total variation distance to the true posterior. The accuracy required of the score estimator is weaker than the sub-exponential error required for vanilla Langevin, and the total oracle complexity is polynomial in the problem parameters (Xun et al., 30 Oct 2025).

For denoising score matching (Serrà et al., 2021), a final-step "Expected Denoised Sample" (EDS) correction is applied to the last iterate $x$ at the final noise level $\sigma$:

$$\hat x = x + \sigma^2\, s_\theta(x, \sigma)$$
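As a small illustration of this correction, the sketch below assumes it takes the usual denoising form $x + \sigma^2 s_\theta(x, \sigma)$, which is the CAS update with $\eta = 1$ and $\beta = 0$. The Gaussian data model and the closed-form score are assumptions chosen so the effect can be measured; `expected_denoised_sample` is a hypothetical helper name.

```python
import numpy as np

def expected_denoised_sample(x, sigma, score_fn):
    """One deterministic denoising step: x + sigma^2 * score(x, sigma)."""
    return x + sigma**2 * score_fn(x, sigma)

rng = np.random.default_rng(0)
data_std, sigma = 1.0, 1.0
clean = data_std * rng.standard_normal(10000)
noisy = clean + sigma * rng.standard_normal(10000)

# Exact score of the smoothed density N(0, data_std^2 + sigma^2);
# it is available in closed form only because the toy data are Gaussian.
gaussian_score = lambda x, s: -x / (data_std**2 + s**2)

denoised = expected_denoised_sample(noisy, sigma, gaussian_score)
mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
```

In this setting the corrected sample is the posterior mean of the clean point given the noisy one, so `mse_denoised` comes out below `mse_noisy`.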

For particle samplers, the annealed SVGD update at each step combines the temperature-weighted gradient with kernel repulsion, guaranteeing weak convergence to the target density in the limit and under appropriate step-size decay (d'Angelo et al., 2021).
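A minimal sketch of such an update follows, assuming the annealing enters as a scalar inverse temperature `gamma` that multiplies only the score (driving) term while the kernel repulsion term is left unweighted. The RBF kernel, the fixed bandwidth, the delayed linear ramp, and the one-dimensional two-mode target are illustrative choices made here, not the settings of d'Angelo et al.

```python
import numpy as np

def rbf_kernel(x, bandwidth):
    diff = x[:, None, :] - x[None, :, :]            # diff[j, i] = x_j - x_i
    k = np.exp(-np.sum(diff**2, axis=-1) / (2.0 * bandwidth**2))
    grad_k = -diff * k[:, :, None] / bandwidth**2   # gradient of k(x_j, x_i) w.r.t. x_j
    return k, grad_k

def delayed_linear_schedule(t, n_iters):
    """Hypothetical schedule: gamma = 0 (pure exploration), then a linear ramp to 1."""
    return float(np.clip((t / n_iters - 0.4) / 0.3, 0.0, 1.0))

def annealed_svgd(x, grad_log_p, n_iters, step_size, bandwidth, schedule):
    n = x.shape[0]
    for t in range(n_iters):
        gamma = schedule(t, n_iters)
        k, grad_k = rbf_kernel(x, bandwidth)
        drive = k.T @ grad_log_p(x)      # sum_j k(x_j, x_i) * grad log p(x_j)
        repulse = grad_k.sum(axis=0)     # sum_j grad_{x_j} k(x_j, x_i)
        x = x + step_size * (gamma * drive + repulse) / n
    return x

# Equal-weight mixture of N(-3, 1) and N(3, 1); its score has a closed form.
grad_log_p = lambda x: -x + 3.0 * np.tanh(3.0 * x)

# All particles start near the right-hand mode.
init = np.linspace(2.0, 4.0, 50)[:, None]
final = annealed_svgd(init, grad_log_p, n_iters=1000, step_size=0.2,
                      bandwidth=1.0, schedule=delayed_linear_schedule)
n_left = int((final < 0).sum())
n_right = int((final > 0).sum())
```

While `gamma` is zero the particles only repel one another and spread out, so some of them reach the basin of the left-hand mode before the score term is switched on.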

CAS generalizes and interpolates between several existing samplers, each of which corresponds to a particular choice of the CAS parameters:

Scheme | Key Difference
Annealed Langevin Sampling | Noise amplitude scaled differently
Predictor–Corrector SDE | Matches the PC predictor up to a factor
Deterministic Denoising | Pure denoising, no noise injection
Full Noise Injection | Pure “denoise then re-noise”

In the appropriate limit, CAS converges to the ALS or PC predictor schemes (Serrà et al., 2021).

For annealed SVGD, CAS is compared to:

5. Implementation and Practical Tuning

Practical application of CAS requires setting the endpoints of the noise or temperature schedule, the total step count, and the schedule parameter. Empirically, a single setting of the schedule parameter is effective across a wide range of configurations. Tuning can be performed by sweeping the parameter logarithmically and selecting values via perceptual or likelihood metrics (Serrà et al., 2021).

For posterior sampling in inverse problems (Xun et al., 30 Oct 2025), CAS alternates diffusion-based warm starts with short, annealed conditional Langevin refinements, and step sizes are chosen so that the overall discretization error remains controlled. Warm starts can use any unconditional sampler that provides sufficiently accurate scores (e.g., DDIM).

6. Empirical Results and Applications

In high-dimensional inverse problems such as image inpainting, super-resolution, and deblurring, CAS was empirically validated on the FFHQ-256 dataset. Starting from a diffusion-predicted sample, CAS Langevin refinement further reduced the per-image reconstruction error below that of baseline diffusion-posterior sampling (DPS), while the Fréchet Inception Distance (FID) remained comparable or improved for small step sizes. CAS was observed to better preserve fine detail and structure in inpainting and super-resolution tasks (Xun et al., 30 Oct 2025).

In annealed SVGD, CAS empirically achieves superior mode coverage and weight reconstruction compared to standard SVGD, particularly on highly multimodal distributions and in high-dimensional settings. On various benchmarks, mode coverage improved from 1 to the full number of modes, and maximum mean discrepancy (MMD) was reduced by up to 80% (d'Angelo et al., 2021).
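For readers unfamiliar with the metric, the sketch below shows one common way to estimate squared MMD between two sample sets, here with a Gaussian RBF kernel and the biased (V-statistic) estimator. The kernel, the bandwidth, and the synthetic two-mode data are assumptions for illustration and are not the evaluation protocol of the cited work.

```python
import numpy as np

def mmd_squared(x, y, bandwidth):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian RBF kernel."""
    def gram(a, b):
        sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return np.exp(-sq_dist / (2.0 * bandwidth**2))
    return gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean()

def two_mode_samples(n, rng):
    """Draw n points from an equal-weight mixture of N(-3, 1) and N(3, 1)."""
    signs = rng.choice([-1.0, 1.0], size=(n, 1))
    return 3.0 * signs + rng.standard_normal((n, 1))

rng = np.random.default_rng(0)
reference = two_mode_samples(500, rng)
covers_both = two_mode_samples(500, rng)
covers_one = 3.0 + rng.standard_normal((500, 1))  # collapsed onto a single mode

mmd_both = mmd_squared(covers_both, reference, bandwidth=1.0)
mmd_one = mmd_squared(covers_one, reference, bandwidth=1.0)
```

A sample set that misses a mode yields a visibly larger value than one that covers both, which is why the metric is a natural companion to mode-coverage counts.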

7. Significance and Theoretical Implications

Consistent Annealed Sampling unifies a broad class of iterative sampling schemes by addressing failure modes associated with finite-step schedules and imperfect score approximations. In diffusion/posterior contexts, it is the first method to provide polynomial-time guarantees for posterior sampling under a mild bound on the score error, rather than requiring strong exponential error bounds. In particle-based samplers, it enables deterministic, single-chain exploration of complex multimodal targets while preserving the favorable convergence properties of classical SVGD.

A plausible implication is that many existing denoising-diffusion and score-based sampling schemes can be reformulated as special instances or limits of CAS, which encourages uniform adoption of consistent scheduling as a standard for robust generative modeling and posterior inference.

