
Generative Diffusion Priors

Updated 17 November 2025
  • Generative diffusion priors are probabilistic models that use denoising diffusion processes to capture rich, data-driven prior knowledge.
  • They integrate learned score functions and noise predictors into inverse problems, regularizing tasks like image restoration and domain transfer.
  • These priors enable flexible plug-and-play frameworks, aiding test-time adaptation, semantic decomposition, and multimodal fusion.

Generative diffusion priors are probabilistic models that leverage denoising diffusion processes to encode powerful, high-capacity, data-driven prior knowledge. These priors are instantiated as the implicit distributions, score functions, or noise-predictor networks learned by diffusion models, most commonly denoising diffusion probabilistic models (DDPMs) and score-based generative models (SGMs), and are employed to regularize, guide, or constrain downstream tasks beyond pure generation. The diffusion prior is typically integrated with data- or task-dependent constraints, enabling flexible application to inverse problems, test-time adaptation, domain transfer, structured semantic reasoning, multimodal fusion, and restoration tasks. Research in this area focuses on understanding the structure of diffusion priors, practical algorithmic mechanisms for plugging such priors into new settings, and theoretical analysis of convergence and recovery guarantees.

1. Mathematical Formulation of Generative Diffusion Priors

Consider the canonical formulation of a denoising diffusion probabilistic model, defined over data $\mathbf{x}\in\mathbb{R}^d$. The generative diffusion prior arises from the joint distribution

$$
q(\mathbf{x}_{0:T}) = q(\mathbf{x}_0) \prod_{t=1}^T q(\mathbf{x}_t \mid \mathbf{x}_{t-1}), \qquad q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}\!\left(\sqrt{1-\beta_t}\,\mathbf{x}_{t-1},\; \beta_t I\right)
$$

with $\{\beta_t\}$ a monotonic noise schedule. The corresponding reverse (generative) process models $p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t)$, parameterized as a Gaussian whose mean is often defined in terms of a neural noise estimator $\epsilon_\theta(\mathbf{x}_t, t)$:

$$
\mu_\theta(\mathbf{x}_t, t) = \frac{1}{\sqrt{1-\beta_t}} \left( \mathbf{x}_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(\mathbf{x}_t, t) \right)
$$

where $\bar{\alpha}_t = \prod_{s=1}^t (1-\beta_s)$.

The implicit prior over data $p_\theta(\mathbf{x})$ is the marginal of the learned reverse process, and the associated score function is $\nabla_{\mathbf{x}} \log p_\theta(\mathbf{x})$, implemented numerically by the neural network. This diffusion prior can be operated on directly (via sampling, score evaluation, or Tweedie's formula), or combined with task-dependent terms to yield posterior or constrained-inference objectives (Graikos et al., 2022, Li et al., 10 Nov 2025, Fei et al., 2023).
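
To make these definitions concrete, the following minimal NumPy sketch implements the reverse-step mean $\mu_\theta$ and a Tweedie-style $\mathbf{x}_0$ estimate directly from the formulas above. The linear schedule and the callable `eps_model` (standing in for a pretrained $\epsilon_\theta$) are illustrative assumptions, not a specific published implementation.

```python
import numpy as np

def make_schedule(T=1000, beta_min=1e-4, beta_max=0.02):
    """Linear noise schedule {beta_t} and cumulative products alpha_bar_t = prod_s (1 - beta_s)."""
    betas = np.linspace(beta_min, beta_max, T)
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars

def reverse_mean(x_t, t, eps_model, betas, alpha_bars):
    """Mean mu_theta(x_t, t) of the reverse Gaussian transition, expressed via the noise predictor."""
    beta_t, abar_t = betas[t], alpha_bars[t]
    eps_hat = eps_model(x_t, t)                                   # epsilon_theta(x_t, t)
    return (x_t - beta_t / np.sqrt(1.0 - abar_t) * eps_hat) / np.sqrt(1.0 - beta_t)

def tweedie_x0(x_t, t, eps_model, alpha_bars):
    """Tweedie-style estimate of x_0 from a noisy x_t, using the same noise predictor."""
    abar_t = alpha_bars[t]
    return (x_t - np.sqrt(1.0 - abar_t) * eps_model(x_t, t)) / np.sqrt(abar_t)
```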

2. Integration of Diffusion Priors in Inverse Problems and Plug-and-Play Frameworks

Diffusion priors have been successfully integrated into plug-and-play (PnP) frameworks for a variety of inverse problems. In these methods, the learned diffusion prior acts as a proximal map or denoising step, typically within iterative optimization for image/signal restoration (Li et al., 10 Nov 2025, Graikos et al., 2022, Fei et al., 2023):

MAP Framework:

$$
\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \Big\{ -\log p(\mathbf{y} \mid \mathbf{x}) + \lambda\,\big(-\log p_\theta(\mathbf{x})\big) \Big\}
$$

For non-Gaussian observation noise (e.g., impulse noise), a generalized Gaussian scale mixture leads to a data-fidelity term $\|\mathbf{A}\mathbf{x}-\mathbf{y}\|_q^q$. An IRLS algorithm alternates between a weighted least-squares update and a plug-and-play diffusion prior step, where the denoising is carried out by a pretrained diffusion model acting as a proximal operator:

$$
\mathrm{prox}_{\gamma\mathcal{R}}(\mathbf{u}) \approx \mathbf{D}_t(\mathbf{u}') = \frac{\mathbf{u}' - \sqrt{1-\alpha_t}\,\epsilon_\theta(\mathbf{u}', t)}{\sqrt{\alpha_t}}
$$

with $\mathbf{u}' = \sqrt{\alpha_t}\,\mathbf{u} + \sqrt{1-\alpha_t}\,\mathbf{n}$ (Li et al., 10 Nov 2025).

This formulation enables robust, high-quality restoration for both Gaussian and strongly non-Gaussian noise by exploiting the expressive learned statistics embedded in the diffusion prior.
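
The sketch below illustrates how such an IRLS loop might interleave the weighted least-squares update with the diffusion-prior proximal step $\mathbf{D}_t$. The function names (`pnp_irls_restore`, `eps_model`), the linear annealing of the noise level, and the coupling weight `lam` are assumptions for illustration rather than the exact algorithm of Li et al. (10 Nov 2025).

```python
import numpy as np

def irls_weights(residual, q=1.0, eps=1e-6):
    """IRLS weights that turn the l_q data term ||Ax - y||_q^q into a weighted least-squares problem."""
    return (residual ** 2 + eps) ** (q / 2.0 - 1.0)

def diffusion_prox(u, t, eps_model, alpha_bars, rng):
    """Plug-and-play proximal step: re-noise u to level t, then denoise with the pretrained predictor."""
    abar_t = alpha_bars[t]
    u_noisy = np.sqrt(abar_t) * u + np.sqrt(1.0 - abar_t) * rng.standard_normal(u.shape)
    return (u_noisy - np.sqrt(1.0 - abar_t) * eps_model(u_noisy, t)) / np.sqrt(abar_t)

def pnp_irls_restore(A, y, eps_model, alpha_bars, q=1.0, lam=1.0, n_iter=50, seed=0):
    """Alternate a weighted least-squares data update with the diffusion-prior proximal step."""
    rng = np.random.default_rng(seed)
    x = A.T @ y                                                   # crude initialization
    T = len(alpha_bars)
    for k in range(n_iter):
        t = max(int(T * (1.0 - (k + 1) / n_iter)), 1)             # decreasing noise level
        z = diffusion_prox(x, t, eps_model, alpha_bars, rng)      # prior (prox) step
        w = irls_weights(A @ x - y, q)                            # data-term reweighting
        AtW = A.T * w                                             # A^T W with W = diag(w)
        x = np.linalg.solve(AtW @ A + lam * np.eye(x.size), AtW @ y + lam * z)
    return x
```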

3. Conditional Inference, Constraints, and Optimization over Diffusion Priors

Generative diffusion priors can be flexibly combined with arbitrary differentiable constraints to permit conditional generation and test-time adaptation. The optimization is typically framed in terms of a variational free energy or point-estimate functional:

$$
F(\eta) = \sum_{t\in\mathcal{S}} \mathbb{E}_{\epsilon} \big[\|\epsilon - \epsilon_\theta(x_t, t)\|^2\big] - \log c(\eta, \mathbf{y})
$$

where $c(\mathbf{x}, \mathbf{y})$ encodes the differentiable constraint and $x_t = \sqrt{\bar{\alpha}_t}\, \eta + \sqrt{1-\bar{\alpha}_t}\, \epsilon$ (Graikos et al., 2022).

Inference is conducted via gradient descent on $\eta$, with each step involving backpropagation through the diffusion score network and the constraint. This method is highly generic, allowing plug-and-play integration of pre-trained diffusion models with constraints ranging from measurement fidelity in inverse problems and semantic edit requirements to classifier outputs and structured combinatorial objectives.
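
A minimal PyTorch sketch of this inference loop follows, assuming a pretrained noise predictor `eps_model` and a user-supplied differentiable `log_constraint` (both hypothetical names); the optimizer choice and timestep subset are illustrative, not the exact procedure of Graikos et al. (2022).

```python
import torch

def adapt_eta(eps_model, log_constraint, y, alpha_bars, shape,
              steps=200, lr=1e-2, timesteps=None):
    """Point estimate under a diffusion prior plus a differentiable constraint.
    eps_model      : pretrained noise predictor epsilon_theta(x_t, t)  (stand-in)
    log_constraint : callable returning log c(eta, y)                  (stand-in)
    alpha_bars     : 1-D tensor of cumulative products bar-alpha_t
    """
    eta = torch.zeros(shape, requires_grad=True)
    opt = torch.optim.Adam([eta], lr=lr)
    T = len(alpha_bars)
    if timesteps is None:
        timesteps = list(range(1, T, max(T // 10, 1)))            # a sparse subset S of noise levels
    for _ in range(steps):
        opt.zero_grad()
        prior_loss = 0.0
        for t in timesteps:
            abar = alpha_bars[t]
            eps = torch.randn(shape)
            x_t = abar.sqrt() * eta + (1 - abar).sqrt() * eps     # reparameterized noisy eta
            prior_loss = prior_loss + ((eps - eps_model(x_t, t)) ** 2).mean()
        loss = prior_loss - log_constraint(eta, y)                # F(eta), up to constants
        loss.backward()                                           # backprop through score net and constraint
        opt.step()
    return eta.detach()
```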

4. Structured Priors, Semantic Decomposition, and Discriminative Applications

Recent work reveals that the diffusion score function intrinsically encodes structured semantic priors over labels and other latent variables. The score decomposition (Proposition 1) for a distribution $p(\mathbf{x}) = \sum_y p(y)\, p(\mathbf{x} \mid y)$ yields

$$
\nabla_\mathbf{x}\log p(\mathbf{x}) = \sum_y p(y \mid \mathbf{x})\, \nabla_\mathbf{x}\log p(\mathbf{x} \mid y)
$$

implying that the unconditional score is a mixture over label-conditional scores weighted by $p(y \mid \mathbf{x})$. In diffusion models, this mixture structure persists at every noise level, establishing that the conditional noise predictors encode discriminative priors (Li et al., 1 Jan 2025).
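
The decomposition can be verified numerically on a toy one-dimensional Gaussian mixture, where both the conditional scores and the marginal score are available in closed form; the sketch below is purely illustrative.

```python
import numpy as np

# Two-component 1-D Gaussian mixture: p(x) = sum_y p(y) N(x; mu_y, sigma_y^2).
weights = np.array([0.3, 0.7])
mus     = np.array([-2.0, 1.5])
sigmas  = np.array([0.8, 1.2])

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = 0.4
p_x_given_y = gauss(x, mus, sigmas)                          # p(x|y) per component
p_x         = np.dot(weights, p_x_given_y)                   # marginal p(x)
p_y_given_x = weights * p_x_given_y / p_x                    # posterior p(y|x)

cond_scores   = -(x - mus) / sigmas ** 2                     # d/dx log p(x|y) per component
mixture_score = np.dot(p_y_given_x, cond_scores)             # sum_y p(y|x) d/dx log p(x|y)
direct_score  = np.dot(weights, p_x_given_y * cond_scores) / p_x   # d/dx log p(x) directly

assert np.isclose(mixture_score, direct_score)               # the two agree, as the proposition states
```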

Such structure is exploited in frameworks like DUSA for test-time adaptation, which extract priors from the conditional denoising heads to guide discriminative models without retraining. Enforcing agreement between the true noise and a mixture of conditional noise predictions at a single timestep has been shown to yield strong gains in semantic robustness and out-of-distribution generalization.
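
A schematic of such an agreement objective is sketched below, assuming a class-conditional noise predictor `cond_eps_model(x_t, t, y)` and a classifier returning logits; this is a simplified illustration of the idea, not the published DUSA implementation.

```python
import torch

def mixture_agreement_loss(x0, classifier, cond_eps_model, alpha_bar_t, t):
    """Single-timestep agreement loss: the injected noise should match the
    p(y|x)-weighted mixture of class-conditional noise predictions.
    classifier and cond_eps_model are hypothetical stand-ins for the model
    being adapted and a pretrained class-conditional noise predictor."""
    probs = classifier(x0).softmax(dim=-1)                           # p(y|x), shape (B, K)
    eps = torch.randn_like(x0)
    x_t = alpha_bar_t ** 0.5 * x0 + (1 - alpha_bar_t) ** 0.5 * eps   # noised input at level t
    eps_mix = torch.zeros_like(x0)
    for y in range(probs.shape[-1]):
        labels = torch.full((x0.shape[0],), y, dtype=torch.long, device=x0.device)
        w = probs[:, y].view(-1, *([1] * (x0.dim() - 1)))            # broadcast class weight
        eps_mix = eps_mix + w * cond_eps_model(x_t, t, labels)
    return ((eps - eps_mix) ** 2).mean()    # minimized w.r.t. the classifier's parameters
```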

5. Diffusion Priors in Multimodal, Hierarchical, and Latent-space Models

Generative diffusion priors are highly compatible with hierarchical models, multimodal fusion, and latent spaces:

  • In variational autoencoders (VAEs), the Gaussian prior in the latent space can be replaced by a learned diffusion prior, yielding more expressive latent distributions and closing the performance gap with normalizing flows (Wehenkel et al., 2021).
  • Multimodal diffusion priors enable product-of-experts fusion over multiple individually-trained conditional diffusion experts, allowing the model to condition on arbitrary combinations of modalities by leveraging the analytic tractability of Gaussian fusion at each diffusion timestep (Nair et al., 2022); a minimal fusion sketch appears at the end of this section.

This compositional capability is distinct from VAE-based or conventional Gaussian priors, which require retraining for each new modality or constraint.
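
One standard way to realize the analytic Gaussian fusion referenced above is a precision-weighted product of the per-modality Gaussian estimates at each reverse step; the sketch below assumes each expert exposes a mean and variance for the current timestep.

```python
import numpy as np

def fuse_gaussian_experts(means, variances):
    """Product-of-experts fusion of per-modality Gaussian estimates at one diffusion
    timestep: precisions add, and the fused mean is the precision-weighted average."""
    means = np.asarray(means)
    precisions = 1.0 / np.asarray(variances)
    fused_var = 1.0 / precisions.sum(axis=0)
    fused_mean = fused_var * (precisions * means).sum(axis=0)
    return fused_mean, fused_var
```

Because the fusion is closed-form, any subset of the available modalities can be combined at sampling time without retraining, which is the source of the compositionality noted above.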

6. Theoretical Analysis and Recovery Guarantees

Recent deterministic recovery theory for diffusion priors demonstrates that, under certain conditions, the learned score fields act as time-varying approximate projection operators onto a low-dimensional model set. For linear inverse problems, the projected-gradient-descent interpretation enables provable convergence bounds, quantifying the dependence on the noise schedule, measurement design, and regularity of the learned prior (Leong et al., 24 Sep 2025).

Specifically, if the restricted isometry property (RIP) holds for the measurement operator (with $\delta\beta < 1$, where $\beta$ is the Lipschitz constant) and the diffusion score accurately approximates the projection onto the data manifold, then the recovery error decays geometrically as a function of the iteration count and the noise schedule.
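
A schematic of this projected-gradient-descent reading for a linear inverse problem $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}$ is sketched below; treating the Tweedie denoiser at an annealed noise level as the approximate projection is an illustrative assumption, not necessarily the exact operator analyzed by Leong et al. (24 Sep 2025).

```python
import numpy as np

def score_projected_gd(A, y, eps_model, alpha_bars, step=0.5, n_iter=100):
    """Projected-gradient-descent reading of diffusion-prior recovery: a gradient step
    on the quadratic data term, then an approximate projection onto the learned data
    manifold via the denoiser at an annealed noise level (schematic only)."""
    x = A.T @ y                                              # initialization in signal space
    T = len(alpha_bars)
    for k in range(n_iter):
        x = x - step * A.T @ (A @ x - y)                     # gradient step on (1/2)||Ax - y||^2
        t = max(int(T * (1.0 - (k + 1) / n_iter)), 1)        # anneal the projection noise level
        abar = alpha_bars[t]
        x_scaled = np.sqrt(abar) * x
        x = (x_scaled - np.sqrt(1.0 - abar) * eps_model(x_scaled, t)) / np.sqrt(abar)  # ~ projection
    return x
```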

7. Practical Applications and Limitations

Generative diffusion priors are actively used across a diverse array of tasks, including image and signal restoration under Gaussian and non-Gaussian noise, linear inverse problems, test-time adaptation of discriminative models, domain transfer, structured semantic reasoning, multimodal fusion, and expressive latent priors for VAEs.

Despite their expressivity and flexibility, diffusion-prior-based methods often entail increased computational cost at inference time due to iterative denoising and optimization steps, and their performance can be sensitive to hyperparameter schedules and to how conditional guidance is integrated. Balancing prior and constraint losses remains an empirical design choice, and further theoretical work is needed on convergence in non-Gaussian, nonconvex, or highly undersampled regimes.


References:

All citations correspond to arXiv preprints as indicated by their identifiers.

