Blur-Noise Mixture Diffusion Models

Updated 3 March 2026
  • Blur-noise mixture diffusion is a generative framework that corrupts images via spatial blurring and additive noise, then restores them using learned deblurring and denoising networks.
  • It leverages scale-space theory and spectral decomposition to decouple low- and high-frequency components, optimizing the trade-off between detail preservation and noise reduction.
  • Empirical studies demonstrate its effectiveness, achieving improved metrics in image generation and restoration, such as better PSNR, FID, and SSIM on standard datasets.

Blur-noise mixture diffusion refers to a class of generative models and restoration frameworks in which the forward stochastic process combines both spatial blurring and additive noise to corrupt the data, while the reverse process inverts this—typically via learned denoising and deblurring networks. These models generalize classical denoising diffusion probabilistic models (DDPMs), which rely purely on additive Gaussian noise, and encompass “cold diffusion” (deterministic blurring only) as well as intermediate regimes. The resulting algorithms exhibit strong connections to scale-space theory, signal regularization, and hybrid spectral control in image formation and restoration.

1. Mathematical Formulation and Forward Processes

The classical DDPM forward process iteratively corrupts an image $x_0$ using isotropic Gaussian noise,

$$q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\alpha_t}\, x_0,\ (1 - \alpha_t) I\big),$$

but this is insufficient for modeling physics-driven degradations such as motion blur, or for leveraging frequency-specific regularities in visual data. Blur-noise mixture diffusion generalizes the forward operator to include both linear convolutional blur and noise. In its most general form, the forward process at each step $t$ applies a Gaussian blur $G_{\sigma_b(t)}$ to $x_0$ and adds Gaussian noise $\epsilon_t \sim \mathcal{N}(0, \sigma_n^2(t) I)$:

$$x_t = G_{\sigma_b(t)} * x_0 + \epsilon_t.$$

Alternatively, in spectral (DCT) coordinates, the process is characterized by a diagonal blurring mask $M_{\alpha_t}$ and noise standard deviation $\beta_t$:

$$q(x_{\alpha_t, \beta_t} \mid x_0) = \mathcal{N}\big(V M_{\alpha_t} V^T x_0,\ \beta_t^2 I\big),$$

where $V$ denotes the DCT basis (Hsueh et al., 21 Nov 2025, Hoogeboom et al., 2022).
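As a concrete 1-D illustration of this spectral form, the following sketch builds an orthonormal DCT-II basis and samples from $\mathcal{N}(V M_{\alpha_t} V^T x_0,\ \beta_t^2 I)$. The exponential shape of the mask and the specific $\alpha_t$, $\beta_t$ values are illustrative assumptions, not the schedules from the cited papers.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C (rows are basis functions); V = C.T."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    C[0] /= np.sqrt(2.0)  # first row scaled so that C @ C.T == I
    return C

n = 16
C = dct_matrix(n)
V = C.T  # DCT basis, as in V M_{alpha_t} V^T x_0

# Illustrative low-pass blurring mask: attenuate high frequencies more strongly.
alpha_t = 0.5
mask = np.exp(-alpha_t * np.arange(n) ** 2 / n)  # assumed mask shape
M = np.diag(mask)

rng = np.random.default_rng(0)
x0 = rng.normal(size=n)
beta_t = 0.1

mean = V @ M @ V.T @ x0                    # blurred mean of the forward kernel
x_t = mean + beta_t * rng.normal(size=n)   # sample from N(V M V^T x0, beta_t^2 I)
```

Because the mask is diagonal in the DCT domain, each frequency component of $x_0$ is simply scaled by its mask entry before noise is added.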

A scalar “Blur-to-Noise Ratio” (BNR), defined as $\mathrm{BNR}_t = \alpha_t / \beta_t$, parameterizes the relative influence of blur and noise. This formulation smoothly interpolates between “cold diffusion” (pure blur, large $\mathrm{BNR}$) and “hot diffusion” (pure noise, small $\mathrm{BNR}$) (Hsueh et al., 21 Nov 2025, Bansal et al., 2022).
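The pixel-space forward corruption $x_t = G_{\sigma_b(t)} * x_0 + \epsilon_t$ can be sketched as follows; the separable Gaussian blur implementation, the linear blur/noise schedules, and all parameter values here are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel_1d(sigma, radius=None):
    """Discrete 1-D Gaussian kernel, normalized to sum to 1."""
    if radius is None:
        radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflective padding (stand-in for G_{sigma})."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(img, ((0, 0), (r, r)), mode="reflect")   # blur rows
    img = np.stack([np.convolve(row, k, mode="valid") for row in padded])
    padded = np.pad(img, ((r, r), (0, 0)), mode="reflect")   # blur columns
    return np.stack([np.convolve(col, k, mode="valid") for col in padded.T]).T

def forward_step(x0, t, T, sigma_b_max=3.0, sigma_n_max=1.0, rng=None):
    """x_t = G_{sigma_b(t)} * x0 + eps_t, with linear schedules (an assumption)."""
    rng = np.random.default_rng() if rng is None else rng
    sigma_b = sigma_b_max * t / T   # blur schedule
    sigma_n = sigma_n_max * t / T   # noise schedule
    blurred = gaussian_blur(x0, sigma_b) if sigma_b > 0 else x0.copy()
    return blurred + rng.normal(0.0, sigma_n, size=x0.shape)

x0 = np.random.default_rng(0).random((32, 32))
x_t = forward_step(x0, t=10, T=20, rng=np.random.default_rng(1))
```

Varying the two schedule maxima against each other changes the effective BNR of the process: large `sigma_b_max` with small `sigma_n_max` approaches cold diffusion, and the reverse approaches standard hot diffusion.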

2. Reverse Processes: Denoising, Deblurring, and Composite Score Models

The reverse process inverts the blur–noise mixture, aiming to transform a maximally degraded sample back into a realistic image. The learned transition at each step is typically Gaussian:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(\mu_\theta(x_t, t),\ \sigma_t^2 I\big).$$

Parameterizing $\mu_\theta$ is central. A divide-and-conquer approach decomposes the task into two branches:

  • A denoising network $D_\theta$ predicts blurred versions (restoring low-frequency structure).
  • A deblurring network $R_\theta$ predicts the high-frequency (residual) component.

The mean update can thus be written as a sum of identity, deblurring, and denoising corrections, with spectral masks aligning with the blurring operator (Hsueh et al., 21 Nov 2025). Variants incorporate composite noise estimation in the context of factorized diffusion, where blur and residual (sharp) components are each associated with distinct conditionings and merged via linear operators (e.g., a motion-blur kernel $K$ and $(I - K)$). The reverse update for the composite noise estimate $\hat{\epsilon}$ is then:

$$\hat{\epsilon}(x_t, t) = K * \epsilon_\theta(x_t, y_{\text{blur}}, t) + (I - K) * \epsilon_\theta(x_t, y_{\text{res}}, t),$$

and this is substituted into the sampling update (Geng et al., 2024).
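A minimal sketch of the composite merge, assuming $K$ is a horizontal box motion-blur kernel (an illustrative choice) and treating the two network outputs as given arrays:

```python
import numpy as np

def conv2d_same(img, kernel):
    """'Same'-size 2-D cross-correlation with reflective padding; equals
    convolution for the symmetric kernels used here (stand-in for K *)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="reflect")
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

# Illustrative horizontal motion-blur kernel K; (I - K) is applied as
# "identity minus blur", i.e. x - K * x.
K = np.ones((1, 5)) / 5.0

rng = np.random.default_rng(0)
eps_blur = rng.normal(size=(16, 16))  # stands in for eps_theta(x_t, y_blur, t)
eps_res  = rng.normal(size=(16, 16))  # stands in for eps_theta(x_t, y_res, t)

# Composite estimate: eps_hat = K * eps_blur + (I - K) * eps_res
eps_hat = conv2d_same(eps_blur, K) + (eps_res - conv2d_same(eps_res, K))
```

Since $K$ and $(I - K)$ sum to the identity, feeding the same array through both branches recovers it unchanged, which is the property that makes the two conditionings complementary rather than redundant.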

For restoration-specific applications, plug-and-play modules such as NFCDS inject noise only at high frequencies in the reverse step, recognizing that low-frequency noise drives blurring and fidelity loss, while high-frequency noise enables texture recovery (Wang et al., 29 Jan 2026).

3. Algorithmic Workflows and Sampling Procedures

Sampling in blur-noise mixture diffusion proceeds in $T$ steps, typically starting from an isotropic Gaussian noise image or a maximally blurred, noisy observation. The core update follows one of several algorithmic templates:

Factorized Diffusion (two-component, blur and residual):

for t = T, ..., 1:
    ε_blur = ε_θ(x_t, y_blur, t)
    ε_res  = ε_θ(x_t, y_res, t)
    ε̂      = B * ε_blur + (I - B) * ε_res
    x_{t-1} = update(x_t, ε̂)
where $B$ is a linear blur operator—commonly a motion-blur convolution kernel (Geng et al., 2024).

Cold Diffusion Style (difference update with noise injection):

for s = T, ..., 1:
    x0_hat    = R_model(x, s)
    x_blur_s  = conv_gauss(x0_hat, b_s)
    x_blur_s1 = conv_gauss(x0_hat, b_{s-1})
    x_tilde   = x - x_blur_s + x_blur_s1
    noise_s1  = torch.randn_like(x) * sigma_n_schedule[s-1]
    x = x_tilde + noise_s1
where the blurring and noise schedules $\{\sigma_b(t)\}$, $\{\sigma_n(t)\}$ are specified (linear or exponential) (Bansal et al., 2022).

Spectral Denoising (NFCDS filter in reverse step):

Noise injected at each reverse step is Fourier transformed, subjected to a frequency-sharp mask (attenuating low frequencies; thus eliminating blur-inducing components), and then inverted back for use in the update (Wang et al., 29 Jan 2026).
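A minimal sketch of such frequency-selective noise injection, assuming a hard radial cutoff in the Fourier domain (the exact filter design in NFCDS may differ):

```python
import numpy as np

def high_freq_noise(shape, cutoff=0.25, sigma=1.0, rng=None):
    """Sample Gaussian noise, zero its low-frequency band below a radial
    cutoff (in normalized spatial frequency), and transform back."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, sigma, size=shape)
    F = np.fft.fft2(noise)
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)      # normalized spatial frequency
    mask = (radius >= cutoff).astype(float)  # keep only high frequencies
    return np.real(np.fft.ifft2(F * mask))

# Noise to inject at a reverse step: textures only, no blur-inducing DC/low band.
noise = high_freq_noise((64, 64), cutoff=0.2, rng=np.random.default_rng(0))
```

The filtered noise has no DC component and no energy below the cutoff, so injecting it perturbs textures without degrading the low-frequency content responsible for global fidelity.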

Latent-space implementations (as in BlurDM) encode the blurred input into a compact code and perform the reverse chain in this space, fusing the restored latent into decoder blocks (He et al., 3 Dec 2025).

4. Spectral and Theoretical Analysis

Blur-noise mixture diffusion directly models the interplay between two key physical mechanisms:

  • Blurring (linear convolution or heat dissipation): suppresses high frequencies, prioritizes large-scale structure, and satisfies Markovian scale-space properties (semigroup structure, Lyapunov monotonicity, rotation invariance) (Peter, 2023).
  • Noise (additive Gaussian): injects stochasticity, allowing for data diversity and acting as a generative prior on fine details.

Spectral analyses (e.g., in the DCT basis) show that natural image content and noise occupy distinct frequency regimes, justifying principal strategies such as:

  • Decoupling denoising (recovery of low-frequency content) from deblurring (restoration of high-frequency content).
  • Tuning BNR for optimal trade-off: empirical studies find $\mathrm{BNR} \approx 0.3$–$0.5$ yields optimal FID and IS, balancing detail and manifold coverage (Hsueh et al., 21 Nov 2025).
  • Progressive or fixed frequency-masking of noise during restoration to reconcile perceptual quality (texture) with global fidelity (Wang et al., 29 Jan 2026).

These regimes unify standard hot/cold diffusion, provide theoretical motivation for mixed degradation modeling, and explain convergence/overfitting trends observed empirically.

5. Applications: Generation, Restoration, and Hybrid Perceptual Control

Blur-noise mixture diffusion has broad utility, including:

  • Image Generation: Warm Diffusion (BNMD) achieves state-of-the-art FID/IS on CIFAR-10, FFHQ, and LSUN by exploiting spectral decoupling and optimal BNR (Hsueh et al., 21 Nov 2025, Hoogeboom et al., 2022).
  • Image Restoration: BlurDM models the joint degradation process in dynamic scene deblurring, implements dual estimators for blur and noise inversion in latent space, and achieves notable PSNR/SSIM gains with minimal overhead across multiple deblurring backbones (He et al., 3 Dec 2025).
  • Hybrid Illusions and Perceptual Control: Factorized Diffusion enables controlled compositional sampling—e.g., hybrid images that morph under motion-blur or scale, with conditioning split between blur and residual components. Motion-hybrid images change semantic identity contingent on applied blur (Geng et al., 2024).
  • Plug-and-Play Spectral Filtering: NFCDS provides a modular spectral filter for reverse diffusion noise that sharply improves the fidelity-perception trade-off in super-resolution and denoising tasks. Gains include PSNR boosts (up to +1.26 dB), improved SSIM, and reduced LPIPS, with no need for retraining (Wang et al., 29 Jan 2026).

6. Implementation and Training Strategies

Training blur-noise mixture diffusion models typically involves minimizing mean-squared error (MSE) or variational objectives between predicted means and their targets, customized for the mixed degradation process.

Scheduling for blur and noise is essential; linear and exponential ramps are common, but spectral adaptation (frequency-dependent masking) is increasingly prevalent in restoration-focused designs (Wang et al., 29 Jan 2026). In latent-space variants, forward and reverse processes run in compact codes to accelerate sampling and enable integration with deblurring architectures (He et al., 3 Dec 2025).
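The training loop can be sketched as follows; the toy 1-D box-blur degradation, the linear model standing in for the denoising network, and all schedule values are illustrative assumptions (real models use U-Net-style architectures):

```python
import numpy as np

rng = np.random.default_rng(0)

def degrade(x0, sigma_b, sigma_n):
    """Toy mixed degradation: 1-D box 'blur' plus additive Gaussian noise."""
    width = max(1, int(2 * sigma_b) | 1)  # odd box width derived from sigma_b
    k = np.ones(width) / width
    blurred = np.convolve(np.pad(x0, width // 2, mode="edge"), k, mode="valid")
    return blurred + rng.normal(0.0, sigma_n, size=x0.shape)

# Toy linear model W predicting x0 from x_t (x0-prediction MSE objective).
d = 32
W = np.zeros((d, d))
lr = 0.05

for _ in range(200):
    x0 = rng.normal(size=d)
    t = rng.integers(1, 21)
    # Linear blur/noise ramps over T = 20 steps (assumed schedules).
    x_t = degrade(x0, sigma_b=3.0 * t / 20, sigma_n=0.5 * t / 20)
    pred = W @ x_t
    grad = np.outer(pred - x0, x_t) / d   # gradient of 0.5*||W x_t - x0||^2 / d
    W -= lr * grad

loss = float(np.mean((W @ x_t - x0) ** 2))
```

The same loop structure carries over to the spectral and latent-space variants; only the degradation operator, the model class, and the prediction target change.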

7. Empirical Findings and Trade-Offs

A range of empirical studies substantiates the advantages of blur-noise mixture diffusion:

  • On CIFAR-10 (unconditional, $\mathrm{NFE} = 35$), Warm Diffusion achieves FID = 1.85, IS = 10.02, outperforming EDM/DDPM (Hsueh et al., 21 Nov 2025).
  • Latent blur-noise mixture diffusion (BlurDM) yields consistent PSNR and SSIM improvements across four standard deblurring datasets and architectures, with the largest single gain (+1.16 dB PSNR) on RealBlur-J with Stripformer (He et al., 3 Dec 2025).
  • Plug-and-play spectral filtering (NFCDS) improves PSNR by up to +1.26 dB and halves function evaluations at equivalent or better perceptual quality (Wang et al., 29 Jan 2026).
  • Qualitative visualizations demonstrate that blur–noise mixture yields images transitioning smoothly from coarse structures to fine details, closely matching osmosis-filtering behavior and improving both statistical metrics and human preference under controlled ablations (Peter, 2023, Geng et al., 2024).

In summary, blur-noise mixture diffusion offers a principled, versatile framework synthesizing blur and noise, unifying generative and restoration paradigms, optimizing perceptual/fidelity trade-offs, and enabling both theoretically grounded and empirically superior solutions in image modeling (Hsueh et al., 21 Nov 2025, Geng et al., 2024, He et al., 3 Dec 2025, Hoogeboom et al., 2022, Wang et al., 29 Jan 2026, Peter, 2023, Bansal et al., 2022).
