Blur Diffusion Model (BlurDM)

Updated 6 December 2025
  • BlurDM is a generative model that incorporates physical blur priors and adaptive kernel estimation to robustly remove non-uniform blur in images.
  • It leverages a latent kernel prediction network (LKPN) and an element-wise adaptive convolution module to progressively refine image structure during diffusion.
  • Experimental results show that BlurDM outperforms traditional CNN/GAN methods, preserving fine details and reducing artifacts under extreme blur conditions.

A Blur Diffusion Model (BlurDM) is a generative diffusion framework that explicitly incorporates the physical or statistical structure of blur, such as spatially variant kernels, motion integration, or frequency-domain attenuation, into either the forward corruption or the reverse restoration process of a diffusion model for high-fidelity image deblurring. Modern BlurDMs are distinguished by their integration of blur formation priors, adaptive kernel estimation in latent space, and joint diffusion conditioning pipelines that enable robust removal of real-world, spatially varying, and non-uniform blur in natural images. This paradigm is represented by frameworks such as DeblurDiff, where blur and denoising are coupled in both mathematical formulation and network architecture (Kong et al., 6 Feb 2025).

1. Mathematical Formulation: Forward and Reverse Processes

BlurDM leverages a latent diffusion process built atop pre-trained Stable Diffusion (SD) backbones. Let X_B denote a real blurred image. The forward or noising process operates in the latent space:

z_t = \sqrt{\bar\alpha_t}\, z_0 + \sqrt{1-\bar\alpha_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)

where z_0 = \mathcal{E}(x_0) encodes a sharp reference in latent space, and \{\alpha_t\} and \bar\alpha_t = \prod_{s \le t} \alpha_s are deterministic noise schedules.

The reverse diffusion step involves score-based denoising:

z_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( z_t - \frac{1-\alpha_t}{\sqrt{1-\bar\alpha_t}} \epsilon_\theta(z_t, t) \right) + \sigma_t \zeta, \qquad \zeta \sim \mathcal{N}(0, I)

where \epsilon_\theta is predicted by a score model incorporating both control signals from the input blur and structure guidance, and \sigma_t is the sampling variance. The deblurring process is thus enacted not by direct kernel inversion but by learning an implicitly invertible mapping in latent space, conditioned on learned priors that encode blur formation physics (Kong et al., 6 Feb 2025).
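For concreteness, the following is a minimal PyTorch sketch of both updates under standard DDPM assumptions; the linear beta schedule, the \sigma_t choice, and the tensor conventions are illustrative placeholders rather than the exact SD configuration used by DeblurDiff.

```python
import torch

# Assumed linear beta schedule; DeblurDiff builds on a pre-trained Stable
# Diffusion backbone, so in practice the schedule is inherited from SD.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)   # \bar\alpha_t

def forward_noise(z0: torch.Tensor, t: int) -> torch.Tensor:
    """Forward process: z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps."""
    eps = torch.randn_like(z0)
    return alpha_bar[t].sqrt() * z0 + (1.0 - alpha_bar[t]).sqrt() * eps

def reverse_step(z_t: torch.Tensor, t: int, eps_pred: torch.Tensor) -> torch.Tensor:
    """Reverse process: one DDPM update from z_t to z_{t-1}."""
    coef = (1.0 - alphas[t]) / (1.0 - alpha_bar[t]).sqrt()
    mean = (z_t - coef * eps_pred) / alphas[t].sqrt()
    if t == 0:
        return mean                        # no noise added at the final step
    sigma_t = betas[t].sqrt()              # one common choice of sampling variance
    return mean + sigma_t * torch.randn_like(z_t)
```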

2. Architecture: Latent Kernel Prediction and Adaptive Convolution

Central to BlurDM is the Latent Kernel Prediction Network (LKPN), a latent-space U-Net that estimates a spatially variant deblurring kernel k_t = \mathrm{LKPN}(z_t, z_{lq}, t) at each diffusion step, where z_{lq} is the VAE-encoded blurred latent. Each kernel has local support k \times k per channel and spatial location.

The Element-wise Adaptive Convolution (EAC) module applies this kernel field to the blurred latent, yielding a sharp structure guidance:

z^s_t(i, j, c) = \sum_{u=-r}^{r} \sum_{v=-r}^{r} \mathcal{K}_{i,j,c}(u, v)\, z_{lq}(i+u, j+v, c)

where r = (k-1)/2, and \mathcal{K}_{i,j,c} is the predicted kernel at each location and channel. EAC adaptively sharpens the latent while maintaining input structure, serving as an information-preserving, differentiable local deblurring layer.
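The summation above is a per-pixel, per-channel local convolution, which can be implemented with a single unfold. The PyTorch sketch below assumes a (B, C·k·k, H, W) layout for the predicted kernel field; the actual DeblurDiff tensor layout may differ.

```python
import torch
import torch.nn.functional as F

def eac(z_lq: torch.Tensor, kernels: torch.Tensor) -> torch.Tensor:
    """
    z_lq:    (B, C, H, W) blurred latent.
    kernels: (B, C*k*k, H, W) per-location, per-channel kernel field
             predicted by the LKPN (assumed layout).
    Returns the structure guidance z_t^s of shape (B, C, H, W).
    """
    B, C, H, W = z_lq.shape
    k2 = kernels.shape[1] // C                # kernel support area k*k
    k = int(k2 ** 0.5)
    r = (k - 1) // 2
    # Extract the k x k neighborhood around every spatial location.
    patches = F.unfold(z_lq, kernel_size=k, padding=r)   # (B, C*k*k, H*W)
    patches = patches.view(B, C, k2, H, W)
    weights = kernels.view(B, C, k2, H, W)
    # Weighted sum over the kernel support, per channel and per pixel.
    return (patches * weights).sum(dim=2)
```

Because the weighted sum is linear in z_{lq} and differentiable in the predicted kernels, gradients flow through EAC into the LKPN during joint training, which is what makes the module usable as a deblurring layer inside the diffusion loop.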

During diffusion, LKPN and EAC form a closed refinement loop: the kernel is updated at every timestep with partially denoised latents, allowing structure guidance to improve progressively as the diffusion traverses steps toward the clean image (Kong et al., 6 Feb 2025).

3. Conditioning, Training Losses, and Optimization

The generative backbone is a ControlNet-augmented SD U-Net, which, at each timestep, conditions on (z_t, z_{lq}, z^s_t). The end-to-end loss unites two objectives:

\mathcal{L} = \mathcal{L}_\text{denoise} + \mathcal{L}_\text{LKPN}

with

\mathcal{L}_\text{denoise} = \mathbb{E}_{t, z_0, \epsilon} \|\epsilon - \epsilon_\theta(z_t, t)\|^2_2

\mathcal{L}_\text{LKPN} = \mathcal{L}_\text{latent} + \mathcal{L}_\text{pixel}

\mathcal{L}_\text{latent} = \mathbb{E}\, \| z_0 - \mathrm{EAC}(z_{lq}, k_t) \|^2_2, \quad \mathcal{L}_\text{pixel} = \mathbb{E}\, \| x_0 - \mathcal{D}(\mathrm{EAC}(z_{lq}, k_t)) \|^2_2

No adversarial or perceptual losses beyond the VAE decoder's pixel reconstruction term are used. The LKPN and the ControlNet branch (an SD U-Net copy with ZeroConv layers for the additional conditioning channels) are trained jointly, initialized from the frozen SD weights (Kong et al., 6 Feb 2025).
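A hedged sketch of the combined objective follows; vae, lkpn, and eps_model are placeholder interfaces (the eac helper is the one sketched in Section 2), so this illustrates the loss structure rather than the authors' training code.

```python
import torch
import torch.nn.functional as F

def blurdm_loss(x0, x_blur, t, vae, lkpn, eps_model, alpha_bar):
    """Joint denoising + LKPN objective (illustrative interfaces)."""
    z0 = vae.encode(x0)                  # sharp reference latent (assumed API)
    z_lq = vae.encode(x_blur)            # blurred latent
    eps = torch.randn_like(z0)
    z_t = alpha_bar[t].sqrt() * z0 + (1.0 - alpha_bar[t]).sqrt() * eps

    k_t = lkpn(z_t, z_lq, t)             # spatially variant kernel field
    z_s = eac(z_lq, k_t)                 # structure guidance, Section 2 sketch

    # Epsilon-prediction loss; the model also sees the conditioning inputs.
    loss_denoise = F.mse_loss(eps_model(z_t, t, z_lq, z_s), eps)
    # Latent- and pixel-space reconstruction terms for the LKPN/EAC branch.
    loss_latent = F.mse_loss(z_s, z0)
    loss_pixel = F.mse_loss(vae.decode(z_s), x0)
    return loss_denoise + loss_latent + loss_pixel
```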

4. Inference Pipeline and Kernel Refinement

Inference proceeds as:

  1. Encode the blurred image X_B via the VAE encoder to get z_{lq}.
  2. Sample the initial latent z_T \sim \mathcal{N}(0, I).
  3. For t = T \rightarrow 1 (see the table below), at each step:
    • Predict the kernel k_t = \mathrm{LKPN}(z_t, z_{lq}, t).
    • Compute the structure guidance z^s_t = \mathrm{EAC}(z_{lq}, k_t).
    • Feed (z_t, z_{lq}, z^s_t) into the ControlNet for noise prediction.
    • Update z_{t-1} using the standard DDPM rule.
Step | Computation
1 | z_{lq} \leftarrow \text{VAE encoder}(X_B)
2 | z_T \sim \mathcal{N}(0, I)
3 (loop) | Predict k_t, compute z^s_t, update z_{t-1}

At each timestep, the partially denoised latent z_{t-1} informs subsequent kernel prediction, tightening the interplay between deblurring and denoising (Kong et al., 6 Feb 2025).
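The whole pipeline reduces to a short sampling loop. In the sketch below, vae, lkpn, and controlnet_eps are placeholder modules standing in for the trained networks, eac is the Section 2 helper, and the update rule is the DDPM step from Section 1; this is an assumed structure, not the released implementation.

```python
import torch

@torch.no_grad()
def deblur(x_blur, vae, lkpn, controlnet_eps, T, alphas, alpha_bar, betas):
    z_lq = vae.encode(x_blur)                    # step 1: encode blurred image
    z_t = torch.randn_like(z_lq)                 # step 2: z_T ~ N(0, I)
    for t in reversed(range(T)):                 # step 3: refinement loop
        k_t = lkpn(z_t, z_lq, t)                 # predict per-pixel kernels
        z_s = eac(z_lq, k_t)                     # structure guidance via EAC
        eps = controlnet_eps(z_t, z_lq, z_s, t)  # conditioned noise estimate
        coef = (1.0 - alphas[t]) / (1.0 - alpha_bar[t]).sqrt()
        z_t = (z_t - coef * eps) / alphas[t].sqrt()
        if t > 0:                                # standard DDPM variance term
            z_t = z_t + betas[t].sqrt() * torch.randn_like(z_t)
    return vae.decode(z_t)                       # decode the restored image
```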

5. Experimental Results and Ablation Analyses

BlurDM (DeblurDiff) surpasses state-of-the-art CNN/GAN-based, as well as previous diffusion-based, deblurring methods on both synthetic and real-world datasets. On the Real Blurry Images set (no ground truth), DeblurDiff achieved:

  • NIQE: 3.6628 (best; lower is better)
  • MUSIQ: 52.9263
  • MANIQA: 0.5963
  • CLIP-IQA: 0.5496

On the synthetic DVD dataset, it attains an NIQE of 2.7822 (next best: 3.1357). Qualitative comparisons reveal that DeblurDiff better preserves fine image structures under extreme blur, avoids the hallucinated artifacts common with direct ControlNet conditioning or two-stage pipelines, and shows progressive kernel/latent refinement that yields crisper edges as the timestep decreases.

Ablation studies indicate substantial degradation when EAC or the SD priors in the LKPN are removed, and direct conditioning on z_{lq} produces poor recovery under strong blur (Kong et al., 6 Feb 2025).

6. Distinctive Features, Significance, and Limitations

BlurDM's significance lies in co-training a spatially-variant deblurring kernel predictor directly in latent space, coupled to a controlled diffusion process that leverages powerful SD image priors without sacrificing input structure. Its refinement loop (structure estimates improving kernel prediction, which recursively guides denoising) is unique among diffusion methods for real image restoration.

The approach enables state-of-the-art restoration on challenging benchmarks and real blurry photographs, with robust performance even under unconstrained (non-uniform, severe) blur scenarios.

Limitations include dependency on the underlying SD latent space and the assumption of a forward blur formation compatible with its deblurring operators. The method currently targets motion and other realistic blur types and has not yet been extended to defocus or non-convolutional degradation models. However, its architectural features (element-wise adaptive kernels, latent-space operation, and joint iterative kernel and latent estimation) provide a robust blueprint for future research in physically interpretable conditional generative models for image restoration (Kong et al., 6 Feb 2025).

References

  1. Kong et al. (6 Feb 2025). DeblurDiff.
