Blur Diffusion Model (BlurDM)

Updated 6 December 2025
  • BlurDM is a generative model that incorporates physical blur priors and adaptive kernel estimation to robustly remove non-uniform blur in images.
  • It leverages a latent kernel prediction network (LKPN) and an element-wise adaptive convolution module to progressively refine image structure during diffusion.
  • Experimental results show that BlurDM outperforms traditional CNN/GAN methods, preserving fine details and reducing artifacts under extreme blur conditions.

A Blur Diffusion Model (BlurDM) is a generative diffusion framework that explicitly incorporates the physical or statistical structure of blur, such as spatially variant kernels, motion integration, or frequency-domain attenuation, into either the forward corruption or the reverse restoration process of a diffusion model for high-fidelity image deblurring. Modern BlurDMs are distinguished by their integration of blur formation priors, adaptive kernel estimation in latent space, and joint diffusion conditioning pipelines that enable robust removal of real-world, spatially varying, and non-uniform blur in natural images. This paradigm is represented by frameworks such as DeblurDiff, where blur and denoising are coupled in both mathematical formulation and network architecture (Kong et al., 6 Feb 2025).

1. Mathematical Formulation: Forward and Reverse Processes

BlurDM leverages a latent diffusion process built atop pre-trained Stable Diffusion (SD) backbones. Let X_B denote a real blurred image. The forward or noising process operates in the latent space:

z_t = \sqrt{\bar\alpha_t}\, z_0 + \sqrt{1-\bar\alpha_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)

where z_0 = \mathcal{E}(x_0) encodes a sharp reference in latent space, and \{\alpha_t\} and \bar\alpha_t = \prod_{s \le t} \alpha_s are deterministic noise schedules.

The reverse diffusion step involves score-based denoising:

z_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( z_t - \frac{1-\alpha_t}{\sqrt{1-\bar\alpha_t}} \epsilon_\theta(z_t, t) \right) + \sigma_t \zeta, \qquad \zeta \sim \mathcal{N}(0, I)

where \epsilon_\theta is predicted by a score model incorporating both control signals from the input blur and structure guidance, and \sigma_t is the sampling variance. The deblurring process is thus enacted not by direct kernel inversion but by learning an implicitly invertible mapping in latent space, conditioned on learned priors that encode blur formation physics (Kong et al., 6 Feb 2025).
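For concreteness, the following is a minimal PyTorch sketch of both updates under standard DDPM assumptions; the linear beta schedule, the \sigma_t choice, and the tensor conventions are illustrative placeholders rather than the exact SD configuration used by DeblurDiff.

```python
import torch

# Assumed linear beta schedule; DeblurDiff builds on a pre-trained Stable
# Diffusion backbone, so in practice the schedule is inherited from SD.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)   # \bar\alpha_t

def forward_noise(z0: torch.Tensor, t: int) -> torch.Tensor:
    """Forward process: z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps."""
    eps = torch.randn_like(z0)
    return alpha_bar[t].sqrt() * z0 + (1.0 - alpha_bar[t]).sqrt() * eps

def reverse_step(z_t: torch.Tensor, t: int, eps_pred: torch.Tensor) -> torch.Tensor:
    """Reverse process: one DDPM update from z_t to z_{t-1}."""
    coef = (1.0 - alphas[t]) / (1.0 - alpha_bar[t]).sqrt()
    mean = (z_t - coef * eps_pred) / alphas[t].sqrt()
    if t == 0:
        return mean                        # no noise added at the final step
    sigma_t = betas[t].sqrt()              # one common choice of sampling variance
    return mean + sigma_t * torch.randn_like(z_t)
```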

2. Architecture: Latent Kernel Prediction and Adaptive Convolution

Central to BlurDM is the Latent Kernel Prediction Network (LKPN), a latent-space U-Net that estimates a spatially variant deblurring kernel k_t = \mathrm{LKPN}(z_t, z_{lq}, t) at each diffusion step, where z_{lq} is the VAE-encoded blurred latent. Each kernel has local support k \times k per channel and spatial location.

The Element-wise Adaptive Convolution (EAC) module applies this kernel field to the blurred latent, yielding a sharp structure guidance:

z^s_t(i, j, c) = \sum_{u=-r}^{r} \sum_{v=-r}^{r} \mathcal{K}_{i,j,c}(u, v)\, z_{lq}(i+u, j+v, c)

where r = (k-1)/2, and \mathcal{K}_{i,j,c} is the predicted kernel at each location and channel. EAC adaptively sharpens the latent while maintaining input structure, serving as an information-preserving, differentiable local deblurring layer.
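The summation above is a per-pixel, per-channel local convolution, which can be implemented with a single unfold. The PyTorch sketch below assumes a (B, C·k·k, H, W) layout for the predicted kernel field; the actual DeblurDiff tensor layout may differ.

```python
import torch
import torch.nn.functional as F

def eac(z_lq: torch.Tensor, kernels: torch.Tensor) -> torch.Tensor:
    """
    z_lq:    (B, C, H, W) blurred latent.
    kernels: (B, C*k*k, H, W) per-location, per-channel kernel field
             predicted by the LKPN (assumed layout).
    Returns the structure guidance z_t^s of shape (B, C, H, W).
    """
    B, C, H, W = z_lq.shape
    k2 = kernels.shape[1] // C                # kernel support area k*k
    k = int(k2 ** 0.5)
    r = (k - 1) // 2
    # Extract the k x k neighborhood around every spatial location.
    patches = F.unfold(z_lq, kernel_size=k, padding=r)   # (B, C*k*k, H*W)
    patches = patches.view(B, C, k2, H, W)
    weights = kernels.view(B, C, k2, H, W)
    # Weighted sum over the kernel support, per channel and per pixel.
    return (patches * weights).sum(dim=2)
```

Because the weighted sum is linear in z_{lq} and differentiable in the predicted kernels, gradients flow through EAC into the LKPN during joint training, which is what makes the module usable as a deblurring layer inside the diffusion loop.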

During diffusion, LKPN and EAC form a closed refinement loop: the kernel is updated at every timestep with partially denoised latents, allowing structure guidance to improve progressively as the diffusion traverses steps toward the clean image (Kong et al., 6 Feb 2025).

3. Conditioning, Training Losses, and Optimization

The generative backbone is a ControlNet-augmented SD U-Net, which, at each timestep, conditions on (z_t, z_{lq}, z^s_t). The end-to-end loss unites two objectives:

\mathcal{L} = \mathcal{L}_\text{denoise} + \mathcal{L}_\text{LKPN}

with

\mathcal{L}_\text{denoise} = \mathbb{E}_{t, z_0, \epsilon} \|\epsilon - \epsilon_\theta(z_t, t)\|^2_2

\mathcal{L}_\text{LKPN} = \mathcal{L}_\text{latent} + \mathcal{L}_\text{pixel}

\mathcal{L}_\text{latent} = \mathbb{E}\, \| z_0 - \mathrm{EAC}(z_{lq}, k_t) \|^2_2, \quad \mathcal{L}_\text{pixel} = \mathbb{E}\, \| x_0 - \mathcal{D}(\mathrm{EAC}(z_{lq}, k_t)) \|^2_2

No adversarial or perceptual losses beyond the VAE decoder's pixel reconstruction term are used. The LKPN and the ControlNet branch (an SD U-Net copy with ZeroConv layers for the additional conditioning channels) are trained jointly, initialized from the frozen SD weights (Kong et al., 6 Feb 2025).
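A hedged sketch of the combined objective follows; vae, lkpn, and eps_model are placeholder interfaces (the eac helper is the one sketched in Section 2), so this illustrates the loss structure rather than the authors' training code.

```python
import torch
import torch.nn.functional as F

def blurdm_loss(x0, x_blur, t, vae, lkpn, eps_model, alpha_bar):
    """Joint denoising + LKPN objective (illustrative interfaces)."""
    z0 = vae.encode(x0)                  # sharp reference latent (assumed API)
    z_lq = vae.encode(x_blur)            # blurred latent
    eps = torch.randn_like(z0)
    z_t = alpha_bar[t].sqrt() * z0 + (1.0 - alpha_bar[t]).sqrt() * eps

    k_t = lkpn(z_t, z_lq, t)             # spatially variant kernel field
    z_s = eac(z_lq, k_t)                 # structure guidance, Section 2 sketch

    # Epsilon-prediction loss; the model also sees the conditioning inputs.
    loss_denoise = F.mse_loss(eps_model(z_t, t, z_lq, z_s), eps)
    # Latent- and pixel-space reconstruction terms for the LKPN/EAC branch.
    loss_latent = F.mse_loss(z_s, z0)
    loss_pixel = F.mse_loss(vae.decode(z_s), x0)
    return loss_denoise + loss_latent + loss_pixel
```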

4. Inference Pipeline and Kernel Refinement

Inference proceeds as:

  1. Encode the blurred image X_B via the VAE encoder to get z_{lq}.
  2. Sample the initial latent z_T \sim \mathcal{N}(0, I).
  3. For t = T \rightarrow 1 (see the table below), at each step:
    • Predict the kernel k_t = \mathrm{LKPN}(z_t, z_{lq}, t).
    • Compute the structure guidance z^s_t = \mathrm{EAC}(z_{lq}, k_t).
    • Feed (z_t, z_{lq}, z^s_t) into the ControlNet for noise prediction.
    • Update z_{t-1} using the standard DDPM rule.
Step | Computation
1 | z_{lq} \leftarrow \text{VAE encoder}(X_B)
2 | z_T \sim \mathcal{N}(0, I)
3 (loop) | Predict k_t, compute z^s_t, update z_{t-1}

At each timestep, the partially denoised latent z_{t-1} informs subsequent kernel prediction, tightening the interplay between deblurring and denoising (Kong et al., 6 Feb 2025).
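The whole pipeline reduces to a short sampling loop. In the sketch below, vae, lkpn, and controlnet_eps are placeholder modules standing in for the trained networks, eac is the Section 2 helper, and the update rule is the DDPM step from Section 1; this is an assumed structure, not the released implementation.

```python
import torch

@torch.no_grad()
def deblur(x_blur, vae, lkpn, controlnet_eps, T, alphas, alpha_bar, betas):
    z_lq = vae.encode(x_blur)                    # step 1: encode blurred image
    z_t = torch.randn_like(z_lq)                 # step 2: z_T ~ N(0, I)
    for t in reversed(range(T)):                 # step 3: refinement loop
        k_t = lkpn(z_t, z_lq, t)                 # predict per-pixel kernels
        z_s = eac(z_lq, k_t)                     # structure guidance via EAC
        eps = controlnet_eps(z_t, z_lq, z_s, t)  # conditioned noise estimate
        coef = (1.0 - alphas[t]) / (1.0 - alpha_bar[t]).sqrt()
        z_t = (z_t - coef * eps) / alphas[t].sqrt()
        if t > 0:                                # standard DDPM variance term
            z_t = z_t + betas[t].sqrt() * torch.randn_like(z_t)
    return vae.decode(z_t)                       # decode the restored image
```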

5. Experimental Results and Ablation Analyses

BlurDM (DeblurDiff) surpasses state-of-the-art CNN/GAN-based, as well as previous diffusion-based, deblurring methods on both synthetic and real-world datasets. On the Real Blurry Images set (no ground truth), DeblurDiff achieved:

  • NIQE: 3.6628 (best; lower is better)
  • MUSIQ: 52.9263
  • MANIQA: 0.5963
  • CLIP-IQA: 0.5496

On the synthetic DVD dataset, it attains an NIQE of 2.7822 (next best: 3.1357). Qualitative comparisons reveal that DeblurDiff better preserves fine image structures under extreme blur, avoids the hallucinated artifacts common with direct ControlNet conditioning or two-stage pipelines, and shows progressive kernel/latent refinement that yields crisper edges as the timestep decreases.

Ablation studies indicate substantial degradation when EAC or the SD priors in the LKPN are removed, and direct conditioning on z_{lq} produces poor recovery under strong blur (Kong et al., 6 Feb 2025).

6. Distinctive Features, Significance, and Limitations

BlurDM's significance lies in co-training a spatially-variant deblurring kernel predictor directly in latent space, coupled to a controlled diffusion process that leverages powerful SD image priors without sacrificing input structure. Its refinement loop (structure estimates improving kernel prediction, which recursively guides denoising) is unique among diffusion methods for real image restoration.

The approach enables state-of-the-art restoration on challenging benchmarks and real blurry photographs, with robust performance even under unconstrained (non-uniform, severe) blur scenarios.

Limitations include dependency on the underlying SD latent space and the assumption of a forward blur formation compatible with its deblurring operators. The method currently targets motion and other realistic blur types and has not yet been extended to defocus or non-convolutional degradation models. However, its architectural features (element-wise adaptive kernels, latent-space operation, and joint iterative kernel and latent estimation) provide a robust blueprint for future research in physically interpretable conditional generative models for image restoration (Kong et al., 6 Feb 2025).

References

  1. Kong et al. (6 Feb 2025). DeblurDiff.
