Blur Diffusion Model (BlurDM)
- BlurDM is a generative model that incorporates physical blur priors and adaptive kernel estimation to robustly remove non-uniform blur in images.
- It leverages a latent kernel prediction network (LKPN) and an element-wise adaptive convolution module to progressively refine image structure during diffusion.
- Experimental results show that BlurDM outperforms traditional CNN/GAN methods, preserving fine details and reducing artifacts under extreme blur conditions.
A Blur Diffusion Model (BlurDM) is a generative diffusion framework that explicitly incorporates the physical or statistical structure of blur—such as spatially variant kernels, motion integration, or frequency-domain attenuation—into either the forward corruption or reverse restoration processes of diffusion models for high-fidelity image deblurring. Modern BlurDMs are distinguished by their integration of blur formation priors, adaptive kernel estimation within latent space, and joint diffusion conditioning pipelines that enable robust removal of real-world, spatially varying, and non-uniform blur across natural images. This paradigm is represented by frameworks such as DeblurDiff, where blur and denoising are coupled in both mathematical formulation and network architecture (Kong et al., 6 Feb 2025).
1. Mathematical Formulation: Forward and Reverse Processes
BlurDM leverages a latent diffusion process built atop pre-trained Stable Diffusion (SD) backbones. Let $y$ denote a real blurred image. The forward (noising) process operates in the latent space:

$$z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \mathbf{I}),$$

where $z_0 = \mathcal{E}(x)$ encodes a sharp reference $x$, and $\sqrt{\bar{\alpha}_t}$, $\sqrt{1 - \bar{\alpha}_t}$ are deterministic schedules.
The reverse diffusion step involves score-based denoising:

$$z_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( z_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(z_t, t, c) \right) + \sigma_t\, \eta, \qquad \eta \sim \mathcal{N}(0, \mathbf{I}),$$

where $\epsilon_\theta$ is predicted by a score model incorporating both control signals from the input blur and structure guidance (together denoted $c$), and $\sigma_t$ is the sampling variance. The deblurring process is thus enacted not by direct kernel inversion but by learning an implicitly invertible mapping in latent space, conditioned on learned priors that encode blur formation physics (Kong et al., 6 Feb 2025).
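The following is a minimal PyTorch-style sketch of the latent forward noising and the DDPM reverse update described above. The linear beta schedule, function names, and variance choice are illustrative assumptions, not the exact DeblurDiff implementation.

```python
import torch

# Illustrative linear beta schedule; the actual SD schedule may differ.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def forward_noise(z0, t):
    """q(z_t | z_0): noise the sharp latent z_0 to timestep t."""
    eps = torch.randn_like(z0)
    zt = alpha_bars[t].sqrt() * z0 + (1.0 - alpha_bars[t]).sqrt() * eps
    return zt, eps

def reverse_step(zt, t, eps_pred):
    """One DDPM reverse update z_t -> z_{t-1} given the predicted noise eps_pred."""
    coef = (1.0 - alphas[t]) / (1.0 - alpha_bars[t]).sqrt()
    mean = (zt - coef * eps_pred) / alphas[t].sqrt()
    sigma = betas[t].sqrt()                       # one common choice of sampling variance
    noise = torch.randn_like(zt) if t > 0 else torch.zeros_like(zt)
    return mean + sigma * noise
```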
2. Architecture: Latent Kernel Prediction and Adaptive Convolution
Central to BlurDM is the Latent Kernel Prediction Network (LKPN), a latent-space U-Net that estimates a spatially variant deblurring kernel $K_t = \mathrm{LKPN}(z_t, z_b, t)$ at each diffusion step, where $z_b = \mathcal{E}(y)$ is the VAE-encoded blurred latent. Each kernel has local $k \times k$ support per channel and spatial location.
The Element-wise Adaptive Convolution (EAC) module applies this kernel field to the blurred latent, yielding a sharp structure guidance:

$$\hat{z}_s[c, i, j] = \sum_{(p, q) \in \Omega} K_t[c, i, j, p, q]\; z_b[c, i + p, j + q],$$

where $\Omega$ is the local $k \times k$ support window, and $K_t[c, i, j, \cdot, \cdot]$ is the predicted kernel at each location and channel. EAC adaptively sharpens the latent while maintaining input structure, serving as an information-preserving, differentiable local deblurring layer.
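As a concrete illustration, the sketch below implements an element-wise adaptive convolution with `torch.nn.functional.unfold`, assuming the LKPN outputs a per-channel, per-pixel kernel of size $k \times k$; the tensor layout and names are assumptions, not the paper's exact interface.

```python
import torch
import torch.nn.functional as F

def eac(z_b: torch.Tensor, kernels: torch.Tensor, k: int = 3) -> torch.Tensor:
    """Element-wise adaptive convolution (illustrative layout).

    z_b:     (B, C, H, W)        blurred latent from the VAE encoder
    kernels: (B, C, k*k, H, W)   spatially variant kernel per channel/location
    returns: (B, C, H, W)        structure guidance latent
    """
    B, C, H, W = z_b.shape
    pad = k // 2
    # Extract the k*k neighborhood around every channel and spatial location.
    patches = F.unfold(z_b, kernel_size=k, padding=pad)    # (B, C*k*k, H*W)
    patches = patches.view(B, C, k * k, H, W)
    # Weighted sum of each neighborhood with its predicted kernel.
    return (patches * kernels).sum(dim=2)
```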
During diffusion, LKPN and EAC form a closed refinement loop: the kernel is updated at every timestep with partially denoised latents, allowing structure guidance to improve progressively as the diffusion traverses steps toward the clean image (Kong et al., 6 Feb 2025).
3. Conditioning, Training Losses, and Optimization
The generative backbone is a ControlNet-augmented SD U-Net, which, at each timestep, conditions on the blurred latent $z_b$ and the structure guidance $\hat{z}_s$. The end-to-end loss unites two objectives:

$$\mathcal{L} = \mathcal{L}_{\mathrm{diff}} + \lambda\, \mathcal{L}_{\mathrm{struct}},$$

with

$$\mathcal{L}_{\mathrm{diff}} = \mathbb{E}_{z_0, \epsilon, t}\big[ \lVert \epsilon - \epsilon_\theta(z_t, t, z_b, \hat{z}_s) \rVert_2^2 \big], \qquad \mathcal{L}_{\mathrm{struct}} = \lVert \hat{z}_s - z_0 \rVert_2^2.$$
No adversarial or perceptual losses beyond the VAE decoder's pixel reconstruction term are used. LKPN and the ControlNet branch (SD U-Net plus ZeroConv layers for the additional conditioning channels) are trained jointly, initialized from the frozen SD weights (Kong et al., 6 Feb 2025).
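A schematic training step under these assumptions might look as follows; `vae`, `lkpn`, and `controlnet_unet` are placeholder modules, the supervision of the EAC output toward the sharp latent is an interpretation of the second objective, and `eac` reuses the sketch above.

```python
import torch
import torch.nn.functional as F

def training_step(vae, lkpn, controlnet_unet, x_sharp, y_blur, alpha_bars, lambda_s=1.0):
    """Schematic joint objective: noise-prediction loss + structure-guidance loss."""
    z0 = vae.encode(x_sharp)                      # sharp latent z_0
    zb = vae.encode(y_blur)                       # blurred latent z_b
    t = torch.randint(0, alpha_bars.shape[0], (z0.shape[0],), device=z0.device)
    eps = torch.randn_like(z0)
    a = alpha_bars[t].view(-1, 1, 1, 1)
    zt = a.sqrt() * z0 + (1.0 - a).sqrt() * eps   # forward noising in latent space

    kernels = lkpn(zt, zb, t)                     # spatially variant kernels K_t
    z_struct = eac(zb, kernels)                   # structure guidance via EAC

    cond = torch.cat([zb, z_struct], dim=1)       # conditioning channels for ControlNet
    eps_pred = controlnet_unet(zt, t, cond)
    loss_diff = F.mse_loss(eps_pred, eps)         # standard diffusion loss
    loss_struct = F.mse_loss(z_struct, z0)        # supervise guidance toward sharp latent
    return loss_diff + lambda_s * loss_struct
```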
4. Inference Pipeline and Kernel Refinement
Inference proceeds as follows (a condensed code sketch follows the summary table below):
- Encode the blurred image $y$ via the VAE encoder to get $z_b = \mathcal{E}(y)$.
- Sample the initial latent $z_T \sim \mathcal{N}(0, \mathbf{I})$.
- For $t = T, \dots, 1$ (see the table below), at each step:
  - Predict kernel $K_t = \mathrm{LKPN}(z_t, z_b, t)$.
  - Compute structure guidance $\hat{z}_s = \mathrm{EAC}(z_b, K_t)$.
  - Input $z_b$ and $\hat{z}_s$ into the ControlNet for noise prediction.
  - Update $z_t \to z_{t-1}$ using the standard DDPM rule.
| Step | Computation |
|---|---|
| 1 | $z_b = \mathcal{E}(y)$ |
| 2 | $z_T \sim \mathcal{N}(0, \mathbf{I})$ |
| 3 (loop) | Predict $K_t = \mathrm{LKPN}(z_t, z_b, t)$, compute $\hat{z}_s = \mathrm{EAC}(z_b, K_t)$, update $z_t \to z_{t-1}$ |
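Putting the table together, a condensed sampling loop could be sketched as below, reusing the placeholder components from the earlier snippets (`lkpn`, `eac`, `controlnet_unet`, `reverse_step`); step-count subsampling of the noise schedule is omitted for brevity.

```python
import torch

@torch.no_grad()
def deblur(vae, lkpn, controlnet_unet, y_blur, T=1000):
    """Sketch of the inference pipeline: encode, sample z_T, iteratively refine."""
    zb = vae.encode(y_blur)                       # step 1: blurred latent z_b
    zt = torch.randn_like(zb)                     # step 2: initial latent z_T ~ N(0, I)
    for t in reversed(range(T)):                  # step 3: refinement loop
        tt = torch.full((zb.shape[0],), t, device=zb.device, dtype=torch.long)
        kernels = lkpn(zt, zb, tt)                # predict kernel K_t
        z_struct = eac(zb, kernels)               # structure guidance via EAC
        cond = torch.cat([zb, z_struct], dim=1)
        eps_pred = controlnet_unet(zt, tt, cond)  # ControlNet-conditioned noise prediction
        zt = reverse_step(zt, t, eps_pred)        # DDPM update z_t -> z_{t-1}
    return vae.decode(zt)
```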
At each timestep, the partially denoised latent informs subsequent kernel prediction, tightening the interplay between deblurring and denoising (Kong et al., 6 Feb 2025).
5. Experimental Results and Ablation Analyses
BlurDM (DeblurDiff) surpasses state-of-the-art CNN/GAN-based, as well as previous diffusion-based, deblurring methods on both synthetic and real-world datasets. On the Real Blurry Images set (no ground truth), DeblurDiff achieved:
- NIQE: 3.6628 (best; lower is better)
- MUSIQ: 52.9263 (higher is better)
- MANIQA: 0.5963 (higher is better)
- CLIP-IQA: 0.5496 (higher is better)
On synthetic DVD, it attains an NIQE of 2.7822 (next best: 3.1357). Qualitative comparisons reveal that DeblurDiff better preserves fine image structures under extreme blur, avoids hallucinated artifacts common with direct ControlNet conditioning or two-stage pipelines, and shows that the progressive kernel/latent refinement yields increasingly crisp edges as sampling proceeds to lower timesteps.
Ablation studies indicate substantial degradation when EAC or the SD priors in LKPN are removed, and direct conditioning on the blurred latent $z_b$ alone produces poor recovery under strong blur (Kong et al., 6 Feb 2025).
6. Distinctive Features, Significance, and Limitations
BlurDM's significance lies in co-training a spatially-variant deblurring kernel predictor directly in latent space, coupled to a controlled diffusion process that leverages powerful SD image priors without sacrificing input structure. Its refinement loop (structure estimates improving kernel prediction, which recursively guides denoising) is unique among diffusion methods for real image restoration.
The approach enables state-of-the-art restoration on challenging benchmarks and real blurry photographs, with robust performance even under unconstrained (non-uniform, severe) blur scenarios.
Limitations include dependency on the underlying SD latent space and the assumption of a forward blur formation compatible with its deblurring operators. The method currently targets motion and other realistic blur types and has not yet been extended to defocus or non-convolutional degradation models. However, its architectural features (element-wise adaptive kernels, latent-space operation, and joint iterative kernel/latent estimation) provide a robust blueprint for future research in physically interpretable conditional generative models for image restoration (Kong et al., 6 Feb 2025).