FideDiff: Efficient Motion Deblurring Model

Updated 4 October 2025
  • FideDiff is a diffusion framework for efficient motion deblurring, using latent diffusion, kernel control, and adaptive timestep selection for single-step restoration.
  • It reformulates blur as a time-dependent diffusion process, ensuring consistent mapping from any blurred input to a clear image.
  • Benchmark results show superior performance on PSNR, SSIM, LPIPS, and DISTS metrics across datasets like GoPro, HIDE, and RealBlur, supporting diverse real-world applications.

FideDiff is a diffusion model framework designed for efficient, high-fidelity image motion deblurring. It combines a single-step diffusion paradigm, temporal consistency objectives, explicit blur kernel control, and adaptive timestep selection to achieve superior performance on standard and perceptual metrics. The model is constructed upon large-scale pretrained diffusion backbones with modifications tailored to the motion deblurring problem, and is benchmarked extensively on widely used datasets with public code and data resources.

1. Architectural Foundation and System Components

FideDiff employs a latent diffusion backbone built upon a pretrained Stable Diffusion 2.1 model and incorporates several custom modifications specific to motion deblurring (a minimal sketch of how these components compose at inference follows the list):

  • VAE Backbone: The input low-quality (blurry) image is encoded into the latent space using a Variational Autoencoder (VAE) with reduced downsampling (d = 4 instead of the standard d = 8) to minimize spatial detail loss, thus improving restoration fidelity.
  • Diffusion Backbone: The model repurposes the standard diffusion process to operate as a motion blur trajectory rather than traditional Gaussian noise, with each diffusion timestep representing a discrete blur severity level.
  • Consistency Model: Under the consistency paradigm, the diffusion network is trained such that for any blurry latent input $z_t$ (sampled at timestep $t$) along the blur trajectory, the network predicts the same clean image latent $z_0$. This enables accurate single-step ($t=1$) deblurring at inference.
  • Kernel ControlNet: FideDiff integrates a Kernel ControlNet module within the UNet backbone, utilizing a kernel estimation subnetwork to condition the restoration on an explicit, spatially-varying blur kernel representation.
  • Adaptive Timestep Prediction: A small regression network predicts the “optimal” diffusion timestep t̂ for inference, conditioned on the estimated blur kernel. This allows the model to dynamically adapt to varying degrees and patterns of input motion blur.
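
The sketch below shows, under assumed module interfaces, how these components compose for single-step inference: the VAE encodes the blurry image, the kernel network estimates a blur map, a small regressor predicts the timestep, and one UNet evaluation produces the clean latent. Every module here is an illustrative stand-in (a single convolution), not the authors' released networks.

```python
# Illustrative single-step inference flow; every module is a toy stand-in,
# not the pretrained VAE / UNet / Kernel ControlNet from the released repository.
import torch
import torch.nn as nn

class StandIn(nn.Module):
    """Placeholder for a pretrained component; concatenates its inputs and applies one conv."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Conv2d(in_ch, out_ch, 3, padding=1)
    def forward(self, *xs):
        return self.net(torch.cat(xs, dim=1))

vae_encode  = StandIn(3, 4)      # stand-in VAE encoder (the paper's VAE uses d = 4 downsampling; this one keeps resolution)
kernel_net  = StandIn(3, 1)      # stand-in for M: predicts a spatially-varying blur-kernel map
unet        = StandIn(4 + 1, 4)  # stand-in consistency UNet conditioned on the kernel map
vae_decode  = StandIn(4, 3)      # stand-in VAE decoder
t_regressor = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(1, 1))  # stand-in for T

def deblur_single_step(blurry):
    z_t   = vae_encode(blurry)   # encode the blurry input into latent space
    k_map = kernel_net(blurry)   # explicit blur-kernel estimate
    t_hat = t_regressor(k_map)   # adaptive timestep (the toy UNet below ignores it)
    z_0   = unet(z_t, k_map)     # one consistency-model evaluation predicts the clean latent
    return vae_decode(z_0), t_hat

restored, t_hat = deblur_single_step(torch.rand(1, 3, 64, 64))
```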

2. Reformulation of Diffusion Process for Motion Deblurring

Motion deblurring is reframed as a diffusion-like Markov process, where, instead of simple Gaussian noise progression, “blurring” (via a kernel) is treated as the corruption operator:

$$z_t = k_t * z_0 = \sqrt{\bar{\alpha}_t}\,z_0 + \sqrt{1-\bar{\alpha}_t}\,\hat{\epsilon}$$

Here, $k_t$ is the time-dependent blur kernel, $\bar{\alpha}_t$ encodes the diffusion progression, and $\hat{\epsilon}$ is the residual noise/error. Each $t$ corresponds to a specific blur severity on a trajectory generated by repeated averaging of video frames (as in GoPro), explicitly connecting physical blur to the diffusion formalism.
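
A blur trajectory of this kind can be simulated by averaging progressively more consecutive sharp video frames. The sketch below illustrates the idea; the specific mapping from timestep $t$ to averaging-window size is an assumption for illustration, not the paper's exact schedule.

```python
# Build a synthetic blur trajectory by averaging consecutive sharp frames
# (GoPro-style). Larger t -> wider averaging window -> heavier motion blur.
import numpy as np

def blur_trajectory(frames, num_timesteps):
    """frames: (N, H, W, C) consecutive sharp frames of one clip.
    Returns a list whose t-th entry is the mean of the first n(t) frames."""
    n_frames = len(frames)
    trajectory = []
    for t in range(1, num_timesteps + 1):
        n_t = max(1, round(t / num_timesteps * n_frames))  # assumed linear window schedule
        trajectory.append(frames[:n_t].mean(axis=0))
    return trajectory

frames = np.random.rand(16, 128, 128, 3).astype(np.float32)  # stand-in for a sharp video clip
traj = blur_trajectory(frames, num_timesteps=8)               # traj[0] ~ sharp, traj[-1] ~ heavily blurred
```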

3. Consistency Training Paradigm

Rather than the traditional noisy-to-clean multi-step denoising, FideDiff is trained via a trajectory-wide consistency objective. The loss enforces:

$$f_\theta(z_t, t) = f_\theta(z_{t'}, t') = z_0, \quad \forall\, t, t'$$

with optimization:

$$\min_\theta\,\mathbb{E}_{t, z_0}\left\| f_\theta(z_t, t) - z_0 \right\|^2$$

The model is trained using augmented blur trajectories, each sample simulating a distinct temporal blur accumulation (e.g., averaging n video frames), enabling robust mapping from any blurred input back to the same target.
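
In training-loop form, the objective reduces to sampling a blur severity along the trajectory and regressing the network output toward the clean latent. The sketch below assumes a generic timestep-conditioned denoiser and a precomputed trajectory of latents; the real model, schedule, and data pipeline come from the released code.

```python
# Minimal sketch of the trajectory-wide consistency objective:
# for any sampled severity t, regress f_theta(z_t, t) toward the same z_0.
import torch
import torch.nn.functional as F

def consistency_loss(model, z_0, trajectory_latents):
    """trajectory_latents: list of latents z_1..z_T with increasing blur severity."""
    t = torch.randint(1, len(trajectory_latents) + 1, (1,)).item()  # sample a severity
    z_t = trajectory_latents[t - 1]
    pred = model(z_t, torch.tensor([float(t)]))                     # f_theta(z_t, t)
    return F.mse_loss(pred, z_0)                                    # || f_theta(z_t, t) - z_0 ||^2

class ToyDenoiser(torch.nn.Module):
    """Stand-in network; a real UNet would use proper timestep embeddings."""
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(4, 4, 3, padding=1)
    def forward(self, z, t):
        return self.conv(z) + 0.01 * t.view(-1, 1, 1, 1)  # crude additive timestep conditioning

model = ToyDenoiser()
z_0 = torch.rand(1, 4, 32, 32)
traj = [z_0 + 0.05 * s * torch.randn_like(z_0) for s in range(1, 9)]  # stand-in blur trajectory
consistency_loss(model, z_0, traj).backward()
```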

4. Kernel ControlNet and Conditional Modulation

The Kernel ControlNet module enhances deblurring by providing explicit blur estimation and controlled conditional guidance:

  1. Kernel Estimation: A convolutional UNet submodule ($M$) processes the input to predict a spatial blur kernel map $k_t$.
  2. Conditional Injection: The kernel map is injected into the main UNet backbone in a filter-like manner:
    • Intermediate feature: $z_\text{in2} = \text{Conv}(z_\text{in1})$
    • Concatenated with $k_\text{in}$: $W = \text{Conv}(\text{Cat}(k_\text{in}, z_\text{in2}))$
    • Modulated: $O = W \ast z_\text{in2}$
    • Integrated: $z_\text{out} = z_\text{in1} + Z(O)$, where $Z$ is a convolution initialized to zero

This approach yields spatially adaptive, detail-preserving restoration that leverages explicit physical knowledge of the blur; a minimal code sketch of the modulation steps follows.
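
The sketch below mirrors the four modulation steps above. Channel widths and the interpretation of the modulation product (taken here as an elementwise product) are assumptions for illustration.

```python
# Kernel-conditioned feature modulation, following the listed steps.
import torch
import torch.nn as nn

class KernelModulation(nn.Module):
    def __init__(self, feat_ch, kernel_ch):
        super().__init__()
        self.conv_in = nn.Conv2d(feat_ch, feat_ch, 3, padding=1)              # z_in2 = Conv(z_in1)
        self.conv_w  = nn.Conv2d(feat_ch + kernel_ch, feat_ch, 3, padding=1)  # W = Conv(Cat(k_in, z_in2))
        self.zero_conv = nn.Conv2d(feat_ch, feat_ch, 1)                       # Z, zero-initialized
        nn.init.zeros_(self.zero_conv.weight)
        nn.init.zeros_(self.zero_conv.bias)

    def forward(self, z_in1, k_in):
        z_in2 = self.conv_in(z_in1)
        w = self.conv_w(torch.cat([k_in, z_in2], dim=1))
        o = w * z_in2                     # O = W * z_in2 (elementwise modulation in this sketch)
        return z_in1 + self.zero_conv(o)  # z_out = z_in1 + Z(O); identity mapping at initialization

block = KernelModulation(feat_ch=64, kernel_ch=1)
z_out = block(torch.rand(1, 64, 32, 32), torch.rand(1, 1, 32, 32))
```

Zero-initializing $Z$ follows standard ControlNet practice: the injected branch contributes nothing at the start of fine-tuning, so the pretrained backbone's behavior is preserved and the kernel guidance is learned gradually.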

5. Adaptive Timestep Regression

To ensure optimal deblurring across a range of blur severities, FideDiff predicts an adaptive timestep:

$$\hat{t} = T\big(M(I_{LQ})\big)$$

where $T$ is a regression network and $M$ provides the kernel representation. This enables single-step inference regardless of blur level, outperforming prior fixed-timestep or multi-step diffusion paradigms on both strong (high-$t$) and light (low-$t$) blurs.
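
A minimal sketch of this prediction head is shown below; the layer sizes and the rescaling of the regressor output onto the timestep range are illustrative assumptions, not the authors' exact architecture.

```python
# Adaptive timestep regression: t_hat = T(M(I_LQ)).
import torch
import torch.nn as nn

kernel_estimator = nn.Sequential(   # stand-in for M: image -> blur-kernel representation
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
timestep_head = nn.Sequential(      # stand-in for T: kernel map -> normalized timestep in [0, 1]
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(1, 16), nn.ReLU(),
    nn.Linear(16, 1), nn.Sigmoid(),
)

def predict_timestep(blurry, num_timesteps=1000):
    k_map = kernel_estimator(blurry)
    return timestep_head(k_map) * num_timesteps   # rescale onto the diffusion schedule

t_hat = predict_timestep(torch.rand(1, 3, 64, 64))  # heavier blur should map to a larger t_hat
```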

6. Evaluation Protocols and Performance Metrics

Performance is comprehensively assessed using both traditional and learned perceptual metrics:

  • PSNR/SSIM: Assess fidelity and structural similarity.
  • LPIPS/DISTS: Gauge perceptual quality based on learned deep features and structure/texture similarity.
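
For reference, these metrics can be computed with common open-source packages; the snippet below uses scikit-image for PSNR/SSIM and the `lpips` package for LPIPS (DISTS is computed analogously with its reference implementation). This is generic evaluation code, not the paper's exact protocol.

```python
# Generic fidelity/perceptual evaluation of one restored/ground-truth pair.
import numpy as np
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(restored, target):
    """restored, target: float images in [0, 1], shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(target, restored, data_range=1.0)
    ssim = structural_similarity(target, restored, channel_axis=-1, data_range=1.0)
    # LPIPS expects NCHW tensors scaled to [-1, 1]
    to_t = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None].float() * 2 - 1
    lpips_model = lpips.LPIPS(net='alex')
    lpips_val = lpips_model(to_t(restored), to_t(target)).item()
    return {"PSNR": psnr, "SSIM": ssim, "LPIPS": lpips_val}

scores = evaluate_pair(np.random.rand(64, 64, 3).astype(np.float32),
                       np.random.rand(64, 64, 3).astype(np.float32))
```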

Empirical results demonstrate that FideDiff achieves superior PSNR and SSIM compared with previous diffusion-based methods, and matches or exceeds transformer-based approaches on LPIPS and DISTS. The model’s effectiveness is benchmarked on augmented GoPro (with enriched blur trajectories), HIDE, and RealBlur datasets.

| Metric | FideDiff vs. SOTA Diffusion | FideDiff vs. Transformer SOTA |
|--------|-----------------------------|-------------------------------|
| PSNR   | Higher                      | Similar/Competitive           |
| SSIM   | Higher                      | Similar/Competitive           |
| LPIPS  | Lower (better)              | Lower or equal                |
| DISTS  | Lower (better)              | Lower or equal                |

7. Real-World Applications and Resource Release

FideDiff is applicable in consumer and industrial contexts where real-time or accurate deblurring is critical:

  • Consumer photography and mobile imaging
  • Automotive or robotics camera feeds
  • Security and surveillance footage restoration

The method is notably more efficient than multi-step diffusion models, and, with a single-pass design, is amenable to deployment on resource-constrained devices. Training data, including a specifically augmented GoPro set with diverse blur levels, and all source code are released at https://github.com/xyLiu339/FideDiff, supporting reproducibility and further research.

8. Considerations, Limitations, and Future Directions

While FideDiff achieves single-step restoration and strong empirical results, certain challenges persist:

  • Computational Overhead: Despite single-step operation, the inclusion of Kernel ControlNet and t-regression adds to the inference load relative to plain CNN restoration.
  • Extremely Heterogeneous Blur: Restoration quality can degrade in cases with dramatic, highly non-uniform spatial blur where estimation is challenging.
  • Training Data Construction: Effectiveness depends on carefully constructed blur trajectory datasets that match real-world variation.

Future directions include broader generalization to spatially non-stationary blur, further architectural optimizations, and extension to other image restoration domains requiring temporal or physical consistency.


FideDiff establishes a new baseline and direction for diffusion-based image restoration, combining trajectory consistency, kernel-guided conditional restoration, and adaptive inference for robust, high-fidelity motion deblurring (Liu et al., 2 Oct 2025).
