FideDiff: Efficient Motion Deblurring Model

Updated 4 October 2025
  • FideDiff is a diffusion framework for efficient motion deblurring, using latent diffusion, kernel control, and adaptive timestep selection for single-step restoration.
  • It reformulates blur as a time-dependent diffusion process, ensuring consistent mapping from any blurred input to a clear image.
  • Benchmark results show superior performance on PSNR, SSIM, LPIPS, and DISTS metrics across datasets like GoPro, HIDE, and RealBlur, supporting diverse real-world applications.

FideDiff is a diffusion model framework designed for efficient, high-fidelity image motion deblurring. It combines a single-step diffusion paradigm, temporal consistency objectives, explicit blur kernel control, and adaptive timestep selection to achieve superior performance on standard and perceptual metrics. The model is constructed upon large-scale pretrained diffusion backbones with modifications tailored to the motion deblurring problem, and is benchmarked extensively on widely used datasets with public code and data resources.

1. Architectural Foundation and System Components

FideDiff employs a latent diffusion backbone built upon a pretrained Stable Diffusion 2.1 model and incorporates several custom modifications specific to motion deblurring (a minimal sketch of how these components compose at inference follows the list):

  • VAE Backbone: The input low-quality (blurry) image is encoded into the latent space using a Variational Autoencoder (VAE) with reduced downsampling (d = 4 instead of the standard d = 8) to minimize spatial detail loss, thus improving restoration fidelity.
  • Diffusion Backbone: The model repurposes the standard diffusion process to operate as a motion blur trajectory rather than traditional Gaussian noise, with each diffusion timestep representing a discrete blur severity level.
  • Consistency Model: Under the consistency paradigm, the diffusion network is trained such that for any blurry latent input $z_t$ (sampled at timestep $t$) along the blur trajectory, the network predicts the same clean image latent $z_0$. This enables accurate single-step ($t=1$) deblurring at inference.
  • Kernel ControlNet: FideDiff integrates a Kernel ControlNet module within the UNet backbone, utilizing a kernel estimation subnetwork to condition the restoration on an explicit, spatially-varying blur kernel representation.
  • Adaptive Timestep Prediction: A small regression network predicts the “optimal” diffusion timestep t̂ for inference, conditioned on the estimated blur kernel. This allows the model to dynamically adapt to varying degrees and patterns of input motion blur.
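
The sketch below shows, under assumed module interfaces, how these components compose for single-step inference: the VAE encodes the blurry image, the kernel network estimates a blur map, a small regressor predicts the timestep, and one UNet evaluation produces the clean latent. Every module here is an illustrative stand-in (a single convolution), not the authors' released networks.

```python
# Illustrative single-step inference flow; every module is a toy stand-in,
# not the pretrained VAE / UNet / Kernel ControlNet from the released repository.
import torch
import torch.nn as nn

class StandIn(nn.Module):
    """Placeholder for a pretrained component; concatenates its inputs and applies one conv."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Conv2d(in_ch, out_ch, 3, padding=1)
    def forward(self, *xs):
        return self.net(torch.cat(xs, dim=1))

vae_encode  = StandIn(3, 4)      # stand-in VAE encoder (the paper's VAE uses d = 4 downsampling; this one keeps resolution)
kernel_net  = StandIn(3, 1)      # stand-in for M: predicts a spatially-varying blur-kernel map
unet        = StandIn(4 + 1, 4)  # stand-in consistency UNet conditioned on the kernel map
vae_decode  = StandIn(4, 3)      # stand-in VAE decoder
t_regressor = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(1, 1))  # stand-in for T

def deblur_single_step(blurry):
    z_t   = vae_encode(blurry)   # encode the blurry input into latent space
    k_map = kernel_net(blurry)   # explicit blur-kernel estimate
    t_hat = t_regressor(k_map)   # adaptive timestep (the toy UNet below ignores it)
    z_0   = unet(z_t, k_map)     # one consistency-model evaluation predicts the clean latent
    return vae_decode(z_0), t_hat

restored, t_hat = deblur_single_step(torch.rand(1, 3, 64, 64))
```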

2. Reformulation of Diffusion Process for Motion Deblurring

Motion deblurring is reframed as a diffusion-like Markov process, where, instead of simple Gaussian noise progression, “blurring” (via a kernel) is treated as the corruption operator:

$$z_t = k_t * z_0 = \sqrt{\bar{\alpha}_t}\,z_0 + \sqrt{1-\bar{\alpha}_t}\,\hat{\epsilon}$$

Here, $k_t$ is the time-dependent blur kernel, $\bar{\alpha}_t$ encodes the diffusion progression, and $\hat{\epsilon}$ is the residual noise/error. Each $t$ corresponds to a specific blur severity on a trajectory generated by repeated averaging of video frames (as in GoPro), explicitly connecting physical blur to the diffusion formalism.
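
A blur trajectory of this kind can be simulated by averaging progressively more consecutive sharp video frames. The sketch below illustrates the idea; the specific mapping from timestep $t$ to averaging-window size is an assumption for illustration, not the paper's exact schedule.

```python
# Build a synthetic blur trajectory by averaging consecutive sharp frames
# (GoPro-style). Larger t -> wider averaging window -> heavier motion blur.
import numpy as np

def blur_trajectory(frames, num_timesteps):
    """frames: (N, H, W, C) consecutive sharp frames of one clip.
    Returns a list whose t-th entry is the mean of the first n(t) frames."""
    n_frames = len(frames)
    trajectory = []
    for t in range(1, num_timesteps + 1):
        n_t = max(1, round(t / num_timesteps * n_frames))  # assumed linear window schedule
        trajectory.append(frames[:n_t].mean(axis=0))
    return trajectory

frames = np.random.rand(16, 128, 128, 3).astype(np.float32)  # stand-in for a sharp video clip
traj = blur_trajectory(frames, num_timesteps=8)               # traj[0] ~ sharp, traj[-1] ~ heavily blurred
```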

3. Consistency Training Paradigm

Rather than the traditional noisy-to-clean multi-step denoising, FideDiff is trained via a trajectory-wide consistency objective. The loss enforces:

$$f_\theta(z_t, t) = f_\theta(z_{t'}, t') = z_0, \quad \forall\, t, t'$$

with optimization:

$$\min_\theta\,\mathbb{E}_{t, z_0}\left\| f_\theta(z_t, t) - z_0 \right\|^2$$

The model is trained using augmented blur trajectories, each sample simulating a distinct temporal blur accumulation (e.g., averaging n video frames), enabling robust mapping from any blurred input back to the same target.
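
In training-loop form, the objective reduces to sampling a blur severity along the trajectory and regressing the network output toward the clean latent. The sketch below assumes a generic timestep-conditioned denoiser and a precomputed trajectory of latents; the real model, schedule, and data pipeline come from the released code.

```python
# Minimal sketch of the trajectory-wide consistency objective:
# for any sampled severity t, regress f_theta(z_t, t) toward the same z_0.
import torch
import torch.nn.functional as F

def consistency_loss(model, z_0, trajectory_latents):
    """trajectory_latents: list of latents z_1..z_T with increasing blur severity."""
    t = torch.randint(1, len(trajectory_latents) + 1, (1,)).item()  # sample a severity
    z_t = trajectory_latents[t - 1]
    pred = model(z_t, torch.tensor([float(t)]))                     # f_theta(z_t, t)
    return F.mse_loss(pred, z_0)                                    # || f_theta(z_t, t) - z_0 ||^2

class ToyDenoiser(torch.nn.Module):
    """Stand-in network; a real UNet would use proper timestep embeddings."""
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(4, 4, 3, padding=1)
    def forward(self, z, t):
        return self.conv(z) + 0.01 * t.view(-1, 1, 1, 1)  # crude additive timestep conditioning

model = ToyDenoiser()
z_0 = torch.rand(1, 4, 32, 32)
traj = [z_0 + 0.05 * s * torch.randn_like(z_0) for s in range(1, 9)]  # stand-in blur trajectory
consistency_loss(model, z_0, traj).backward()
```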

4. Kernel ControlNet and Conditional Modulation

The Kernel ControlNet module enhances deblurring by providing explicit blur estimation and controlled conditional guidance:

  1. Kernel Estimation: A convolutional UNet submodule ($M$) processes the input to predict a spatial blur kernel map $k_t$.
  2. Conditional Injection: The kernel map is injected into the main UNet backbone in a filter-like manner:
    • Intermediate feature: $z_\text{in2} = \text{Conv}(z_\text{in1})$
    • Concatenated with $k_\text{in}$: $W = \text{Conv}(\text{Cat}(k_\text{in}, z_\text{in2}))$
    • Modulated: $O = W \ast z_\text{in2}$
    • Integrated: $z_\text{out} = z_\text{in1} + Z(O)$, where $Z$ is a convolution initialized to zero

This approach yields spatially adaptive, detail-preserving restoration that leverages explicit physical knowledge of the blur; a minimal code sketch of the modulation steps follows.
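
The sketch below mirrors the four modulation steps above. Channel widths and the interpretation of the modulation product (taken here as an elementwise product) are assumptions for illustration.

```python
# Kernel-conditioned feature modulation, following the listed steps.
import torch
import torch.nn as nn

class KernelModulation(nn.Module):
    def __init__(self, feat_ch, kernel_ch):
        super().__init__()
        self.conv_in = nn.Conv2d(feat_ch, feat_ch, 3, padding=1)              # z_in2 = Conv(z_in1)
        self.conv_w  = nn.Conv2d(feat_ch + kernel_ch, feat_ch, 3, padding=1)  # W = Conv(Cat(k_in, z_in2))
        self.zero_conv = nn.Conv2d(feat_ch, feat_ch, 1)                       # Z, zero-initialized
        nn.init.zeros_(self.zero_conv.weight)
        nn.init.zeros_(self.zero_conv.bias)

    def forward(self, z_in1, k_in):
        z_in2 = self.conv_in(z_in1)
        w = self.conv_w(torch.cat([k_in, z_in2], dim=1))
        o = w * z_in2                     # O = W * z_in2 (elementwise modulation in this sketch)
        return z_in1 + self.zero_conv(o)  # z_out = z_in1 + Z(O); identity mapping at initialization

block = KernelModulation(feat_ch=64, kernel_ch=1)
z_out = block(torch.rand(1, 64, 32, 32), torch.rand(1, 1, 32, 32))
```

Zero-initializing $Z$ follows standard ControlNet practice: the injected branch contributes nothing at the start of fine-tuning, so the pretrained backbone's behavior is preserved and the kernel guidance is learned gradually.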

5. Adaptive Timestep Regression

To ensure optimal deblurring across a range of blur severities, FideDiff predicts an adaptive timestep:

$$\hat{t} = T\big(M(I_{LQ})\big)$$

where $T$ is a regression network and $M$ provides the kernel representation. This enables single-step inference regardless of blur level, outperforming prior fixed-timestep or multi-step diffusion paradigms on both strong (high-$t$) and light (low-$t$) blurs.
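
A minimal sketch of this prediction head is shown below; the layer sizes and the rescaling of the regressor output onto the timestep range are illustrative assumptions, not the authors' exact architecture.

```python
# Adaptive timestep regression: t_hat = T(M(I_LQ)).
import torch
import torch.nn as nn

kernel_estimator = nn.Sequential(   # stand-in for M: image -> blur-kernel representation
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
timestep_head = nn.Sequential(      # stand-in for T: kernel map -> normalized timestep in [0, 1]
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(1, 16), nn.ReLU(),
    nn.Linear(16, 1), nn.Sigmoid(),
)

def predict_timestep(blurry, num_timesteps=1000):
    k_map = kernel_estimator(blurry)
    return timestep_head(k_map) * num_timesteps   # rescale onto the diffusion schedule

t_hat = predict_timestep(torch.rand(1, 3, 64, 64))  # heavier blur should map to a larger t_hat
```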

6. Evaluation Protocols and Performance Metrics

Performance is comprehensively assessed using both traditional and learned perceptual metrics:

  • PSNR/SSIM: Assess fidelity and structural similarity.
  • LPIPS/DISTS: Gauge perceptual quality based on learned deep features and structure/texture similarity.
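
For reference, these metrics can be computed with common open-source packages; the snippet below uses scikit-image for PSNR/SSIM and the `lpips` package for LPIPS (DISTS is computed analogously with its reference implementation). This is generic evaluation code, not the paper's exact protocol.

```python
# Generic fidelity/perceptual evaluation of one restored/ground-truth pair.
import numpy as np
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(restored, target):
    """restored, target: float images in [0, 1], shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(target, restored, data_range=1.0)
    ssim = structural_similarity(target, restored, channel_axis=-1, data_range=1.0)
    # LPIPS expects NCHW tensors scaled to [-1, 1]
    to_t = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None].float() * 2 - 1
    lpips_model = lpips.LPIPS(net='alex')
    lpips_val = lpips_model(to_t(restored), to_t(target)).item()
    return {"PSNR": psnr, "SSIM": ssim, "LPIPS": lpips_val}

scores = evaluate_pair(np.random.rand(64, 64, 3).astype(np.float32),
                       np.random.rand(64, 64, 3).astype(np.float32))
```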

Empirical results demonstrate that FideDiff achieves superior PSNR and SSIM compared with previous diffusion-based methods, and matches or exceeds transformer-based approaches on LPIPS and DISTS. The model’s effectiveness is benchmarked on augmented GoPro (with enriched blur trajectories), HIDE, and RealBlur datasets.

| Metric | FideDiff vs. SOTA Diffusion | FideDiff vs. Transformer SOTA |
|--------|-----------------------------|-------------------------------|
| PSNR   | Higher                      | Similar/Competitive           |
| SSIM   | Higher                      | Similar/Competitive           |
| LPIPS  | Lower (better)              | Lower or equal                |
| DISTS  | Lower (better)              | Lower or equal                |

7. Real-World Applications and Resource Release

FideDiff is applicable in consumer and industrial contexts where real-time or accurate deblurring is critical:

  • Consumer photography and mobile imaging
  • Automotive or robotics camera feeds
  • Security and surveillance footage restoration

The method is notably more efficient than multi-step diffusion models, and, with a single-pass design, is amenable to deployment on resource-constrained devices. Training data, including a specifically augmented GoPro set with diverse blur levels, and all source code are released at https://github.com/xyLiu339/FideDiff, supporting reproducibility and further research.

8. Considerations, Limitations, and Future Directions

While FideDiff achieves single-step restoration and strong empirical results, certain challenges persist:

  • Computational Overhead: Despite single-step operation, the inclusion of Kernel ControlNet and t-regression adds to the inference load relative to plain CNN restoration.
  • Extremely Heterogeneous Blur: Restoration quality can degrade in cases with dramatic, highly non-uniform spatial blur where estimation is challenging.
  • Training Data Construction: Effectiveness depends on carefully constructed blur trajectory datasets that match real-world variation.

Future directions include broader generalization to spatially non-stationary blur, further architectural optimizations, and extension to other image restoration domains requiring temporal or physical consistency.


FideDiff establishes a new baseline and direction for diffusion-based image restoration, combining trajectory consistency, kernel-guided conditional restoration, and adaptive inference for robust, high-fidelity motion deblurring (Liu et al., 2 Oct 2025).
