LatentUnfold: Unified Blind Image Restoration
- LatentUnfold is a unified framework for blind image restoration that integrates interpretable optimization, nonparametric degradation modeling, and conditional latent diffusion priors.
- It employs a multi-stage architecture featuring a Multi-Granularity Degradation-Aware module, a Degradation-Resistant Latent Diffusion Model, and an Over-Smoothing Correction Transformer.
- Experimental results across benchmarks like SIDD, GoPro, and LOL-v2 demonstrate state-of-the-art performance in PSNR and SSIM, highlighting its robustness and efficacy.
LatentUnfold (also termed UnfoldLDM) is a unified deep unfolding network for blind image restoration (BIR), integrating interpretable optimization principles, nonparametric degradation modeling, and conditional latent diffusion priors within a multi-stage architecture. Developed to address two limitations inherent in classical Deep Unfolding Networks (DUNs), namely degradation-specific dependency and over-smoothing bias, LatentUnfold introduces a three-part pipeline: Multi-Granularity Degradation-Aware modeling, Degradation-Resistant Latent Diffusion Priors, and Over-Smoothing Correction Transformers. This plug-and-play structure achieves state-of-the-art results across a wide spectrum of BIR tasks by jointly estimating the unknown degradation and restoring structured, high-frequency image details (He et al., 22 Nov 2025).
1. Blind Restoration Optimization Formulation
The blind restoration model observes a degraded image $\mathbf{y}$ generated via an unknown linear process:

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n},$$

where $\mathbf{x}$ is the latent clean image, $\mathbf{H}$ is an unknown degradation matrix, and $\mathbf{n}$ is additive noise. To capture structure and reduce complexity, $\mathbf{H}$ is factorized as a Kronecker product:

$$\mathbf{H} = \mathbf{H}_1 \otimes \mathbf{H}_2,$$

with $\mathbf{H}_1$ and $\mathbf{H}_2$ lower-dimensional factors. The energy minimization for restoration is

$$\min_{\mathbf{x}} \; \tfrac{1}{2}\|\mathbf{y} - \mathbf{H}\mathbf{x}\|_2^2 + \lambda\, \Phi(\mathbf{x}),$$

where $\Phi(\cdot)$ is a learned image prior and $\lambda$ weights the regularization. The problem is solved via $K$-stage proximal-gradient unfolding, applying block-coordinate descent over the fidelity terms in $\mathbf{H}$ and $\mathbf{x}$, followed by a learned proximal operator. Specifically, at each stage $k$:

$$\mathbf{x}^{(k+\frac{1}{2})} = \mathbf{x}^{(k)} - \eta\, \mathbf{H}^{\top}\big(\mathbf{H}\mathbf{x}^{(k)} - \mathbf{y}\big), \qquad \mathbf{x}^{(k+1)} = \operatorname{prox}_{\lambda \Phi}\big(\mathbf{x}^{(k+\frac{1}{2})}\big).$$
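As a concrete illustration, the multi-stage proximal-gradient recursion can be sketched in a few lines. The soft-thresholding prox below is a stand-in for the paper's learned proximal operator (an assumption for illustration only, corresponding to an $\ell_1$ prior), and the function names are hypothetical:

```python
import numpy as np

def soft_threshold(v, tau):
    # Stand-in for the learned proximal operator: soft-thresholding,
    # the exact prox of the l1 prior Phi(x) = ||x||_1.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def unfolded_restoration(y, H, K=3, eta=0.1, lam=0.05):
    """K-stage proximal-gradient unfolding for y = Hx + n.

    Each stage takes a gradient step on the data-fidelity term
    0.5 * ||y - Hx||^2, then applies a proximal operator for the prior.
    """
    x = H.T @ y  # simple back-projection initialisation
    for _ in range(K):
        x = x - eta * (H.T @ (H @ x - y))  # fidelity gradient step
        x = soft_threshold(x, eta * lam)   # learned prox stand-in
    return x
```

In the full model the analytic `H` and `H.T` are unknown and are replaced by the learned surrogates described in the next section.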
2. Multi-Granularity Degradation-Aware (MGDA) Module
MGDA replaces analytic gradients with data-driven surrogates, enabling end-to-end handling of unknown degradations:
- Holistic Degradation: Two Siamese Visual State Space (VSS) networks ($\mathcal{F}_{H}$, $\mathcal{F}_{H^\top}$) estimate the action of $\mathbf{H}$ and its transpose, producing the holistic fidelity update

$$\mathbf{r}^{(k)} = \mathbf{x}^{(k)} - \eta\, \mathcal{F}_{H^\top}\big(\mathcal{F}_{H}(\mathbf{x}^{(k)}) - \mathbf{y}\big).$$

- Structured Decomposition: Neural blocks $\mathcal{G}_1$ and $\mathcal{G}_2$ alternately estimate the Kronecker components $\mathbf{H}_1$ and $\mathbf{H}_2$, constructed via normalized outputs from concatenated feature maps. The structured fidelity update is

$$\hat{\mathbf{r}}^{(k)} = \mathbf{x}^{(k)} - \eta\, (\mathbf{H}_1 \otimes \mathbf{H}_2)^{\top}\big((\mathbf{H}_1 \otimes \mathbf{H}_2)\,\mathbf{x}^{(k)} - \mathbf{y}\big).$$

An intra-stage consistency loss

$$\mathcal{L}_{\text{cons}} = \big\|\mathbf{r}^{(k)} - \hat{\mathbf{r}}^{(k)}\big\|_1$$

promotes alignment between the holistic and structured branches.
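A minimal numerical sketch of the two MGDA fidelity branches, with a fixed linear map standing in for the Siamese VSS networks (the names `f_H`, `f_Ht`, and `kron_update` are illustrative, not from the paper). The structured branch exploits the Kronecker identity $(\mathbf{H}_1 \otimes \mathbf{H}_2)\,\mathrm{vec}(\mathbf{X}) = \mathrm{vec}(\mathbf{H}_2 \mathbf{X} \mathbf{H}_1^{\top})$, so the full matrix never needs to be formed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear stand-ins for the two Siamese VSS networks:
# f_H approximates the unknown degradation H, f_Ht its transpose.
W = rng.normal(scale=0.3, size=(16, 16))
f_H = lambda v: W @ v
f_Ht = lambda v: W.T @ v

def holistic_update(x, y, eta=0.1):
    # Holistic MGDA fidelity step: the analytic H and H^T are
    # replaced by the learned surrogates f_H, f_Ht.
    return x - eta * f_Ht(f_H(x) - y)

def kron_update(x2d, H1, H2, y2d, eta=0.1):
    # Structured MGDA fidelity step via the Kronecker identity
    # (H1 kron H2) vec(X) = vec(H2 X H1^T), with column-major vec.
    r = H2 @ x2d @ H1.T - y2d
    return x2d - eta * (H2.T @ r @ H1)
```

The test below checks `kron_update` against the explicit `np.kron` formulation, confirming the identity-based step is exact.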
3. Degradation-Resistant Latent Diffusion Model (DR-LDM)
The proximal operator in LatentUnfold is realized by a conditional latent diffusion model designed for degradation invariance:
- Latent Prior Extraction: In Phase I, a Prior Inference (PI) network maps its input to a compact latent prior $\mathbf{z}_0$.
- Diffusion Forward: For $t = 1, \dots, T$ steps,

$$\mathbf{z}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{z}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}),$$

with $\alpha_t = 1 - \beta_t$, $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$.
- Diffusion Reverse: A denoising network $\boldsymbol{\epsilon}_\theta$ predicts the noise given the noisy latent prior $\mathbf{z}_t$ and a conditioning vector $\mathbf{c}$. The recursion is:

$$\mathbf{z}_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(\mathbf{z}_t - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\,\boldsymbol{\epsilon}_\theta(\mathbf{z}_t, t, \mathbf{c})\right).$$

After $T$ reverse steps, the sampled prior is passed to the detail recovery module.
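The forward and reverse recursions follow the standard DDPM parameterization and can be sketched as follows; a deterministic reverse step is assumed (common for restoration priors), and `eps_pred` stands in for the output of the denoising network:

```python
import numpy as np

T = 3
betas = np.linspace(1e-4, 0.02, T)   # illustrative noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def forward_diffuse(z0, t, eps):
    # Closed-form forward step:
    # z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * eps
    return np.sqrt(alpha_bars[t]) * z0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def reverse_step(zt, t, eps_pred):
    # One reverse update given the network's noise prediction
    # (deterministic variant, no added sampling noise).
    coef = (1.0 - alphas[t]) / np.sqrt(1.0 - alpha_bars[t])
    return (zt - coef * eps_pred) / np.sqrt(alphas[t])
```

At the first timestep the reverse step is the exact algebraic inverse of the forward step when the noise prediction is perfect, which gives a quick sanity check on the coefficients.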
4. Over-Smoothing Correction Transformer (OCFormer)
OCFormer is a U-shaped network that fuses intermediate results and the diffusion posterior to restore high-frequency textures:
- Degradation-Resistant Attention (DRA): Intermediate features are enriched by self-attention whose queries, keys, and values are produced through mixed pointwise and depthwise convolutions:

$$\mathrm{Attn}(\mathbf{Q}, \mathbf{K}, \mathbf{V}) = \mathrm{softmax}\!\left(\frac{\mathbf{Q}\mathbf{K}^{\top}}{\sqrt{d}}\right)\mathbf{V}.$$

- Prior-Guided Detail Recovery (PDR): The sampled prior $\hat{\mathbf{z}}$ modulates normalized features via learned scale and shift:

$$\mathbf{F}' = \gamma(\hat{\mathbf{z}}) \odot \mathrm{LN}(\mathbf{F}) + \beta(\hat{\mathbf{z}}).$$
The final output is generated by the U-net decoder.
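The PDR modulation is a FiLM-style scale-and-shift conditioning of normalized features; a sketch under that assumption, where the projections `W_gamma` and `W_beta` are hypothetical learned maps from the prior to per-channel parameters:

```python
import numpy as np

def layer_norm(f, eps=1e-6):
    # Normalise each row of the feature map over its channel axis.
    mu = f.mean(axis=-1, keepdims=True)
    sd = f.std(axis=-1, keepdims=True)
    return (f - mu) / (sd + eps)

def prior_guided_modulation(feats, z, W_gamma, W_beta):
    # PDR sketch: the latent prior z is projected to per-channel
    # scale (gamma) and shift (beta) that modulate the normalised
    # features, F' = gamma * LN(F) + beta.
    gamma = z @ W_gamma
    beta = z @ W_beta
    return gamma * layer_norm(feats) + beta
```

With `gamma` driven to zero the output collapses to the prior-derived shift alone, which makes the conditioning pathway easy to verify in isolation.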
5. Unified Iterative Restoration Algorithm
The end-to-end unfolding procedure, as summarized in the provided pseudocode, executes $K$ stages. Each stage alternates between MGDA steps that estimate both the holistic and structured degradations, then applies DR-LDM to sample a latent prior, and finally invokes OCFormer for refined reconstruction. The approach is designed as plug-and-play and can be integrated as a wrapper around existing DUN-based methods.
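The stage loop can be outlined schematically; the three callables below are placeholders for the MGDA, DR-LDM, and OCFormer modules (their names and signatures are illustrative, not the paper's interfaces):

```python
import numpy as np

def latent_unfold(y, mgda_step, sample_prior, ocformer, K=3):
    """Schematic K-stage restoration loop.

    mgda_step(x, y)    -> degradation-corrected intermediate estimate
    sample_prior(x, y) -> latent prior from the DR-LDM reverse process
    ocformer(x, z)     -> refined, detail-restored image
    """
    x = y.copy()  # initialise from the degraded observation
    for _ in range(K):
        x = mgda_step(x, y)     # holistic + structured fidelity updates
        z = sample_prior(x, y)  # degradation-resistant latent prior
        x = ocformer(x, z)      # over-smoothing correction
    return x
```

Because each stage consumes only `(x, y)` and produces a new `x`, the loop can wrap any DUN-style fidelity module, which is the sense in which the framework is plug-and-play.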
6. Training Procedures and Loss Functions
The training is phased:
- Phase I: Pretrain the PI network and OCFormer with

$$\mathcal{L}_{\text{I}} = \|\hat{\mathbf{x}} - \mathbf{x}_{\text{gt}}\|_1 + \lambda_{\text{cons}}\,\mathcal{L}_{\text{cons}},$$

where $\hat{\mathbf{x}}$ is the restored output and $\mathbf{x}_{\text{gt}}$ the ground-truth clean image.
- Phase II: Train DR-LDM and fine-tune the entire framework with

$$\mathcal{L}_{\text{II}} = \|\mathbf{z}_0 - \hat{\mathbf{z}}_0\|_1 + \lambda\,\|\hat{\mathbf{x}} - \mathbf{x}_{\text{gt}}\|_1,$$

with the loss weights set empirically in practice.
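A sketch of the Phase I objective under the $\ell_1$ formulation given above; `lam_cons` is a hypothetical weight, and the paper's exact values are not reproduced here:

```python
import numpy as np

def l1(a, b):
    # Mean absolute error between two arrays.
    return np.abs(a - b).mean()

def phase1_loss(x_hat, x_gt, r_hol, r_str, lam_cons=0.1):
    # Phase-I sketch: L1 reconstruction term plus the intra-stage
    # consistency term between the holistic and structured MGDA
    # branch outputs. lam_cons is an illustrative weight.
    return l1(x_hat, x_gt) + lam_cons * l1(r_hol, r_str)
```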
7. Experimental Results and Interpretation
Implemented in PyTorch on an NVIDIA H200 GPU (K = 3 stages, T = 3 diffusion steps), LatentUnfold achieves the following on standard BIR benchmarks:
- Blind denoising: SIDD (PSNR 40.02 dB, SSIM 0.961), DND (40.06 dB, 0.958)
- Blind deblurring: GoPro (34.32 dB, 0.970), HIDE (31.85 dB, 0.948)
- Underwater: UIEB (24.70 dB, 0.947)
- Backlit: BAID (24.97 dB, 0.910)
- Low-light: LOL-v2 real (23.58 dB, 0.886); synthetic (27.92 dB, 0.957)
- Deraining: Five benchmarks, average PSNR ~39.5 dB, SSIM ~0.98
Across all cases, UnfoldLDM establishes new state-of-the-art results for blind restoration (He et al., 22 Nov 2025).
8. Significance of Latent Diffusion Priors in Blind Restoration
Standard DUNs exhibit a low-frequency bias due to the dominance of smooth components in gradient-driven updates, especially under severe or unknown degradations, leading to oversmoothing. The latent diffusion prior in DR-LDM is explicitly trained for degradation invariance, promoting generative recovery of natural high-frequency textures. Conditioning the diffusion prior on MGDA’s estimates prevents reintroduction of degraded patterns, while the bidirectional interplay—cleaner inputs aiding prior learning, and stronger priors enhancing restoration—enables sharp, artifact-free outputs for a wide variety of degradations.
In summary, LatentUnfold (UnfoldLDM) represents an advance in blind image restoration by combining interpretable model-based unfolding, neural degradation modeling, and strong generative priors, implemented in a modular, extensible framework (He et al., 22 Nov 2025).