
FoundIR-v2: Diffusion-Based Image Restoration

Updated 11 December 2025
  • FoundIR-v2 is a unified diffusion-based image restoration model trained on a dynamically optimized mixture of task-specific data.
  • It leverages a Mixture-of-Experts scheduler with a Stable Diffusion XL backbone to adaptively generate priors for tasks like deblurring and super-resolution.
  • The model reports leading or near-leading scores on metrics such as PSNR and SSIM for most evaluated tasks, drawn from more than 50 image restoration sub-tasks.

FoundIR-v2 is a high-capacity diffusion-based image restoration foundation model that leverages dynamic pre-training data mixture optimization and a Mixture-of-Experts (MoE) scheduler to address over 50 sub-tasks such as deblurring, dehazing, denoising, super-resolution, deraining, desnowing, and low-light enhancement within a unified framework. It is predicated on the observation that the proportions of task-specific datasets in the pre-training mixture directly affect multi-task performance, motivating the development of a generalizable architecture that couples diffusion modeling with data equilibrium scheduling for large-scale restoration (Chen et al., 10 Dec 2025).

1. Architectural Design and Objectives

FoundIR-v2 is constructed to serve as an all-in-one restoration foundation model, supporting over 50 diverse sub-tasks. Its design jointly optimizes (i) the mixture ratios of task-specific data via "data equilibrium scheduling" (DES) to prevent task imbalance, and (ii) an MoE-driven diffusion scheduler that provides task-adaptive generative priors in latent space. The architecture integrates a Stable Diffusion XL (SDXL) backbone for latent-space denoising, a frozen VAE encoder-decoder for mapping between image and latent domains, and a learned MoE scheduler applied at each diffusion timestep.

Key components include:

  • VAE Encoder ($E_{\mathrm{VAE}}$): Transforms the low-quality input $I_{\mathrm{LQ}}$ into latent codes $f^{\mathrm{LQ}}$, with stochastic resolution alignment for super-resolution tasks.
  • SDXL Denoiser ($D_\theta$): Pre-trained backbone that produces clean latent codes $f^{\mathrm{HQ}}$.
  • MoE Scheduler: Applied at each diffusion step; soft-gates $n$ expert blocks to condition the denoiser on both $f^{\mathrm{LQ}}$ and the noisy latent $x_t^{\mathrm{HQ}}$.
  • Data Equilibrium Scheduler: Adjusts the task sampling weights $\{\lambda_i\}$ every $T$ steps to keep learning signals balanced across tasks.
  • VAE Decoder: Maps the final denoised latent code back to RGB space.

This configuration, together with multi-modal (including text) prompts, is intended to enhance the model's ability to generalize across heterogeneous and previously unseen image degradations.
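The component list above can be read as a simple data flow. The Python sketch below is a toy illustration under stated assumptions, not FoundIR-v2 code. Every component is a hypothetical callable standing in for a neural module, latents are plain lists of floats, and `denoise_step` abstracts one sampler update of the SDXL denoiser. Its only purpose is to show where the MoE scheduler sits: it runs once per timestep and sees both $f^{\mathrm{LQ}}$ and the current noisy latent.

```python
from typing import Callable, List

Latent = List[float]  # toy stand-in for a latent tensor


def restore(
    image: List[float],
    encode: Callable[[List[float]], Latent],                # hypothetical E_VAE
    moe_schedule: Callable[[Latent, Latent, int], Latent],  # hypothetical MoE scheduler
    denoise_step: Callable[[Latent, int, Latent], Latent],  # hypothetical denoiser update
    decode: Callable[[Latent], List[float]],                # hypothetical VAE decoder
    init_noise: Latent,
    num_steps: int,
) -> List[float]:
    """Toy data flow: encode -> (MoE-conditioned denoising) x num_steps -> decode."""
    f_lq = encode(image)                     # low-quality latent f^LQ
    x_t = list(init_noise)                   # noisy high-quality latent x_T^HQ
    for t in reversed(range(num_steps)):
        cond = moe_schedule(f_lq, x_t, t)    # scheduler sees f^LQ and x_t^HQ
        x_t = denoise_step(x_t, t, cond)     # one denoising update
    return decode(x_t)                       # map the final latent back to image space


# Usage with trivial stand-ins: the "denoiser" moves x_t halfway toward the conditioning.
restored = restore(
    image=[1.0, 1.0],
    encode=lambda img: [2.0 * v for v in img],
    moe_schedule=lambda f_lq, x_t, t: f_lq,
    denoise_step=lambda x_t, t, c: [x + 0.5 * (ci - x) for x, ci in zip(x_t, c)],
    decode=lambda z: [v / 2.0 for v in z],
    init_noise=[0.0, 0.0],
    num_steps=20,
)
```

In the real model each callable would be a network operating on tensors. The loop structure is what carries over: encoding and decoding happen once, while scheduling and denoising repeat at every step.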

2. Data Equilibrium Scheduling Paradigm

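Central to FoundIR-v2 is the DES paradigm, which seeks optimal proportions for a mixture of $N$ task-specific datasets $\mathcal{D}_1, \dots, \mathcal{D}_N$. The overall sampling distribution is $\mathcal{D}_{\mathrm{mix}} = \sum_{i=1}^{N} \lambda_i \mathcal{D}_i$, with $\lambda_i \ge 0$ and $\sum_{i=1}^{N} \lambda_i = 1$. The model $F_\theta$ is trained to minimize a reconstruction loss over this mixture: $\mathcal{L}(\theta) = \mathbb{E}_{(I_{\mathrm{LQ}}, I_{\mathrm{HQ}}) \sim \mathcal{D}_{\mathrm{mix}}} \big[ \| F_\theta(I_{\mathrm{LQ}}) - I_{\mathrm{HQ}} \| \big]$.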
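The sketch below shows one way a batch could be drawn from such a mixture. It assumes that each batch element picks its task independently with probability $\lambda_i$, the sampling weight of task $i$. Both function names are hypothetical. Because sampling is proportional to $\lambda_i$, the expected training loss equals the $\lambda$-weighted sum of the per-task expected losses, which is what `mixture_loss` computes.

```python
import random
from typing import List, Sequence


def sample_task_ids(weights: Sequence[float], batch_size: int, rng: random.Random) -> List[int]:
    """Draw one task index per batch element, task i with probability weights[i]."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("weights must be non-negative and sum to 1")
    return rng.choices(range(len(weights)), weights=weights, k=batch_size)


def mixture_loss(per_task_losses: Sequence[float], weights: Sequence[float]) -> float:
    """Expected loss under the mixture: sum_i lambda_i * L_i."""
    return sum(w * loss for w, loss in zip(weights, per_task_losses))


# Usage: a uniform three-task mixture and its expected loss.
rng = random.Random(0)
batch_tasks = sample_task_ids([1 / 3, 1 / 3, 1 / 3], batch_size=8, rng=rng)
expected = mixture_loss([0.2, 0.5, 0.8], [1 / 3, 1 / 3, 1 / 3])
```

Changing the weights shifts both how often each task is seen and how much its loss contributes to the total. DES acts on exactly this lever.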

Every $T$ training steps, the scheduler evaluates held-out per-task reference sets to compute score differentials $\Delta_i$. It then updates the mixing weights by softmax re-weighting, $\lambda_i \leftarrow \exp(\gamma \Delta_i) / \sum_{j=1}^{N} \exp(\gamma \Delta_j)$, where $\gamma$ is a tunable coefficient.
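Here is a worked example of the re-weighting step. It assumes the update is the plain softmax $\lambda_i = \exp(\gamma \Delta_i) / \sum_j \exp(\gamma \Delta_j)$; both the exact form and the coefficient name $\gamma$ are assumptions. Take $\gamma = 1$ and three tasks with score differentials $\Delta = (0.5, 0, -0.5)$:

```latex
\begin{aligned}
\textstyle\sum_{j} e^{\gamma \Delta_j} &= e^{0.5} + e^{0} + e^{-0.5} \approx 1.6487 + 1.0000 + 0.6065 = 3.2552,\\
\lambda &\approx \left(\tfrac{1.6487}{3.2552},\ \tfrac{1.0000}{3.2552},\ \tfrac{0.6065}{3.2552}\right) \approx (0.506,\ 0.307,\ 0.186).
\end{aligned}
```

Under this form, the task with the largest differential receives the most weight. Raising $\gamma$ sharpens the distribution toward that task, and $\gamma \to 0$ recovers uniform mixing.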

This iterative, dynamic re-weighting enforces the "Data Mixing Law" that underlies FoundIR-v2's balanced multi-task convergence.
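The DES loop described in this section can be sketched in Python as follows. This is a hedged reconstruction from the prose, not the authors' pseudocode. `train_step` and `score_gap` are hypothetical callbacks, and `gamma` is an assumed name for the tunable coefficient. The sketch assumes $\Delta_i$ is defined so that a larger value marks a task that needs more weight, and for simplicity it samples one task per step.

```python
import math
import random
from typing import Callable, List, Sequence


def softmax_weights(deltas: Sequence[float], gamma: float) -> List[float]:
    """lambda_i = exp(gamma * Delta_i) / sum_j exp(gamma * Delta_j)."""
    m = max(gamma * d for d in deltas)            # subtract max for numerical stability
    exps = [math.exp(gamma * d - m) for d in deltas]
    total = sum(exps)
    return [e / total for e in exps]


def des_train(
    num_tasks: int,
    total_steps: int,
    period: int,                                  # T: steps between DES updates
    gamma: float,                                 # assumed name for the tunable coefficient
    train_step: Callable[[int], None],            # hypothetical: one update on task i
    score_gap: Callable[[int], float],            # hypothetical: Delta_i on held-out refs
    seed: int = 0,
) -> List[float]:
    rng = random.Random(seed)
    weights = [1.0 / num_tasks] * num_tasks       # start from a uniform mixture
    for step in range(1, total_steps + 1):
        task = rng.choices(range(num_tasks), weights=weights, k=1)[0]
        train_step(task)                          # optimize on a batch of this task
        if step % period == 0:                    # every T steps: re-evaluate and re-weight
            deltas = [score_gap(i) for i in range(num_tasks)]
            weights = softmax_weights(deltas, gamma)
    return weights
```

Keeping evaluation outside the per-step path means DES adds cost only once every `period` steps. That is consistent with the modest scheduling overhead noted in the limitations.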

3. MoE-Driven Diffusion Scheduler

At each diffusion timestep $t$ and for sub-task $i$, the MoE scheduler fuses the latent codes by concatenation, $h_t = [\, f^{\mathrm{LQ}} ;\, x_t^{\mathrm{HQ}} \,]$. It then passes $h_t$ to $n$ expert blocks $\mathcal{E}_1, \dots, \mathcal{E}_n$, each implementing a specialized attention mechanism (spatial, channel, sparse, etc.). Next, the scheduler computes soft-gate weights $g_t = \mathrm{softmax}\big(\mathcal{G}(h_t)\big) \in \mathbb{R}^{n}$, where $\mathcal{G}$ is the gating network, and forms the scheduled feature

$\tilde{h}_t = \sum_{j=1}^{n} g_{t,j}\, \mathcal{E}_j(h_t)$,

which is passed to the SDXL noise predictor.
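Below is a toy version of soft-gated expert fusion, with plain Python lists standing in for latent tensors. The experts and the gating function are hypothetical callables; real expert blocks would be attention modules. The `hard` flag switches to the top-1 (hard MoE) variant that the ablations compare against. The sketch illustrates soft gating in general, not FoundIR-v2's actual module.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    m = max(logits)                                  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_schedule(
    f_lq: Vector,
    x_t: Vector,
    experts: Sequence[Callable[[Vector], Vector]],   # hypothetical expert blocks
    gate: Callable[[Vector], Sequence[float]],       # hypothetical gating network
    hard: bool = False,
) -> Vector:
    h = f_lq + x_t                                   # concatenation [f^LQ ; x_t^HQ]
    g = softmax(gate(h))                             # soft-gate weights g_{t,j}
    if hard:                                         # top-1 (hard MoE) variant
        best = max(range(len(g)), key=g.__getitem__)
        g = [1.0 if j == best else 0.0 for j in range(len(g))]
    # Every expert is evaluated here for clarity; a real hard MoE would skip unselected ones.
    outputs = [expert(h) for expert in experts]
    # Weighted sum over experts: sum_j g_j * E_j(h); each expert must return len(h) values.
    return [sum(gj * out[k] for gj, out in zip(g, outputs)) for k in range(len(h))]
```

With soft gating, every expert contributes in proportion to its gate weight, so gradients reach all experts. The hard variant routes the whole feature through a single expert.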

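Diffusion training follows the denoising score matching objective in its noise-prediction form, $\mathcal{L}_{\mathrm{diff}} = \mathbb{E}_{t,\, \epsilon \sim \mathcal{N}(0, I)} \big[ \| \epsilon - \epsilon_\theta(x_t^{\mathrm{HQ}}, t, \tilde{h}_t) \|_2^2 \big]$. Here $x_t^{\mathrm{HQ}} = \sqrt{\bar{\alpha}_t}\, f^{\mathrm{HQ}} + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$, and $\tilde{h}_t$ denotes the scheduled feature produced by the MoE scheduler. Training proceeds in two stages: the MoE scheduler is first pre-trained with the SDXL backbone frozen, and the full model is then fine-tuned end to end.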
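The noise-prediction objective can be sketched for a single training example. The sketch assumes the standard variance-preserving forward process $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ and represents the timestep directly by its cumulative $\bar\alpha_t$. `eps_model` is a hypothetical stand-in for the SDXL noise predictor conditioned on the scheduled MoE feature.

```python
import math
import random
from typing import Any, Callable, List, Sequence

EpsModel = Callable[[List[float], float, Any], List[float]]


def noisy_latent(x0: Sequence[float], eps: Sequence[float], alpha_bar: float) -> List[float]:
    """Forward process: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]


def diffusion_loss(
    x0: Sequence[float],      # clean latent f^HQ (toy list)
    cond: Any,                # scheduled MoE feature (opaque here)
    alpha_bar: float,         # cumulative alpha for the sampled timestep
    eps_model: EpsModel,      # hypothetical noise predictor
    rng: random.Random,
) -> float:
    """Mean squared error between the true and the predicted noise for one example."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = noisy_latent(x0, eps, alpha_bar)
    eps_hat = eps_model(x_t, alpha_bar, cond)
    return sum((e - eh) ** 2 for e, eh in zip(eps, eps_hat)) / len(eps)
```

In the two-stage schedule described above, this same loss would first update only the MoE scheduler's parameters and later all parameters. Only the set of trainable weights changes between stages.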

4. Training Protocol and Implementation

FoundIR-v2 is trained on a combination of publicly available datasets covering 50+ real-world sub-tasks, including 4KRD (motion deblur), LSDIR (defocus deblur), PolyU (denoise), Dense-HAZE/NH-HAZE (dehaze), CSTNet HQ-NightRain (derain), UAV-Rain1k (raindrop removal), WeatherBench (desnow), UHD-LL (low-light), DIV2K/Flickr2K/DIV8K (super-resolution), FFHQ (faces), RealPhoto60 (real SR), and RealDeg (old-photo/face restoration). Ground-truth images are screened with deep multi-modal IQA models such as DA-CLIP and DepictQA so that low-quality references are excluded.

The key hyperparameters are:

  • Hardware: 2× NVIDIA H20 GPUs (96 GB each)
  • Batch size: 32 (random crops)
  • Optimizer: AdamW with default weight decay
  • Learning rate: separate rates for the VAE encoder and for all other modules, with cosine annealing
  • Total iterations: 150k; DES evaluation every $T$ iterations, using 10 reference samples per task
  • Diffusion inference: Euler sampler, 20 steps, classifier-free guidance scale = 5; AdaIN color fix for SR tasks

This protocol gives the model balanced exposure to the full range of degradations represented in the training data.
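Two of the inference settings listed above are sketched below on toy scalar and list data: Euler sampling with classifier-free guidance, and the AdaIN color fix. The sketch makes several assumptions:

  • The sampler integrates the σ-parameterized ODE $dx/d\sigma = \hat\epsilon$, in the style of k-diffusion.
  • The linear σ schedule is chosen purely for illustration; real samplers use the model's own schedule.
  • The conditional and unconditional predictors are hypothetical callables.
  • The color fix matches each output channel's mean and standard deviation to those of a reference channel, taken here to be the low-quality input resized to the output resolution.

None of the names come from the FoundIR-v2 code.

```python
import statistics
from typing import Callable, List, Sequence


def linear_sigmas(sigma_max: float, steps: int) -> List[float]:
    """Illustrative decreasing noise levels sigma_max -> 0 (steps + 1 values)."""
    return [sigma_max * (1.0 - i / steps) for i in range(steps + 1)]


def euler_cfg_sample(
    x: float,
    sigmas: Sequence[float],
    eps_cond: Callable[[float, float], float],    # hypothetical conditional predictor
    eps_uncond: Callable[[float, float], float],  # hypothetical unconditional predictor
    guidance: float = 5.0,
) -> float:
    """Euler integration of dx/dsigma = eps_hat with classifier-free guidance."""
    for s, s_next in zip(sigmas[:-1], sigmas[1:]):
        e_c, e_u = eps_cond(x, s), eps_uncond(x, s)
        eps_hat = e_u + guidance * (e_c - e_u)   # CFG: extrapolate away from unconditional
        x = x + (s_next - s) * eps_hat           # one Euler step toward lower noise
    return x


def adain_color_fix(
    output: Sequence[Sequence[float]],
    reference: Sequence[Sequence[float]],
    eps: float = 1e-5,
) -> List[List[float]]:
    """Per channel, give `output` the mean and standard deviation of `reference`."""
    fixed = []
    for out_c, ref_c in zip(output, reference):
        mu_o, sd_o = statistics.fmean(out_c), statistics.pstdev(out_c)
        mu_r, sd_r = statistics.fmean(ref_c), statistics.pstdev(ref_c)
        fixed.append([(v - mu_o) / (sd_o + eps) * sd_r + mu_r for v in out_c])
    return fixed


# Usage: 20 Euler steps with guidance 5, as in the listed settings.
sigmas = linear_sigmas(sigma_max=10.0, steps=20)
sample = euler_cfg_sample(
    x=10.0,
    sigmas=sigmas,
    eps_cond=lambda x, s: (x - 1.0) / s,    # ideal predictor for a point mass at 1.0
    eps_uncond=lambda x, s: x / s,          # ideal predictor for a point mass at 0.0
    guidance=5.0,
)
colors = adain_color_fix([[0.0, 1.0, 2.0, 3.0]], [[10.0, 12.0, 14.0, 16.0]])
```

With these idealized point-mass predictors, the guided sample lands at 5.0 rather than 1.0. This shows how a guidance scale above 1 extrapolates past the conditional prediction; a real image model behaves far less simply.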

5. Empirical Results, Ablations, and Analysis

FoundIR-v2 achieves leading or near-leading performance on 80% or more of the evaluated tasks, as measured by the following metrics (↑ higher is better, ↓ lower is better):

  • Full-reference: PSNR↑, SSIM↑, LPIPS↓
  • No-reference: MUSIQ↑, CLIPIQA+↑, PIQE↓, MANIQA↑, PaQ-2-PiQ↑
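PSNR, the metric used in the table below, is computed from the mean squared error between restored and ground-truth pixels. This is a minimal sketch that assumes pixel values scaled to [0, 1] and flattened into a list. Real evaluations often also fix the color space (e.g., the Y channel) and crop borders, which this sketch ignores.

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf                      # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

PSNR is logarithmic in MSE. A +1.5 dB gain, such as DES's improvement on deblurring, therefore corresponds to multiplying the MSE by $10^{-0.15} \approx 0.71$, roughly a 29% lower error.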

Table: Representative performance (DES vs. static mixing)

Task          Static mix PSNR (dB)   DES PSNR (dB)   Δ (dB)
Deblurring    18.91                  20.41           +1.50
Dehazing      18.69                  19.93           +1.24
Low-light     19.93                  20.41           +0.48
SR            18.91                  20.09           +1.18

Ablation studies indicate:

  • DES improves PSNR over static mixing by +1.2–1.5 dB on most tasks in the table above (+0.48 dB on low-light enhancement).
  • Soft MoE scheduling yields +0.3–0.7 dB vs. single-prior or hard MoE variants.
  • Removing low-quality ground truth increases PSNR by ~0.2–0.4 dB.

Qualitative results show sharp edges recovered from motion-blurred inputs, detail retention in SR, superior handling of murals with heterogeneous degradations (outperforming GPT-5 and HYPIR baselines), and single-pass restoration of cascaded degradations such as rain combined with low resolution (deraining plus SR). FoundIR-v2 also outperforms a cascaded pipeline (FoundIR + SUPIR) in joint restoration settings.

Generalization extends to medical imaging domains; limited fine-tuning enables superior recovery of diagnostic structures in laparoscopy and microscopy compared to the prior FoundIR.

6. Significance, Limitations, and Open Directions

FoundIR-v2 establishes the critical importance of dynamic data mixture balancing, formalized as the "Data Mixing Law", for robust all-in-one restoration. Coupling this balancing with an MoE-driven diffusion scheduler enables task-adaptive prior generation in latent space. The result is strong generalization across a diverse sub-task landscape and favorable zero-shot transfer.

Limitations remain for some extreme degradations, where task-specialized models still outperform the all-in-one approach. The scheduling components also add modest computational overhead, leaving room for lighter-weight MoE variants.

Open directions include:

  • Developing adaptation signals beyond PSNR or MUSIQ for DES updates.
  • Scaling to temporal (video) or multi-modal (e.g., depth, semantics) restoration contexts.
  • Enabling continual learning for incremental addition of new tasks.
  • Investigating compact MoE schedulers for resource-constrained deployment.

FoundIR-v2 offers a scalable foundation for multi-task, real-world restoration, and highlights the role of adaptive data mixing for foundation models in image processing (Chen et al., 10 Dec 2025).
