Degradation-Resistant Unfolding Network (DeRUN)
- DeRUN is a deep unfolding network that robustly restores and segments images under unknown or mixed degradations.
- It uses dynamic, vision–language guided iterative updates to replace fixed degradation models with data-driven operators.
- Empirical results demonstrate significant gains in metrics like F-measure, PSNR, and SSIM across diverse degradation benchmarks.
A Degradation-Resistant Unfolding Network (DeRUN) is a class of deep unfolding networks (DUNs) designed to robustly address real-world image restoration and segmentation tasks in the presence of unknown or mixed degradations, such as haze, rain, blur, or noise. DeRUN achieves this by dynamically adapting its iterative updates to degradation characteristics inferred in a data-driven manner, often using prompts from vision–language models (VLMs), rather than relying on explicit or handcrafted degradation priors. The architecture is central both to advanced segmentation frameworks such as the Nested Unfolding Network (NUN) for concealed object segmentation (He et al., 22 Nov 2025) and to general image restoration approaches within the InterIR framework (Gao et al., 13 Nov 2025).
1. Architectural Foundations
DeRUN architectures are fundamentally based on the principle of deep unfolding, in which a classical iterative optimization procedure is converted into a network of learnable layers. A salient instantiation appears in NUN, where DeRUN is nested as an inner loop inside each stage of the overall segmentation-oriented unfolding process. The top-level structure is summarized as follows:
| Component | Outer Loop (SODUN) | Inner Loop (DeRUN) |
|---|---|---|
| Iterations | $S$ stages | $K$ inner iterations per stage |
| Primary Operation | Mask/background refinement | Image restoration under unknown degradation |
| Guidance | DeRUN output feeds the next SODUN stage | Guided by SODUN outputs and a VLM prompt |
For each stage $s$, SODUN consumes either the input image (first stage) or a selected high-quality restoration from the previous stage. It maintains arrays $\{\mathbf{m}_s\}$ and $\{\mathbf{b}_s\}$ corresponding to the segmentation mask and background. DeRUN, embedded within, iteratively restores the input image under dynamic, VLM-derived degradation descriptors, and is mutually guided by $\mathbf{m}_s$ and $\mathbf{b}_s$ from SODUN.
This nested architecture enables decoupling image restoration from semantic segmentation, a property empirically shown to mitigate gradient conflicts and increase robustness under ambiguous degradations (He et al., 22 Nov 2025).
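The nested control flow described above can be sketched in a few lines. This is a structural illustration only: the function bodies (`sodun_stage`, `derun_iterate`), the toy update rules inside them, and the stage/iteration counts are all assumptions for demonstration, not the papers' actual modules.

```python
import numpy as np

def derun_iterate(x, y, mask, bg, vlm_embed):
    """One illustrative inner DeRUN update: pull x toward the observation y,
    modulated by the current segmentation mask (toy rule, not the real module)."""
    grad = x - y                                    # stand-in data-fidelity gradient
    return x - 0.5 * grad * (1.0 - 0.1 * mask)

def sodun_stage(x, mask, bg):
    """One illustrative outer SODUN update: refine mask/background from the image."""
    new_mask = 1.0 / (1.0 + np.exp(-(x - x.mean())))  # toy saliency proxy in (0, 1)
    new_bg = x * (1.0 - new_mask)
    return new_mask, new_bg

def nested_unfolding(y, n_stages=3, n_inner=4):
    """Outer segmentation loop with DeRUN nested as the inner restoration loop."""
    x = y.copy()
    mask = np.zeros_like(y)
    bg = y.copy()
    vlm_embed = np.zeros(8)                         # placeholder degradation embedding
    for s in range(n_stages):                       # outer loop: mask/background refinement
        mask, bg = sodun_stage(x, mask, bg)
        for k in range(n_inner):                    # inner loop: guided restoration
            x = derun_iterate(x, y, mask, bg, vlm_embed)
    return x, mask
```

The key structural point is the mutual guidance: each outer stage consumes the latest restoration, and each inner iteration consumes the latest mask.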
2. Mathematical and Optimization Formulation
Restoration in DeRUN is formulated as an unconstrained minimization problem under unknown or compound degradations:

$$\min_{\mathbf{x}} \; \tfrac{1}{2}\,\|\mathbf{y} - \mathcal{D}(\mathbf{x})\|_2^2 + \lambda\,\mathcal{R}(\mathbf{x}),$$

where $\mathbf{y}$ is the observed degraded image, $\mathcal{D}$ is an unknown (potentially composite) degradation operator, and $\mathcal{R}$ is a regularizer.
Within DeRUN, each iteration $k$ substitutes the explicit $\mathcal{D}$ and $\mathcal{R}$ with learnable, data-driven operators $\mathcal{G}_{\theta}$ and $\mathcal{P}_{\phi}$, dynamically conditioned on VLM embeddings $\mathbf{v}$:
- Gradient descent step: $\mathbf{z}^{(k)} = \mathbf{x}^{(k-1)} - \eta\,\mathcal{G}_{\theta}\big(\mathbf{x}^{(k-1)}, \mathbf{y}; \mathbf{v}\big)$
- Proximal/denoiser step (segmentation guided): $\mathbf{x}^{(k)} = \mathcal{P}_{\phi}\big(\mathbf{z}^{(k)}; \mathbf{m}, \mathbf{b}\big)$, where $\mathcal{G}_{\theta}$ and $\mathcal{P}_{\phi}$ are lightweight neural modules, with $\mathcal{P}_{\phi}$ injecting structural cues from SODUN's segmentation outputs.
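One such gradient-then-proximal iteration can be sketched as below. The surrogate bodies (`grad_module`, `prox_module`) are hypothetical stand-ins, a residual pull toward the observation and a mask-gated box blur, chosen only to make the two-step structure concrete.

```python
import numpy as np

def grad_module(x, y, v, eta=0.1):
    """Learnable gradient surrogate G_theta: here a residual toward y, with a
    step scale derived from the degradation embedding v (illustrative only)."""
    scale = 1.0 / (1.0 + np.linalg.norm(v))
    return x - eta * scale * (x - y)

def prox_module(z, mask):
    """Proximal/denoiser surrogate P_phi with segmentation guidance:
    keep z inside the mask, smooth (toy 4-neighbor blur) outside it."""
    blur = (np.roll(z, 1, 0) + np.roll(z, -1, 0) +
            np.roll(z, 1, 1) + np.roll(z, -1, 1)) / 4.0
    return mask * z + (1.0 - mask) * blur

def derun_step(x, y, mask, v):
    z = grad_module(x, y, v)        # data-driven gradient descent step
    return prox_module(z, mask)     # segmentation-guided proximal step
```

Note that with a mask of all ones the proximal step reduces to the identity, so the iteration degenerates to a plain gradient step; the mask is exactly where SODUN's structural guidance enters.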
An analogous formulation underpins InterIR (Gao et al., 13 Nov 2025), which employs a block coordinate semi-smooth Newton method, Kronecker-factorized degradation operators, and a cascade of physically interpretable modules, each embedding explainable and adaptive convolutions.
3. Dynamic Degradation Representation
DeRUN eliminates reliance on fixed degradation models by inferring degradation semantics directly from the input. At each iteration, a vision–language model (VLM), specifically a DA-CLIP module, processes the intermediate restored image to produce a low-dimensional embedding $\mathbf{v}^{(k)}$. This vector parametrizes the construction of residual convolutional proxies $\mathcal{D}_{\theta(\mathbf{v})}$:

$$\mathcal{D}_{\theta(\mathbf{v})}(\mathbf{x}) = \mathbf{x} + \mathrm{Conv}_{\theta(\mathbf{v})}(\mathbf{x}),$$

with the kernel parameters $\theta(\mathbf{v})$ learned from $\mathbf{v}^{(k)}$.
This mechanism allows per-iteration specialization to the current image's degradation, supporting arbitrary, unknown, or mixed degradation types without any human-specified prior such as haze matrices or gamma correction curves (He et al., 22 Nov 2025). The physically interpretable InterIR framework provides convergent updates for both image and operator estimates and augments them with explainable convolution (EPBlock) modules, enabling adaptivity and interpretability at all restoration stages (Gao et al., 13 Nov 2025).
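An embedding-conditioned residual convolution of this kind can be sketched as a tiny hypernetwork: a learned linear head maps the degradation embedding to a 3×3 kernel, which is then applied residually. The matrix `W`, the kernel size, and the wrap padding are all illustrative assumptions, not details from the papers.

```python
import numpy as np

def kernel_from_embedding(v, W):
    """Hypernetwork head: map degradation embedding v (len d) to a 3x3
    residual kernel via a learned matrix W of shape (9, d)."""
    return (W @ v).reshape(3, 3)

def residual_degradation_proxy(x, v, W):
    """D_theta(x) = x + Conv_{theta(v)}(x): identity plus an embedding-
    conditioned convolution, so the proxy stays near identity for small kernels."""
    k = kernel_from_embedding(v, W)
    out = np.zeros_like(x)
    for di in (-1, 0, 1):                 # naive 3x3 convolution, wrap padding
        for dj in (-1, 0, 1):
            out += k[di + 1, dj + 1] * np.roll(np.roll(x, di, 0), dj, 1)
    return x + out
```

Because the kernel is regenerated from $\mathbf{v}^{(k)}$ at every iteration, the proxy specializes per image and per iteration without any handcrafted degradation model.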
4. Integration, Training, and Loss Functions
DeRUN instantiations within NUN and InterIR are trained using multi-stage supervision and consistency-enforcing objectives. In NUN, the full loss for stage $s$ combines segmentation and restoration terms, $\mathcal{L}_s = \mathcal{L}_{\mathrm{seg},s} + \mathcal{L}_{\mathrm{res},s}$. A cross-stage consistency loss is imposed using the top two candidate restorations at every stage, $\mathcal{L}_{\mathrm{cons}} = \sum_s \big\|\hat{\mathbf{x}}_s^{(1)} - \hat{\mathbf{x}}_s^{(2)}\big\|_1$. The total objective is $\mathcal{L} = \sum_s \mathcal{L}_s + \lambda\,\mathcal{L}_{\mathrm{cons}}$.
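A minimal sketch of such a cross-stage consistency term follows; the L1 distance between the two top-ranked candidate restorations and the weight `lam` are assumed generic forms, not the papers' exact weighting.

```python
import numpy as np

def cross_stage_consistency(top1_per_stage, top2_per_stage):
    """Mean-L1 distance between the two highest-ranked candidate restorations,
    summed over stages (assumed generic form of the consistency term)."""
    return sum(np.abs(a - b).mean()
               for a, b in zip(top1_per_stage, top2_per_stage))

def total_loss(stage_losses, top1, top2, lam=0.1):
    """Total objective: per-stage supervision plus weighted consistency."""
    return sum(stage_losses) + lam * cross_stage_consistency(top1, top2)
```

When the two candidates agree exactly, the consistency term vanishes and only the per-stage supervision remains.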
Image quality assessment (IQA) modules (TOPIQ, Q-Align, MUSIQ) guide the selection of optimal restorations for feedback and consistency without supervision. Optimization in both NUN and InterIR employs Adam, image-wise patch training, and staged learning rate decay; InterIR further includes Fourier-domain perceptual loss terms (Gao et al., 13 Nov 2025).
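The IQA-driven selection step reduces to an argmax over no-reference quality scores; a sketch under the assumption that each assessor returns a scalar per candidate (aggregation by plain mean is illustrative):

```python
import numpy as np

def select_best_restoration(candidates, iqa_scores):
    """Pick the candidate with the highest aggregate no-reference IQA score.
    iqa_scores holds one list of assessor scores (e.g. TOPIQ / Q-Align / MUSIQ)
    per candidate; aggregation by unweighted mean is an assumption."""
    agg = [float(np.mean(s)) for s in iqa_scores]
    best = int(np.argmax(agg))
    return candidates[best], best
```

Because only relative rankings matter for selection, no ground truth is needed, which is what makes the feedback loop unsupervised.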
5. Empirical Performance and Ablation
Experimental results across both NUN and InterIR consistently demonstrate DeRUN’s resilience to unknown and compound degradations:
- On synthetic multi-degradation benchmarks (e.g., COD10K), models with DeRUN exhibit a roughly 20% improvement in standard concealed-object metrics such as F-measure over plain RUN (He et al., 22 Nov 2025).
- On real low-quality datasets (PCOD-LQ, MCOD-LQ), NUN/DeRUN achieves state-of-the-art mean IoU and F-measure.
- For general multi-degradation image restoration, InterIR reaches PSNR 32.51 and SSIM 0.927 (multi-type average), outperforming single-degradation-focused baselines (Gao et al., 13 Nov 2025).
Ablation shows that removing DeRUN or its VLM-driven operators leads to substantial performance drops (over 15% in F-measure), and that omitting IQA-driven selection or segmentation-driven refinement results in measurable degradation (2–4%) in performance metrics.
6. Interpretability and Explainability
A distinguishing attribute of DeRUN (especially within InterIR) is its emphasis on interpretability—maintaining a clear connection between the optimization variables and the underlying restoration process. Each latent variable and update can be traced to its physical or semantic counterpart, and the explainable convolution modules (EPBlocks) ensure that adaptivity does not come at the expense of transparency. These modules generate image- and sample-specific receptive fields via binary masking and soft attention, while sharing a global convolution kernel, enabling both flexibility and analytical tractability (Gao et al., 13 Nov 2025).
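The EPBlock idea, a single shared kernel with sample-specific support, can be sketched as follows. The gating-then-convolve-then-attend ordering and the wrap padding are assumptions made for the sake of a compact illustration.

```python
import numpy as np

def epblock_like(x, shared_kernel, mask, attention):
    """Explainable-convolution sketch: one global 3x3 kernel shared across
    samples; a binary mask gates which positions the kernel sees, and a soft
    attention map reweights the response per pixel (illustrative ordering)."""
    gated = x * mask                      # binary masking -> sample-specific support
    out = np.zeros_like(x)
    for di in (-1, 0, 1):                 # shared global convolution, wrap padding
        for dj in (-1, 0, 1):
            out += shared_kernel[di + 1, dj + 1] * np.roll(np.roll(gated, di, 0), dj, 1)
    return attention * out                # soft attention reweights the output
```

The analytical appeal is that the only learned spatial operator is the shared kernel; the mask and attention maps are per-sample modulations that can be inspected directly.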
7. Significance, Open Questions, and Future Directions
Degradation-Resistant Unfolding Networks represent a technical advance in tackling real-world image understanding problems where degradation is unknown, nonuniform, or mixed, notably for applications in object segmentation, surveillance, and uncontrolled imaging environments. Their empirical success is underpinned by (1) the decoupling of restoration and semantic inference, (2) dynamic per-iteration adaptation to image conditions, and (3) feedback- and consistency-driven selection mechanisms. Outstanding questions include further improvement of degradation prompt extraction, the applicability of DeRUNs beyond vision (e.g., in medical imaging), and the principled extension of explainable unfolding to multi-modal or video settings. The approach illustrates a general trend towards optimization-inspired, interpretable, and dynamically guided architectures in high-stakes perception and restoration tasks (He et al., 22 Nov 2025, Gao et al., 13 Nov 2025).