Nested Unfolding Network (NUN) for Concealed Segmentation

Updated 29 November 2025

The NUN framework decouples restoration and segmentation via a nested DUN-in-DUN architecture that iteratively refines both tasks.
It employs a degradation-resistant unfolding network (DeRUN) with vision-language model guidance to adaptively handle arbitrary degradations.
Empirical results show that NUN achieves state-of-the-art accuracy on both clean and degraded benchmarks without relying on preset degradation models.

The Nested Unfolding Network (NUN) is a unified, interpretable architecture for real-world concealed object segmentation (COS) under arbitrary and unknown degradations. NUN decouples image restoration from segmentation through a DUN-in-DUN structure: a degradation-resistant unfolding network (DeRUN) is embedded within each stage of a segmentation-oriented unfolding network (SODUN), and bidirectional iterative interactions let the two refine each other. Vision-language model (VLM) guidance supplies degradation semantics without prior specification, and a multi-stage image-quality assessment mechanism adaptively selects restoration outputs. The framework reports leading performance on both clean and degraded benchmarks (He et al., 22 Nov 2025).

1. Architectural Foundation: DUN-in-DUN Structure

NUN embodies two hierarchically nested deep unfolding networks (DUNs):

The outer Segmentation-Oriented Deep Unfolding Network (SODUN) is composed of $K$ stages. In each stage $k$, SODUN receives:
- The previous stage’s high-quality restoration $X_{k-1}^{T1}$ ,
- Foreground mask $M_{k-1}$ ,
- Background estimate $B_{k-1}$ .

SODUN yields updated estimations $M_k$ (foreground mask) and $B_k$ (background).

The inner Degradation-Resistant Unfolding Network (DeRUN), with $N_k$ iterations per SODUN stage, accepts the same set of inputs in addition to the original degraded observation $Y$ . DeRUN iteratively reconstructs increasingly clean versions $\{X_{k,n}\}_{n=0}^{N_k}$ of the image.

This architecture enforces an explicit separation of the restoration and segmentation tasks. Interactions between the two DUNs are facilitated by the Bi-directional Unfolding Interaction (BUI) mechanism:

- After $N_k$ DeRUN iterations, each candidate restoration $X_{k,n}$ is scored using image-quality assessment (IQA), and the highest-scoring candidate $X_k^{T1}$ is propagated to the next SODUN stage.
- Concurrently, the current SODUN outputs $(M_k, B_k)$ are injected as structural priors into DeRUN’s proximal step via a lightweight network $\mathcal X_2$, focusing restoration on ambiguous regions.

This design ensures that segmentation and restoration are independently optimized while allowing reciprocal refinement across stages.
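The control flow described above can be sketched as two nested loops. This is only a structural sketch under stated assumptions: `segment_step`, `restore_step`, and `iqa_score` are hypothetical callables standing in for SODUN, one DeRUN iteration, and the IQA scorer, and the choice to start each inner loop from the previous best restoration is an assumption, not something the source specifies.

```python
from typing import Callable

def nested_unfolding(y, x0, m0, b0, num_stages: int, inner_iters: int,
                     segment_step: Callable, restore_step: Callable,
                     iqa_score: Callable):
    """Outer SODUN stages, each wrapping an inner DeRUN loop (structure only)."""
    x_best, mask, background = x0, m0, b0
    for _stage in range(num_stages):
        # SODUN: update mask and background from the previous best restoration.
        mask, background = segment_step(x_best, mask, background, y)
        # DeRUN: build candidate restorations guided by (mask, background).
        x, candidates = x_best, []  # assumption: X_{k,0} is the previous best
        for _iteration in range(inner_iters):
            x = restore_step(x, x_best, mask, background, y)
            candidates.append(x)
        # BUI: forward the highest-scoring candidate to the next stage.
        x_best = max(candidates, key=iqa_score)
    return x_best, mask, background

# Toy usage with scalars in place of images, just to exercise the loop.
result = nested_unfolding(
    y=0, x0=0, m0=0, b0=0, num_stages=2, inner_iters=3,
    segment_step=lambda x, m, b, y: (m + 1, b),
    restore_step=lambda x, x_best, m, b, y: x + 1,
    iqa_score=lambda x: x,
)
```

In a real model the three callables would be learned modules; the point here is only the ordering of the updates and the selection step between stages.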

2. Mathematical Formulation of Iterative Segmentation and Restoration

The NUN’s mechanics are formalized as follows (notation: $Y$ = degraded input, $X$ = ground-truth clean image):

SODUN (Stage $k$ )

Mask update (gradient descent):

$\hat M_k = M_{k-1} - \alpha_M \nabla_M \Bigl(\tfrac{1}{2}\lVert X_{k-1}^{T1} - X_{k-1}^{T1} \odot M_{k-1} - B_{k-1} \rVert_2^2 \Bigr)$

Mask update (proximal step):

$M_k = \mathcal M(\hat M_k,\, X_{k-1}^{T1},\, B_{k-1},\, Y)$

where $\mathcal M$ uses variational structure separation (VSS) with convolutional refinement.
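The gradient step of the mask update has a simple closed form: with residual $R = X_{k-1}^{T1} - X_{k-1}^{T1} \odot M_{k-1} - B_{k-1}$, the gradient with respect to $M$ is $-X_{k-1}^{T1} \odot R$. The sketch below applies that step to flattened toy vectors; the function names and the toy values are illustrative assumptions, and the learned proximal network $\mathcal M$ is not modelled.

```python
def data_term(x, mask, background):
    """0.5 * ||X - X*M - B||^2 on flattened vectors."""
    return 0.5 * sum((xi - xi * mi - bi) ** 2
                     for xi, mi, bi in zip(x, mask, background))

def mask_gradient_step(x, mask, background, alpha):
    """M_hat = M - alpha * grad_M, where grad_M = -X * (X - X*M - B)."""
    residual = [xi - xi * mi - bi for xi, mi, bi in zip(x, mask, background)]
    return [mi + alpha * xi * ri for mi, xi, ri in zip(mask, x, residual)]

# Toy values (hypothetical): three pixels of a restored image.
x = [0.8, 0.5, 0.2]
mask = [0.5, 0.5, 0.5]
background = [0.1, 0.4, 0.1]
mask_hat = mask_gradient_step(x, mask, background, alpha=0.5)
```

For this toy input the step lowers the data term, which is what a gradient step with a small enough step size should do.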

Background update (gradient descent):

$\hat B_k = B_{k-1} - \alpha_B \nabla_B \Bigl(\tfrac{1}{2}\lVert X_{k-1}^{T1} - X_{k-1}^{T1}\odot M_k - B_{k-1} \rVert_2^2 \Bigr)$

Background update (proximal step):

$B_k = \mathcal B(\hat B_k,\, M_k,\, X_{k-1}^{T1},\, Y)$

with $\mathcal B$ implemented via a compact U-Net.
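For the background, the gradient of the same data term with respect to $B$ is just the negative residual, so the gradient step moves $B_{k-1}$ toward $X_{k-1}^{T1} \odot (1 - M_k)$. A minimal sketch on flattened toy vectors follows; the names and values are illustrative assumptions, and the U-Net proximal step $\mathcal B$ is not modelled.

```python
def background_gradient_step(x, mask, background, alpha):
    """B_hat = B - alpha * grad_B, where grad_B = -(X - X*M - B)."""
    residual = [xi - xi * mi - bi for xi, mi, bi in zip(x, mask, background)]
    return [bi + alpha * ri for bi, ri in zip(background, residual)]

# Toy values (hypothetical): one foreground pixel and one background pixel.
x = [0.8, 0.5]
mask = [1.0, 0.0]
background = [0.1, 0.3]

# With alpha = 1 the step lands on X * (1 - M), the part of X the mask excludes.
b_hat = background_gradient_step(x, mask, background, alpha=1.0)
expected = [xi * (1.0 - mi) for xi, mi in zip(x, mask)]
```

Smaller step sizes interpolate between the previous background estimate and that masked complement.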

DeRUN (Stage $k$ , Iteration $n$ )

VLM-based degradation inference:

$d_{k,n} = \mathrm{DA\text{-}CLIP}(X_{k,n-1})$

Degradation operator construction:

$RC_{k,n}^D(Z) = \sigma_{k,n} \odot (\mathrm{CRC}(Z) + Z) + \mu_{k,n}$

Restoration (gradient descent):

$\hat X_{k,n} = X_{k,n-1} - \alpha_X RC_{k,n}^{D\,T}(RC_{k,n}^D(X_{k,n-1}) - X_{k-1}^{T1})$
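To make the operator and its gradient step concrete, the sketch below treats $\mathrm{CRC}$ as a fixed linear map (a small matrix acting on a flattened image), so that the transpose $RC_{k,n}^{D\,T}$ can be written exactly. This is an assumption for illustration: in the actual network $\mathrm{CRC}$ and the transposed operator are presumably learned modules, and all names and values below are hypothetical.

```python
def matvec(mat, vec):
    return [sum(a * v for a, v in zip(row, vec)) for row in mat]

def transpose(mat):
    return [list(col) for col in zip(*mat)]

def rc_forward(z, crc, sigma, mu):
    """RC(Z) = sigma * (CRC(Z) + Z) + mu, elementwise scale and shift."""
    cz = matvec(crc, z)
    return [s * (c + zi) + m for s, c, zi, m in zip(sigma, cz, z, mu)]

def rc_adjoint(r, crc, sigma):
    """Transpose of the linear part: CRC^T(sigma * R) + sigma * R."""
    sr = [s * ri for s, ri in zip(sigma, r)]
    return [a + b for a, b in zip(matvec(transpose(crc), sr), sr)]

def restoration_gradient_step(x_prev, target, crc, sigma, mu, alpha):
    """X_hat = X_prev - alpha * RC^T(RC(X_prev) - target)."""
    residual = [f - t for f, t in zip(rc_forward(x_prev, crc, sigma, mu), target)]
    grad = rc_adjoint(residual, crc, sigma)
    return [xi - alpha * g for xi, g in zip(x_prev, grad)]

def fidelity(x, target, crc, sigma, mu):
    return 0.5 * sum((f - t) ** 2
                     for f, t in zip(rc_forward(x, crc, sigma, mu), target))

# Toy two-pixel example (hypothetical values).
crc = [[0.0, 0.5], [0.25, 0.0]]
sigma, mu = [2.0, 1.0], [0.0, 0.0]
x_prev, target = [1.0, 2.0], [1.0, 1.0]
x_hat = restoration_gradient_step(x_prev, target, crc, sigma, mu, alpha=0.05)
```

The `target` argument plays the role of the term subtracted inside the residual in the update rule above. With a small step size the fidelity term decreases, and the adjoint satisfies the usual inner-product identity for the linear part.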

Restoration (proximal step):

$X_{k,n} = \mathcal X_1(\hat X_{k,n}) + \mathcal X_2(B_k, M_k, Y)$

$\mathcal X_2$ leverages $M_k$ , $B_k$ , and $Y$ for targeted enhancement of segmentation-ambiguous regions.

Stage-wise selection: From $\{X_{k,n}\}_{n=1}^{N_k}$ , the best $X_k^{T1}$ is chosen via a composite IQA score combining TOPIQ, Q-Align, and MUSIQ metrics.
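The selection step can be sketched as an argmax over a composite score. The source does not say how the three metrics are combined, so the plain mean used below is an assumption, as is the requirement that the scores share a comparable range; the two toy scorers are stand-ins and not the real TOPIQ, Q-Align, or MUSIQ models.

```python
def select_best_restoration(candidates, scorers):
    """Return (index, candidate) with the highest mean score across scorers."""
    def composite(candidate):
        return sum(score(candidate) for score in scorers) / len(scorers)
    best_index = max(range(len(candidates)),
                     key=lambda i: composite(candidates[i]))
    return best_index, candidates[best_index]

# Toy stand-ins for no-reference quality metrics (hypothetical).
def mean_intensity(image):
    return sum(image) / len(image)

def darkest_pixel(image):
    return min(image)

candidates = [[0.1, 0.2], [0.6, 0.7], [0.4, 0.4]]
best_index, best = select_best_restoration(
    candidates, [mean_intensity, darkest_pixel])
```

Because selection happens once per stage, the scorers only need to rank the $N_k$ candidates of that stage against each other.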

3. Vision-Language Model Guidance for Degradation Semantics

A vision-language model (VLM), specifically DA-CLIP, is integrated to infer degradation semantics directly from input images, removing the need for pre-defined or explicit degradation priors. Formally, the VLM approximates a posterior over possible degradation types:

$p(d \mid Y) \approx \mathrm{softmax}\bigl(E_\mathrm{img}(Y)^{\top} E_\mathrm{text}(d)\bigr)$

where $\{d\}$ is a pre-specified vocabulary (e.g., haze, low-light), and $E_\mathrm{img}$ , $E_\mathrm{text}$ are embedding functions for images and text respectively.
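As a rough illustration of this posterior, the sketch below scores an image embedding against one text embedding per degradation label and normalises with a softmax. The three-dimensional embeddings and the label set are toy assumptions, not DA-CLIP outputs, and the `temperature` parameter is an added convenience that defaults to the plain formula.

```python
import math

def degradation_posterior(image_emb, text_embs, temperature=1.0):
    """Softmax over dot-product similarities between image and text embeddings."""
    logits = {
        name: sum(a * b for a, b in zip(image_emb, emb)) / temperature
        for name, emb in text_embs.items()
    }
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {name: math.exp(value - peak) for name, value in logits.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}

# Toy embeddings (hypothetical): the image is closest to the "haze" prompt.
text_embs = {
    "haze": [1.0, 0.0, 0.0],
    "low-light": [0.0, 1.0, 0.0],
    "noise": [0.0, 0.0, 1.0],
}
posterior = degradation_posterior([0.9, 0.1, 0.0], text_embs)
likely = max(posterior, key=posterior.get)
```

The result is a distribution over the vocabulary rather than a hard label, which is what lets downstream modulation respond to mixed degradations.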

The VLM output modulates the unfolding-step operators via $\sigma_{k,n}$ , $\mu_{k,n}$ , which are derived using convolutional transformations on the DA-CLIP embedding $d_{k,n}$ . This enables DeRUN to adapt restoration strategies dynamically to varying types of degradation encountered in real-world imagery.

4. Loss Functions, Optimization Strategy, and Consistency

NUN applies comprehensive supervision across all stages for both restoration and segmentation, introducing cross-stage regularization for stability:

Restoration-fidelity loss:

$L_\mathrm{rest} = \sum_{k=1}^K \frac{1}{2^{K-k}} \lVert X_k^{T1} - X \rVert_2^2$
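The factor $1/2^{K-k}$ halves the weight for each step back from the final stage, so later stages dominate the objective. A small sketch of the weighting and the resulting restoration loss on flattened toy vectors follows; the function names and values are illustrative assumptions.

```python
def stage_weights(num_stages):
    """Weights 1 / 2^(K - k) for stages k = 1..K."""
    return [1.0 / 2 ** (num_stages - k) for k in range(1, num_stages + 1)]

def restoration_loss(stage_outputs, clean):
    """Stage-weighted sum of squared L2 errors against the clean image."""
    weights = stage_weights(len(stage_outputs))
    return sum(
        w * sum((a - b) ** 2 for a, b in zip(x, clean))
        for w, x in zip(weights, stage_outputs)
    )

weights = stage_weights(4)  # [0.125, 0.25, 0.5, 1.0]

# Toy two-stage example: stage 1 is far from the target, stage 2 matches it.
loss = restoration_loss([[0.0, 0.0], [1.0, 1.0]], clean=[1.0, 1.0])  # 1.0
```

In the toy example only the first stage contributes, and its squared error of 2 is scaled by 0.5.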

Segmentation loss: sum of a weighted binary cross-entropy (BCE) term and a weighted intersection-over-union (IoU) term:

$L_\mathrm{seg} = \sum_{k=1}^K \frac{1}{2^{K-k}} [L^w_\mathrm{BCE}(M_k,GT_s) + L^w_\mathrm{IoU}(M_k,GT_s)]$
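The per-stage term inside the sum can be sketched as follows. The source does not define the pixel weighting behind $L^w$, so the sketch takes per-pixel weights as an argument and defaults to uniform weights; the soft-IoU form and the smoothing constant are likewise assumptions made for illustration.

```python
import math

def weighted_bce(pred, target, weights, eps=1e-7):
    """Weighted mean of per-pixel binary cross-entropy."""
    total = 0.0
    for p, t, w in zip(pred, target, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() stays finite
        total -= w * (t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / sum(weights)

def weighted_iou(pred, target, weights, smooth=1.0):
    """One minus a smoothed, weighted soft IoU."""
    inter = sum(w * p * t for p, t, w in zip(pred, target, weights))
    union = sum(w * (p + t - p * t) for p, t, w in zip(pred, target, weights))
    return 1.0 - (inter + smooth) / (union + smooth)

def stage_segmentation_loss(pred, target, weights=None):
    weights = weights or [1.0] * len(pred)
    return (weighted_bce(pred, target, weights)
            + weighted_iou(pred, target, weights))

# Toy masks (hypothetical): a confident prediction and a hesitant one.
target = [1.0, 0.0, 1.0, 0.0]
good = stage_segmentation_loss([0.9, 0.1, 0.8, 0.2], target)
poor = stage_segmentation_loss([0.4, 0.6, 0.5, 0.5], target)
```

The BCE term penalises per-pixel errors while the IoU term penalises poor region overlap, so the two are complementary.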

Cross-stage consistency loss: encourages the mask $M_k$ to stay stable when it is instead computed from an alternate restoration, giving $M_k^{T2}$:

$L_\mathrm{csc} = \sum_{k=1}^K \frac{1}{2^{K-k}} [L^w_\mathrm{BCE}(M_k,M_k^{T2}) + L^w_\mathrm{IoU}(M_k,M_k^{T2})]$

Total training objective:

$L_\mathrm{total} = L_\mathrm{rest} + L_\mathrm{seg} + \epsilon\, L_\mathrm{csc}$

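The scalar $\epsilon$ is a hyperparameter that weights the consistency term, and the superscript $w$ in $L^w_\mathrm{BCE}$ and $L^w_\mathrm{IoU}$ denotes the weighted variants of the BCE and IoU losses.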

Image-quality assessment in each DeRUN stage ensures only the highest-IQA restoration is propagated, and the cross-stage consistency loss promotes mask robustness to subtle changes in restorations.

5. Bi-Directional Feature Exchange and Cross-Stage Refinement

The BUI mechanism is central to NUN’s iterative, interpretable refinement loop:

- Segmentation-to-Restoration: DeRUN’s proximal network $\mathcal X_2$ uses SODUN’s current outputs $(M_k, B_k)$, together with $Y$, to prioritize restoration in regions where segmentation is ambiguous.
- Restoration-to-Segmentation: SODUN, rather than operating directly on the raw degraded $Y$, uses the progressively restored $X^{T1}$ generated by DeRUN, which yields more reliable gradient estimates during mask and background separation.

This exchange is realized mathematically in the update rules for both networks, maintaining decoupling of optimization objectives while enabling synergistic improvement for both tasks across multiple stages.

6. Performance, Benchmarking, and Significance

Extensive empirical evaluation demonstrates that NUN attains leading segmentation accuracy on both clean and degraded test sets for concealed object segmentation. The mechanism of selecting the best restoration output via IQA and enforcing cross-stage segmentation mask stability by self-consistency loss yields robustness against a broad spectrum of real-world degradation scenarios, without reliance on pre-defined degradation models (He et al., 22 Nov 2025).

This suggests NUN’s architectural paradigm—alternating, decoupled unfolding with reciprocal guidance—may generalize to other tasks where restoration and semantic estimation have conflicting or complementary objectives. A plausible implication is applicability in domains such as biomedical imaging and remote sensing where similar degradation-agnostic strategies are advantageous.

7. Interpretability and Future Implications

The iterative, stage-wise nature of NUN supports interpretability, since each sub-network’s role and information flow are explicit in the update rules. The explicit decoupling of restoration and segmentation is intended to avoid conflicting learning signals, and the vision-language interface allows adaptation to unknown and variable degradations.

Given its robust empirical performance and principled bidirectional refinement schema, further exploration of DUN-in-DUN architectures is warranted in tasks involving joint low-level and high-level vision. Open research directions include applying similar nested unfolding paradigms to other structured optimization tasks, expanding the set of guided priors via more advanced VLMs, and formal analysis of convergence and interpretability guarantees under diverse degradation conditions.

References

He et al., Nested Unfolding Network for Real-World Concealed Object Segmentation (2025)
