Unrolled Denoising Networks

Updated 17 March 2026

Unrolled denoising networks are deep neural architectures that mimic the iterative steps of classical optimization to integrate data fidelity with learned regularizers.
They translate methods like ISTA, ADMM, and proximal gradient into fixed network layers, enhancing interpretability and parameter efficiency.
These networks report state-of-the-art performance in applications such as medical imaging, seismic denoising, and CT imaging, and offer theoretical guarantees under certain conditions.

Unrolled denoising networks are a class of neural architectures derived by mapping the iterations of classical optimization algorithms for signal or image denoising directly onto the layers of a deep network. Each layer mimics a single step of an underlying iterative solver for a regularized inverse problem, with selected steps or parameters made trainable and frequently parameterized by CNNs or other neural modules. This approach yields interpretable architectures that combine data fidelity, explicit or learned priors, and often constraints or physics, leading to robust denoisers with strong empirical performance and, in some settings, theoretical guarantees.

1. Foundations and Motivation

Unrolled denoising networks are rooted in the formulation of denoising as a regularized inverse problem. The canonical setting observes a noisy signal or image $y = x + n$, with $x$ the unknown clean signal and $n$ an (often Gaussian) noise process. The usual goal is to recover $x$ via a maximum a posteriori (MAP) or variational estimator: $x^* = \arg\min_x \|y - x\|_2^2 + \mathcal{R}(x)$, where $\mathcal{R}(x)$ regularizes $x$ toward natural or desired structure, e.g., through $\ell_1$ sparsity, total variation, or deep priors (Diamond et al., 2017).
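As a worked instance of the estimator above, take the $\ell_1$ choice $\mathcal{R}(x) = \lambda \|x\|_1$. The objective then separates per coordinate and its minimizer is soft-thresholding of $y$ at $\lambda/2$ (the factor 1/2 comes from the unscaled squared error used here). The sketch below is illustrative only; the function names and the toy vector are assumptions, not taken from any cited paper.

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise soft-thresholding: shrink magnitudes by tau, clip at zero."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def l1_denoise(y, lam):
    """Minimizer of ||y - x||_2^2 + lam * ||x||_1.

    Setting the per-coordinate derivative 2(x - y) + lam * sign(x) to zero
    gives a threshold of lam / 2.
    """
    return soft_threshold(y, lam / 2.0)

def objective(x, y, lam):
    """The regularized objective being minimized."""
    return np.sum((y - x) ** 2) + lam * np.sum(np.abs(x))

# Toy noisy observation (hypothetical values).
y = np.array([3.0, -0.2, 0.5, -2.0])
x_star = l1_denoise(y, lam=1.0)
```

Small entries of `y` are set to zero and large ones are shrunk toward zero, which is the behavior that thresholding layers in ISTA-style unrollings inherit.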

Traditional iterative optimization (e.g., proximal gradient, ISTA, ADMM) solves such variational problems by alternating two updates, a step that enforces the data-fidelity term and an application of the regularizer's proximal operator, often over tens to hundreds of iterations. Unrolled denoising networks fix the number of iterations (one per layer), replace hand-crafted regularizers or solver steps with trainable modules, and train end-to-end for a target objective, while the architecture and information flow still reflect the underlying optimization algorithm (Yaman et al., 2020, Janjušević et al., 2021, Le et al., 2023).
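To make the idea of a fixed number of layers concrete, the following is a minimal sketch of the forward pass of an unrolled ISTA, assuming a sparse synthesis model $y \approx Dz$ with a dictionary $D$. The per-layer step sizes and thresholds are passed in as plain lists; in a trained network they (and possibly $D$) would be learned parameters. The dictionary, signal, and all names here are hypothetical and chosen only for illustration.

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise soft-thresholding (proximal operator of tau * ||.||_1)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def unrolled_ista(y, D, steps, thresholds):
    """Run one ISTA iteration per (step, threshold) pair, i.e. one per layer."""
    z = np.zeros(D.shape[1])
    for eta, tau in zip(steps, thresholds):
        # Gradient step on 0.5 * ||y - D z||^2, then the l1 proximal step.
        z = soft_threshold(z + eta * D.T @ (y - D @ z), tau)
    return D @ z, z

# Synthetic example: a signal that is sparse in a random dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((32, 64))
D /= np.linalg.norm(D, axis=0)           # unit-norm atoms
z_true = np.zeros(64)
z_true[[3, 17, 40]] = [2.0, -1.5, 1.0]
y = D @ z_true + 0.05 * rng.standard_normal(32)

K = 20                                    # number of unrolled layers
lam = 0.1                                 # l1 weight of the underlying objective
L = np.linalg.norm(D, 2) ** 2             # Lipschitz constant of the gradient
x_hat, z_hat = unrolled_ista(y, D, steps=[1.0 / L] * K, thresholds=[lam / L] * K)
```

With all layers sharing the step `1 / L` and threshold `lam / L`, this is exactly K iterations of classical ISTA; untying the per-layer values is what turns the solver into a trainable architecture.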

Key advantages include interpretability (each layer corresponds to an explicit algorithmic operation), parameter efficiency, the ability to impose domain or physics priors, and—in certain cases—provable optimality properties in the large-sample limit (Karan et al., 2024).

2. Algorithmic Derivation and Neural Mapping

The mapping from iterative solver to network architecture is algorithm-specific. For example, in "Unrolled Optimization with Deep Priors" (Diamond et al., 2017), denoising is formulated as a composite of a data-fidelity term and a learned prior, resulting in an unrolled proximal-gradient architecture:

$x^{(k+\frac{1}{2})} = \text{CNN}(x^{(k)}; \theta^{(k)})$

$x^{(k+1)} = \frac{x^{(k)} + x^{(k+\frac{1}{2})} + (\alpha_k/\sigma^2)y}{1 + \alpha_k/\sigma^2}$

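Here, the CNN output $x^{(k+\frac{1}{2})}$ enters the update as an additive correction to $x^{(k)}$, playing the role of the learned prior step, and the closed-form update implements the proximal step for the data-fidelity term, with $\alpha_k$ a per-layer step size and $\sigma^2$ the noise variance.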
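The closed-form update can be checked directly: assuming a Gaussian data term $f(x) = \frac{1}{2\sigma^2}\|x - y\|_2^2$, it is the minimizer of $\alpha_k f(x) + \frac{1}{2}\|x - x^{(k)} - x^{(k+\frac{1}{2})}\|_2^2$. The sketch below implements that update and chains it over a few layers. The `toy_prior_step` function is a hypothetical stand-in for the per-layer CNN (a simple smoothing residual), used only so that the example runs without a deep learning framework.

```python
import numpy as np

def data_prox_update(x_k, x_half, y, alpha, sigma):
    """Closed-form data step: (x_k + x_half + w*y) / (1 + w), w = alpha / sigma^2."""
    w = alpha / sigma ** 2
    return (x_k + x_half + w * y) / (1.0 + w)

def toy_prior_step(x, strength=0.5):
    """Hypothetical stand-in for CNN(x; theta): a residual toward a 3-tap average."""
    smooth = np.convolve(x, np.ones(3) / 3.0, mode="same")
    return strength * (smooth - x)

def unrolled_prox_grad(y, alphas, sigma):
    """One layer per entry of alphas; each layer is a prior step plus a data step."""
    x = y.copy()
    for alpha in alphas:
        x_half = toy_prior_step(x)
        x = data_prox_update(x, x_half, y, alpha, sigma)
    return x

# 1-D toy signal with additive Gaussian noise (hypothetical values).
rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0.0, 2.0 * np.pi, 64))
sigma = 0.2
y = clean + sigma * rng.standard_normal(64)
x_hat = unrolled_prox_grad(y, alphas=[0.05] * 5, sigma=sigma)
```

As `alpha` grows the update returns `y` (full trust in the data), and as it approaches zero the update returns `x_k + x_half` (full trust in the prior step), which is why learning `alpha` per layer lets the network balance the two.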

Other notable derivations:

Alternating minimization for inpainting/denoising: Noise2Inpaint (Yaman et al., 2020) recasts self-supervised denoising with masking as a regularized inpainting problem, alternates between data-fidelity (an explicit convex combination update) and CNN-based regularization, and unrolls these updates into a fixed-depth network.
ISTA/ADMM-driven architectures: CDLNet (Janjušević et al., 2021) and various proximal neural networks (Le et al., 2023) implement unrollings of ISTA, Chambolle–Pock, or other primal/dual methods, training filterbanks and per-iteration thresholds and optionally incorporating inertia (skip connections).
Plug-and-play/RED: Integration of explicit denoising operators as algorithmic priors, for instance in DURED-Net (Huang et al., 2022), allows RED or plug-and-play iterations to be mapped directly with learned CNN denoisers in the unrolled layers.
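One way a convex-combination data-fidelity update of the kind mentioned for masked alternating minimization can arise is sketched below. It assumes the subproblem $\min_x \|M(y - x)\|_2^2 + \mu \|x - z\|_2^2$, where $M$ is a binary pixel mask, $z$ is the current regularizer output, and $\mu > 0$ is a penalty weight. These symbols and the function name are assumptions for illustration; the exact update used in Noise2Inpaint may differ.

```python
import numpy as np

def masked_data_consistency(y, z, mask, mu):
    """Per-pixel minimizer of ||mask*(y - x)||^2 + mu*||x - z||^2.

    Observed pixels (mask == 1) become a convex combination of y and z with
    weights 1/(1+mu) and mu/(1+mu); unobserved pixels simply copy z.
    """
    m = mask.astype(float)
    return (m * y + mu * z) / (m + mu)

# Tiny example: the first pixel is observed, the second is masked out.
y = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])
mask = np.array([1, 0])
x_new = masked_data_consistency(y, z, mask, mu=1.0)
```

In an unrolled network this step would alternate with a CNN regularizer that produces `z`, repeated for a fixed number of layers.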

Unrolling can also integrate non-local priors, graph-based structure, or domain constraints—see DU-BM3D (Basim et al., 15 Nov 2025), graph unrolled networks (Chen et al., 2020, Kojima et al., 28 May 2025), and Gabor-based physical constraints (Liu et al., 2023).

3. Architectural Families and Variants

A taxonomy of prominent unrolled denoising architectures includes:

Architecture / Paper	Optimization Unrolled	Learned Module(s)
Noise2Inpaint (Yaman et al., 2020)	Alternating Minimization	Small U-Net/ResNet regularizer
CDLNet (Janjušević et al., 2021)	ISTA	Multi-channel convolutions, thresholds
ODP (Diamond et al., 2017)	Proximal Gradient	Residual CNN step per iteration
DURED-Net (Huang et al., 2022)	RED-ADMM	Denoiser CNN predicting RED residual
Proximal Neural Networks (Le et al., 2023)	Chambolle–Pock, DFB/ADMM	Filterbanks, stepsizes, thresholds
DU-BM3D (Basim et al., 15 Nov 2025)	BM3D (single pass)	U-Net collaborative filter
Graph Unrolling (Chen et al., 2020)	Sparse coding/trend filtering	GNN/Edge-weighted convolutions
Gabor-based (Liu et al., 2023)	ISTA	Gabor-param kernels per layer
UPnP-GGLR (Cai et al., 2024)	ADMM for quadratic programming	CG-solved data and prior splits
MGSD-LLap-DAU (Kojima et al., 28 May 2025)	Denoising/graph learning	Layer-specific regularization params

Architectural choices depend on the problem structure. Denoisers for images, MRI, or seismic data commonly use U-Net-based modules as learned priors. For graph-structured data, edge-weight sharing and kernelized convolutions matched to the graph topology are central. Models targeting robustness incorporate constraints on Lipschitz constants or normalization of filterbanks (Le et al., 2023).
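For the graph-structured case, a minimal sketch of unrolling is gradient descent on a Laplacian-regularized objective $\|y - x\|_2^2 + \lambda\, x^\top L x$, with one step size and one regularization weight per layer. This is a generic textbook formulation chosen for illustration, not the specific model of any cited paper; the path graph and all names are assumptions.

```python
import numpy as np

def path_graph_laplacian(n):
    """Combinatorial Laplacian L = D - A of an n-node path graph."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

def unrolled_graph_denoiser(y, L, steps, lams):
    """One gradient step on ||y - x||^2 + lam * x^T L x per layer."""
    x = y.copy()
    for eta, lam in zip(steps, lams):
        grad = 2.0 * (x - y) + 2.0 * lam * (L @ x)
        x = x - eta * grad
    return x

L = path_graph_laplacian(5)
y = np.array([1.0, 1.2, 0.8, 3.0, 1.1])    # hypothetical noisy graph signal
x_few = unrolled_graph_denoiser(y, L, steps=[0.1] * 4, lams=[1.0] * 4)
x_many = unrolled_graph_denoiser(y, L, steps=[0.1] * 200, lams=[1.0] * 200)
x_closed = np.linalg.solve(np.eye(5) + 1.0 * L, y)   # exact minimizer for lam = 1
```

With tied parameters and many layers the output approaches the closed-form solution `(I + lam * L)^{-1} y`; a trained unrolling would instead use a few layers with learned, layer-specific values.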

4. Self-Supervision, Regularization, and Robustness

A critical enabling development for unrolled denoising networks is the integration of self-supervision and referenceless loss functions, which eliminate the need for clean signal pairs:

Mask-based self-supervision: In approaches like Noise2Inpaint (Yaman et al., 2020), pixels are masked into disjoint sets; the model predicts masked values from unmasked data, optimizing a loss only on held-out entries.
Noise2Noise and variants: Models such as DURED-Net (Huang et al., 2022) use two independently noised measurements as input/target, with loss on their difference, leveraging the statistical properties of the noise process.
Adaptive regularization: CDLNet (Janjušević et al., 2021) adds noise-adaptive thresholding, inferring thresholds proportional to test-time noise statistics. Gabor-based denoising networks (Liu et al., 2023) impose constraints on physical filter parameters, ensuring the learned denoiser aligns with the process to be mitigated.
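The mask-based scheme in the list above can be sketched in a few lines: pixels are split into two disjoint sets, the network sees only one set, and the loss is evaluated only on the other. The helper names and the constant placeholder prediction below are hypothetical; a real pipeline would replace the placeholder with the unrolled network's output.

```python
import numpy as np

def split_pixels(shape, holdout_fraction, rng):
    """Randomly partition pixels into an input set and a disjoint held-out set."""
    loss_mask = rng.random(shape) < holdout_fraction
    input_mask = ~loss_mask
    return input_mask, loss_mask

def masked_mse(prediction, y, loss_mask):
    """Mean squared error evaluated only on held-out pixels."""
    n_held_out = max(int(loss_mask.sum()), 1)
    return float(np.sum(((prediction - y) * loss_mask) ** 2) / n_held_out)

rng = np.random.default_rng(0)
y = rng.standard_normal((8, 8))                     # noisy image (toy data)
input_mask, loss_mask = split_pixels(y.shape, 0.2, rng)
network_input = y * input_mask                      # held-out pixels are zeroed
prediction = np.full_like(y, network_input.mean())  # placeholder for a network output
loss = masked_mse(prediction, y, loss_mask)
```

Because the held-out pixels never reach the network input, the trivial identity mapping cannot minimize this loss, which is what allows training without clean targets.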

Robustness is further quantified via Lipschitz analysis of the network mapping (Le et al., 2023), with architectures enforcing non-expansive constraints displaying improved transfer to out-of-distribution noise and stability under adversarial input shift.
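A simple way to see how a non-expansive constraint can be enforced is to rescale a layer's linear operator so that its spectral norm is at most 1. Since soft-thresholding is itself 1-Lipschitz, the composed layer then cannot increase distances between inputs. The sketch below uses a dense matrix for brevity and is an illustrative assumption, not the specific constraint mechanism of Le et al. (2023).

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise soft-thresholding, a 1-Lipschitz map."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def project_to_unit_spectral_norm(W):
    """Rescale W so that its largest singular value is at most 1."""
    s = np.linalg.norm(W, 2)          # ord=2 on a matrix gives the spectral norm
    return W / max(s, 1.0)

def nonexpansive_layer(z, W, tau):
    """Linear map with spectral norm <= 1 followed by soft-thresholding."""
    return soft_threshold(W @ z, tau)

rng = np.random.default_rng(0)
W = project_to_unit_spectral_norm(rng.standard_normal((16, 16)))
a, b = rng.standard_normal(16), rng.standard_normal(16)
gap_in = np.linalg.norm(a - b)
gap_out = np.linalg.norm(nonexpansive_layer(a, W, 0.1) - nonexpansive_layer(b, W, 0.1))
```

Here `gap_out` never exceeds `gap_in`, and stacking such layers keeps the Lipschitz constant of the whole network bounded by 1, which bounds how much an input perturbation can change the output.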

5. Theoretical Guarantees and Optimality

Recent work provides the first rigorous proofs that unrolled denoising architectures can match Bayes-optimal inference in high-dimensional settings:

"Unrolled denoising networks provably learn optimal Bayesian inference" (Karan et al., 2024) shows that, for compressed sensing and rank-one estimation, layerwise training of unrolled AMP networks with shallow MLP denoisers achieves the same MSE and recovers the same posterior mean denoisers as Bayes AMP, assuming product priors and sufficient width.
State evolution and Onsager correction in AMP-based networks play a central role in this guarantee. Under mild regularity conditions, empirical error of each layer tracks the predicted state evolution, and gradient descent over polynomially many samples finds the optimal scalar denoiser.
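To illustrate the Onsager correction, the following sketch runs AMP for noiseless compressed sensing with a fixed soft-threshold denoiser. This is a deliberate simplification: it is neither Bayes AMP nor the learned MLP denoisers analyzed by Karan et al. (2024), and the problem sizes, threshold rule, and names are assumptions chosen for illustration. Each loop iteration corresponds to one layer of an unrolled AMP network.

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise soft-thresholding, used here as the scalar denoiser."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def amp_soft_threshold(y, A, n_layers=30, alpha=2.0):
    """AMP with a soft-threshold denoiser; one loop iteration per layer."""
    m, n = A.shape
    x = np.zeros(n)
    z = y.copy()                                      # corrected residual
    for _ in range(n_layers):
        tau = alpha * np.linalg.norm(z) / np.sqrt(m)  # threshold from residual energy
        r = x + A.T @ z                               # effective noisy observation of x
        x_new = soft_threshold(r, tau)
        # Onsager term: (n/m) * mean(denoiser derivative at r) * previous residual.
        # For soft-thresholding the derivative is 1 on the support and 0 elsewhere.
        onsager = (np.count_nonzero(x_new) / m) * z
        z = y - A @ x_new + onsager
        x = x_new
    return x

rng = np.random.default_rng(0)
n, m, k = 1000, 500, 50
A = rng.standard_normal((m, n)) / np.sqrt(m)          # i.i.d. Gaussian sensing matrix
x_true = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x_true[support] = rng.standard_normal(k)
y = A @ x_true
x_hat = amp_soft_threshold(y, A)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
```

The Onsager term is what keeps the effective noise in `r` approximately Gaussian from layer to layer, and that property is what state evolution tracks. In a learned version, `soft_threshold` would be replaced by a trainable scalar denoiser per layer.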

These results clarify the conditions under which unrolled architectures, when optimally trained, subsume traditional Bayes-optimal iterative estimators, and can even outperform them in non-product or low-dimensional regimes.

6. Applications and Performance

Unrolled denoising networks have demonstrated strong results across diverse domains:

Natural image denoising: Architectures like ODP (Diamond et al., 2017), CDLNet (Janjušević et al., 2021), and Noise2Inpaint (Yaman et al., 2020) reach or exceed state-of-the-art PSNR/SSIM on BSD68, ImageNet, and other benchmarks, often with orders-of-magnitude fewer parameters than generic CNNs.
Medical imaging: DURED-Net (Huang et al., 2022) outperforms self-supervised and supervised baselines in MRI reconstruction (fastMRI), reaching within 0.3–0.5 dB of fully supervised unrolled schemes without ever seeing clean data.
Seismic denoising: Gabor-based unrolled networks (Liu et al., 2023) operate in a fully self-supervised regime, provide interpretable filter parameters, and achieve high PSNR/SSIM on both synthetic and field data.
CT imaging: Deep Unfolded BM3D (Basim et al., 15 Nov 2025), combining fixed non-local grouping with a learnable collaborative filter, delivers superior PSNR, SSIM, and robustness to noise-level mismatch compared to conventional BM3D and plain U-Nets.
Graph denoising: Graph Unrolling Networks (Chen et al., 2020) and MGSD-LLap-DAU (Kojima et al., 28 May 2025) leverage the unrolling of optimization over graph domains, yielding significant reductions (20–60%) in error relative to spectral or GNN baselines on both simulated and real complex-structured graph datasets.

Across these reports, unrolled frameworks tend to offer a favorable trade-off between interpretability, data efficiency, and generalization to varying noise or measurement statistics.

7. Interpretability, Limitations, and Extensions

Interpretability is a core advantage: architectural operations and parameters correspond directly to steps in well-studied optimization algorithms, and layerwise outputs can be attributed to distinct inference or prior mechanisms. Adaptive thresholds, filterbanks, and regularization strengths admit physical, statistical, or domain-theoretic interpretation (Janjušević et al., 2021, Cai et al., 2024, Liu et al., 2023).

Current limitations include:

Depth and expressivity bottleneck: For some tasks, shallow unrollings suffice, but in highly non-linear or data-rich regimes, fixed-depth unrolling may limit representation power compared to unrestricted deep CNNs.
Non-convexity and stability: While unrolling is often inspired by convex optimization, trainable modules, non-convex regularizers (e.g., NC-GTV (Wei et al., 3 Jun 2025)), and learned step sizes can introduce non-convexities, risking suboptimal fixed points.
Extension to novel domains: Non-local, physics-informed, or graph-structured variants extend unrolling to highly general settings, but whether the existing guarantees and empirical optimality transfer to these modalities remains underexplored.

Potential extensions focus on unrolling more sophisticated iterative methods, hybridizing with self-attention/transformer mechanisms (Wang et al., 4 Jun 2025), combining physics-driven and data-driven modules, and certifying robustness and generalization even under strong covariate or noise-model shift (Cai et al., 2024).

Unrolled denoising networks represent an adaptive and interpretable synthesis of optimization theory and deep learning, offering robust, efficient, and in some settings provably optimal solutions to classical and emerging signal and image denoising tasks (Yaman et al., 2020, Diamond et al., 2017, Karan et al., 2024, Basim et al., 15 Nov 2025, Janjušević et al., 2021, Le et al., 2023).