Deep Unfolded BM3D for Image Denoising
- The paper introduces DU-BM3D, a hybrid denoising model that unrolls the BM3D framework by replacing the collaborative filtering with a trainable U-Net, leading to improved image quality.
- DU-BM3D leverages fixed non-local matching and aggregation operators while optimizing the filtering step end-to-end, which yields higher PSNR across all tested noise levels and higher SSIM in all but the lowest-noise setting.
- The method is both parameter-efficient and robust, effectively preserving structural details and suppressing noise artifacts in low-dose CT reconstructions.
Deep Unfolded BM3D (DU-BM3D) is a hybrid denoising framework that systematically unrolls the established Block-Matching and 3D Filtering (BM3D) algorithm into a trainable neural architecture by embedding a U-Net within BM3D’s workflow. By preserving BM3D’s non-local, self-similarity-based matching and aggregation stages as fixed operators and replacing only the collaborative filtering step with a differentiable convolutional neural network, DU-BM3D combines the interpretability and inductive bias of classic non-local methods with the adaptability of modern deep learning. This approach enables end-to-end optimization, efficiently leveraging both local and non-local correlations for robust image denoising. DU-BM3D demonstrates significant improvements in denoising performance—particularly for low-dose CT reconstruction—compared to both classic BM3D and pure U-Net architectures, while maintaining parameter efficiency (Basim et al., 15 Nov 2025).
1. Foundations: BM3D Algorithm
BM3D is a three-stage, non-local patch-based denoiser. Its core steps are as follows:
- Block-Matching and Grouping: For each reference patch $P_r$, similar patches $P_i$ within a fixed search window are identified using a squared-distance threshold $\tau$ (a minimal grouping sketch in Python follows this list):
$$\mathcal{S}_r = \{\, i \,:\, \|P_r - P_i\|_2^2 \le \tau \,\}.$$
The top $N$ matches are stacked to form a 3D group tensor $Z_r \in \mathbb{R}^{N \times p \times p}$ for patches of size $p \times p$.
- Collaborative Filtering: A separable 3D transform $\mathcal{T}$ (e.g., DCT along each axis) is applied to $Z_r$, followed by coefficient shrinkage $\Gamma$ (hard or soft thresholding), and then inverse transformed:
$$\hat{Z}_r = \mathcal{T}^{-1}\big(\Gamma(\mathcal{T}(Z_r))\big).$$
- Aggregation: The denoised patches $\hat{P}_r$ are returned to their original locations and averaged with overlap-aware weights $w_r$:
$$\hat{x}(u) = \frac{\sum_{r:\,u \in P_r} w_r\, \hat{P}_r(u)}{\sum_{r:\,u \in P_r} w_r}.$$
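To make the grouping step concrete, below is a minimal NumPy sketch of block matching around a single reference patch. The patch size, search radius, threshold `tau`, and group size `N` are illustrative placeholders, not values taken from the paper.

```python
import numpy as np

def group_patches(image, ref_yx, patch=8, search=16, tau=2500.0, N=16):
    """Toy block matching: collect up to N patches similar to the reference patch."""
    H, W = image.shape
    ry, rx = ref_yx                      # assumed to be a valid top-left corner
    ref = image[ry:ry + patch, rx:rx + patch]
    candidates = []
    # Scan the search window around the reference location.
    for y in range(max(0, ry - search), min(H - patch, ry + search) + 1):
        for x in range(max(0, rx - search), min(W - patch, rx + search) + 1):
            cand = image[y:y + patch, x:x + patch]
            d = np.sum((ref - cand) ** 2)        # squared patch distance
            if d <= tau:
                candidates.append((d, y, x))
    # Keep the N best matches and stack them into a 3D group.
    candidates.sort(key=lambda t: t[0])
    group = np.stack([image[y:y + patch, x:x + patch] for _, y, x in candidates[:N]])
    coords = [(y, x) for _, y, x in candidates[:N]]
    return group, coords                 # group shape: (<=N, patch, patch)
```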
BM3D’s non-learned parameters and hard-coded transforms can limit its adaptation, especially under shifting or severe noise regimes.
2. DU-BM3D Architecture and Unrolling Methodology
DU-BM3D systematically “unfolds” the BM3D workflow, preserving the block-matching (grouping) and aggregation operators but replacing the collaborative filter with a U-Net. The model operates as follows:
- Grouping (Fixed Operator $\mathcal{G}$): Patches from the noisy image $y$ are grouped non-locally to form stacks: $Z = \mathcal{G}(y)$.
- Learnable Filtering (Trainable U-Net $F_\theta$): Each patch stack is denoised by a compact U-Net parameterized by $\theta$: $\hat{Z} = F_\theta(Z)$.
- Aggregation (Fixed Operator $\mathcal{A}$): Denoised patches are returned to their spatial coordinates and combined: $\hat{x} = \mathcal{A}(\hat{Z})$.
The overall model is thus:
$$\hat{x} = \mathcal{A}\big(F_\theta(\mathcal{G}(y))\big).$$
Importantly, only the parameters $\theta$ of $F_\theta$ are updated during training; matching and aggregation remain fixed and non-learnable.
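The composition $\mathcal{A} \circ F_\theta \circ \mathcal{G}$ can be expressed directly as a module whose only trainable component is the embedded U-Net. The sketch below assumes generic `grouper`, `unet`, and `aggregator` callables, since the paper does not publish reference code; the class name and tensor interfaces are illustrative.

```python
import torch
import torch.nn as nn

class DUBM3D(nn.Module):
    """Toy unrolled pipeline: fixed grouping -> learnable filtering -> fixed aggregation."""
    def __init__(self, grouper, unet, aggregator):
        super().__init__()
        self.grouper = grouper        # fixed, non-learnable operator G
        self.unet = unet              # trainable filter F_theta
        self.aggregator = aggregator  # fixed, non-learnable operator A

    def forward(self, y):
        groups, coords = self.grouper(y)                     # stack non-local patches
        filtered = self.unet(groups)                         # denoise the 3D groups
        x_hat = self.aggregator(filtered, coords, y.shape)   # overlap-aware averaging
        return x_hat
```

Because only `self.unet` carries parameters, `model.parameters()` exposes $\theta$ alone, so end-to-end training updates the filter while leaving matching and aggregation untouched.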
3. Embedded U-Net Design
The filter $F_\theta$ is described as a compact U-Net with an encoder-decoder structure and skip connections. While the paper does not specify granular architectural details (such as convolutional kernel sizes or channel widths), two critical characteristics are:
- The input to $F_\theta$ is a 3D group of non-locally matched patches, ensuring the network exploits both local correlations (via convolutions) and non-local information (via grouping).
- The same backbone is used for the DU-BM3D module and the standalone U-Net baseline to ensure fair comparison.
A plausible implication is that the non-local prior embedded in the grouping operator provides complementary contextual information not available to conventional convolutional architectures trained on raw images.
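Because no layer-level specification is given, the following compact U-Net is only a plausible stand-in for $F_\theta$; treating the $N$ matched patches as input channels, as well as the depth and channel widths, are assumptions.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class CompactUNet(nn.Module):
    """Small encoder-decoder with one skip connection; the N grouped patches act as channels."""
    def __init__(self, n_patches=16, width=32):
        super().__init__()
        self.enc1 = conv_block(n_patches, width)
        self.down = nn.MaxPool2d(2)
        self.enc2 = conv_block(width, 2 * width)
        self.up = nn.ConvTranspose2d(2 * width, width, 2, stride=2)
        self.dec1 = conv_block(2 * width, width)
        self.out = nn.Conv2d(width, n_patches, 1)

    def forward(self, z):                # z: (B, N, p, p) stack of matched patches
        e1 = self.enc1(z)
        e2 = self.enc2(self.down(e1))
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))
        return self.out(d1)              # denoised stack, same shape as the input
```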
4. Training Procedure and Optimization
DU-BM3D is trained end-to-end using a mean squared error (MSE) objective:
$$\mathcal{L}(\theta) = \frac{1}{M}\sum_{i=1}^{M} \big\| \mathcal{A}\big(F_\theta(\mathcal{G}(y_i))\big) - x_i \big\|_2^2,$$
where $y_i$ are the noisy inputs and $x_i$ are the corresponding ground-truth images.
Key training attributes:
- Optimizer: Adam, with backpropagation
- Epochs: 20
- Batch size: 16
- Training device: NVIDIA A100 GPU
- Only $F_\theta$ is learned; $\mathcal{G}$ and $\mathcal{A}$ are fixed
- No explicit mention of regularization or learning rate schedule
The optimization is a single-stage, fully unfolded setup: no iterative refinement or multi-pass unrolling is performed.
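A minimal training loop consistent with the reported setup (Adam, MSE loss, 20 epochs, batch size 16) is sketched below; the learning rate and data-loading details are assumptions, as neither is specified in the paper.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=20, lr=1e-4, device="cuda"):
    """End-to-end MSE training; only the U-Net inside the model contributes parameters."""
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # lr is an assumed value
    criterion = nn.MSELoss()
    for epoch in range(epochs):
        for noisy, clean in loader:          # batches of (y_i, x_i), batch size 16
            noisy, clean = noisy.to(device), clean.to(device)
            optimizer.zero_grad()
            loss = criterion(model(noisy), clean)
            loss.backward()                  # gradients flow only into F_theta
            optimizer.step()
        print(f"epoch {epoch + 1}: loss {loss.item():.4f}")
```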
5. Experimental Evaluation: Low-Dose CT (LDCT) Denoising
The principal application explored is LDCT reconstruction, using CT images from the DeepLesion dataset with synthetic low-dose projections per LoDoPaB-CT protocols. Key experimental parameters:
- Noise Regimes: Incident photon counts of 10k, 50k, 100k, and 500k (lower counts correspond to higher noise; a simulation sketch follows this list)
- Training Regime: Model trained only on 100k-photon noise level; evaluated across all tested noise levels (generalization)
- Data Split: 69% train, 14% validation, 17% test
- Baselines:
- Classic BM3D (untuned)
- Standalone U-Net (same architecture as $F_\theta$)
- Metrics: Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM)
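A common way to realize the low-dose regimes above is to apply Poisson photon noise to clean projections, in the spirit of the LoDoPaB-CT protocol; the sketch below models only the photon statistics and omits the projection operator itself. The function name and default dose are illustrative.

```python
import numpy as np

def simulate_low_dose(sinogram, photons=100_000, rng=None):
    """Apply Poisson photon noise to a clean sinogram (line integrals) at a given dose."""
    rng = np.random.default_rng() if rng is None else rng
    expected_counts = photons * np.exp(-sinogram)   # Beer-Lambert attenuation
    noisy_counts = rng.poisson(expected_counts)
    noisy_counts = np.maximum(noisy_counts, 1)      # avoid log(0) at very low doses
    return -np.log(noisy_counts / photons)          # back to noisy line integrals
```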
Quantitative performance, reported as PSNR (dB) / SSIM, is summarized below:
| Photon Count | Noisy | BM3D | U-Net | DU-BM3D |
|---|---|---|---|---|
| 10k | 15.63 / 0.6304 | 17.79 / 0.6638 | 20.47 / 0.6887 | 24.15 / 0.7417 |
| 50k | 20.67 / 0.7060 | 28.05 / 0.8214 | 25.49 / 0.7751 | 30.34 / 0.8657 |
| 100k | 23.04 / 0.7423 | 29.87 / 0.8600 | 27.71 / 0.8184 | 31.77 / 0.8848 |
| 500k | 28.07 / 0.8295 | 30.34 / 0.8653 | 30.84 / 0.8891 | 31.99 / 0.8861 |
DU-BM3D achieves the highest PSNR at every photon count and the highest SSIM at all but the 500k-photon setting, where the standalone U-Net attains a marginally higher SSIM (0.8891 vs. 0.8861); the margin is most pronounced at the highest noise level (10k photons). Qualitative assessments indicate effective noise suppression by DU-BM3D with preservation of lesion boundaries and subtle anatomical details, whereas BM3D produces stacking artifacts and U-Net oversmooths in high-noise conditions.
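For reference, the reported metrics can be computed with standard routines from scikit-image; the `data_range=1.0` default below assumes images normalized to $[0, 1]$, which is an assumption rather than a detail from the paper.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(clean, denoised, data_range=1.0):
    """Return (PSNR in dB, SSIM) for a pair of 2D images with a known intensity range."""
    psnr = peak_signal_noise_ratio(clean, denoised, data_range=data_range)
    ssim = structural_similarity(clean, denoised, data_range=data_range)
    return psnr, ssim
```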
6. Model Complexity, Ablation Insights, and Limitations
- Parameter Efficiency: DU-BM3D is substantially more parameter-efficient than the standalone U-Net, due to the dimensionality reduction inherent in patch grouping and the localized action of $F_\theta$.
- Inference Time: The model is faster than traditional BM3D, but not as fast as the pure U-Net baseline.
- Ablations: No ablation studies on unrolling depth, U-Net width, or grouping/matching parameters are reported. The paper notes that only the collaborative filtering stage is made learnable; matching and aggregation remain algorithmic and static. There are no experiments exploring alternative grouping windows or aggregation weights.
- Generalization: DU-BM3D is trained on a single noise level but generalizes well across unseen noise levels, suggesting the interaction between non-local priors and learnable filtering is robust to noise distribution mismatch.
Limitations disclosed by the authors include the restriction of learnable components to the collaborative filter, and the potential for further gains by adapting other BM3D stages or introducing multi-stage unrolling or alternate denoising backbones.
7. Context and Significance
DU-BM3D bridges the interpretability and prior-driven structure of classical non-local denoising (BM3D) with the expressivity and adaptability of deep learning (U-Net). By unrolling only the collaborative filtering step, it maintains strict algorithmic control over matching and aggregation, potentially reducing overfitting and preserving key structural information in images. Its ability to outperform both BM3D and U-Net across a range of challenging noise regimes—using fewer parameters and a fixed training set—underscores the value of hybrid, unrolled architectures in signal restoration tasks. Future directions noted by the authors include making further BM3D operators learnable and extending the model to multi-stage or alternative convolutional denoiser designs (Basim et al., 15 Nov 2025).