BlurPool Strategy in CNNs
- BlurPool is a pooling strategy in CNNs that applies a normalized low-pass blur kernel before downsampling to reduce aliasing artifacts like checkerboards.
- It employs small, symmetric kernels (e.g., 3×3 or 5×5 binomial) to suppress high-frequency components, ensuring stable and shift-invariant feature representations.
- Empirical results demonstrate PSNR gains of up to 0.6 dB in self-supervised denoising tasks, highlighting improved structural consistency and artifact reduction.
BlurPool is a pooling strategy for convolutional neural networks (CNNs) that incorporates a low-pass filtering step—typically a small blur convolution—immediately before downsampling. This approach explicitly addresses the aliasing artifacts, most notably checkerboards, that emerge when standard pooling or subsampling operations retain high-frequency components in feature maps. It has found particular utility in the context of self-supervised image denoising architectures such as N2V2, where reducing aliasing and enforcing shift-invariance are critical for high-fidelity results (Höck et al., 2022).
1. Mathematical Formulation
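Let $x \in \mathbb{R}^{H \times W}$ denote a single channel of the input feature map. BlurPool first convolves $x$ with a normalized blur kernel $k$ of size $m \times m$, satisfying $\sum_{u,v} k[u,v] = 1$. The stride-$s$ downsampling with pre-filtering is then:
$$y[i,j] = (k * x)[s i,\, s j] = \sum_{u,v} k[u,v]\, x[s i - u,\, s j - v],$$
where $s$ is the stride (typically $s = 2$ in N2V2), $u, v$ range over the kernel support centered at the origin, and $i, j$ index the downsampled output.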
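As a minimal sketch of blur followed by strided subsampling, the plain-Python function below operates on a single channel stored as a list of lists. The name `blur_pool`, the replicate-border padding, and the toy 6×6 input are assumptions made for illustration; a real network would more likely use a framework's depthwise strided convolution.

```python
def blur_pool(x, kernel, stride=2):
    """Blur the 2D list-of-lists `x` with `kernel`, keeping every `stride`-th sample."""
    height, width = len(x), len(x[0])
    radius = len(kernel) // 2  # kernel is assumed square with odd side length

    def clamp(index, size):
        # Replicate-border padding (an assumption; the text does not fix a padding mode).
        return min(max(index, 0), size - 1)

    out = []
    for i in range(0, height, stride):
        row = []
        for j in range(0, width, stride):
            acc = 0.0
            # The kernels used here are symmetric, so correlation equals convolution.
            for u in range(-radius, radius + 1):
                for v in range(-radius, radius + 1):
                    weight = kernel[u + radius][v + radius]
                    acc += weight * x[clamp(i + u, height)][clamp(j + v, width)]
            row.append(acc)
        out.append(row)
    return out


# 3x3 binomial kernel: outer product of [1, 2, 1] / 4 with itself.
K3 = [[a * b / 16 for b in (1, 2, 1)] for a in (1, 2, 1)]

# Toy input: a single bright outlier on a flat background.
image = [[0.0] * 6 for _ in range(6)]
image[2][2] = 16.0

pooled = blur_pool(image, K3, stride=2)  # 3x3 output; the outlier is scaled by the center weight 4/16
```

Because the kernel sums to one, a constant input passes through unchanged, while an isolated spike is spread over its neighbours and reduced before any samples are discarded.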
For the "max-blur-pool" variant introduced by Zhang (2019) and used in N2V2, the following two-step procedure is performed channel-wise:
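- Blur: $\tilde{x} = k * x$ (same-resolution convolution with the blur kernel $k$)
- Max-pool: $y[i,j] = \max_{0 \le a,\, b < s} \tilde{x}[s i + a,\, s j + b]$, with window size and stride $s = 2$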
This process first suppresses high-frequency components before decimation, and then incorporates the aggressive selection behavior of max-pooling.
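The two steps can be sketched in plain Python as follows. The helper names (`blur`, `max_pool`, `max_blur_pool`), the replicate padding, and the 4×4 toy input are illustrative assumptions. The sketch follows the order described in this section (blur first, then max-pool); Zhang's (2019) own formulation applies a dense max first, then blur and subsampling, so treat the ordering here as an assumption rather than a reference implementation.

```python
def blur(x, kernel):
    """Same-size blur of a 2D list-of-lists with replicate-border padding (assumed)."""
    height, width = len(x), len(x[0])
    radius = len(kernel) // 2
    clamp = lambda index, size: min(max(index, 0), size - 1)
    return [
        [
            sum(
                kernel[u + radius][v + radius] * x[clamp(i + u, height)][clamp(j + v, width)]
                for u in range(-radius, radius + 1)
                for v in range(-radius, radius + 1)
            )
            for j in range(width)
        ]
        for i in range(height)
    ]


def max_pool(x, size=2, stride=2):
    """Non-overlapping max-pooling over `size` x `size` windows."""
    height, width = len(x), len(x[0])
    return [
        [
            max(x[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, width - size + 1, stride)
        ]
        for i in range(0, height - size + 1, stride)
    ]


def max_blur_pool(x, kernel):
    """Blur, then 2x2 max-pool with stride 2, in the order given in the text."""
    return max_pool(blur(x, kernel), size=2, stride=2)


K3 = [[a * b / 16 for b in (1, 2, 1)] for a in (1, 2, 1)]

# Toy input: one outlier of amplitude 16 on a flat background.
image = [[0.0] * 4 for _ in range(4)]
image[1][1] = 16.0

plain = max_pool(image)              # the outlier passes through at full amplitude
blurred = max_blur_pool(image, K3)   # the outlier is attenuated and spread out
```

In this toy case plain max-pooling forwards the outlier unchanged into one output cell, whereas the blurred variant reduces its peak to a quarter and distributes the remainder across neighbouring cells.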
2. Blur Kernel Design and Normalization
The blur kernel $k$ is chosen to be small, symmetric, and often separable. Two common options are:
- $3 \times 3$ binomial, the outer product of (Pascal's 1–2–1)/4 with itself:
$$k_{3} = \frac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}$$
- $5 \times 5$ binomial (Pascal's 1–4–6–4–1)/16, constructed from:
$$k_{5} = \frac{1}{256}\,[1, 4, 6, 4, 1]^{\top}\,[1, 4, 6, 4, 1]$$
Both kernels are normalized so that the sum of their elements equals one, ensuring that the overall scaling of feature maps is preserved. N2V2 typically uses the $3 \times 3$ kernel for computational efficiency and sufficient suppression of aliasing (Höck et al., 2022).
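The construction of these kernels from rows of Pascal's triangle can be checked with a few lines of Python. The function name `binomial_kernel` is an illustrative choice, and exact fractions are used only so that the normalization can be verified without rounding error.

```python
from fractions import Fraction
from math import comb


def binomial_kernel(size):
    """Return a size x size binomial blur kernel as exact fractions."""
    row = [comb(size - 1, i) for i in range(size)]  # row of Pascal's triangle
    total = sum(row)                                # equals 2 ** (size - 1)
    taps = [Fraction(c, total) for c in row]        # normalized 1D filter
    # Separable kernel: outer product of the 1D filter with itself.
    return [[a * b for b in taps] for a in taps]


k3 = binomial_kernel(3)  # built from (1, 2, 1) / 4
k5 = binomial_kernel(5)  # built from (1, 4, 6, 4, 1) / 16

# Each kernel sums to exactly one, so mean activation levels are preserved.
sum_k3 = sum(sum(r) for r in k3)
sum_k5 = sum(sum(r) for r in k5)
```

Since both kernels are separable, the 2D blur can equivalently be applied as two 1D passes, which is cheaper for the larger kernel.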
3. Rationale for BlurPool in Self-Supervised Denoising
In standard $2 \times 2$ max-pooling with stride 2, only the strongest activation in each window is preserved, which does not uniformly attenuate high frequencies. Isolated strong activations may propagate through the network, leading to aliasing effects when subsequently upsampling (e.g., via transposed convolution or bilinear upsampling), manifesting as checkerboard patterns.
Zhang (2019) demonstrated, both via Fourier analysis and empirical results, that inserting a normalized low-pass filter before downsampling restores shift-invariance. Specifically, the network's outputs become stable under small translations of the input. In self-supervised denoising methods like Noise2Void, unfiltered high-amplitude noise and outlier responses are likely to alias at each encoder downsampling step, which the decoder may reconstruct as visible artifacts.
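A small 1D example illustrates the loss and recovery of shift-invariance. It uses plain stride-2 subsampling rather than max-pooling, a 1–2–1 blur, and circular padding; all three are simplifying assumptions chosen to keep the demonstration exact, not details taken from the cited works.

```python
def subsample(signal, stride=2):
    """Keep every `stride`-th sample."""
    return signal[::stride]


def blur_1d(signal, taps=(0.25, 0.5, 0.25)):
    """Circular 3-tap blur (circular padding is an assumption for this demo)."""
    n = len(signal)
    return [
        sum(t * signal[(i + offset) % n] for t, offset in zip(taps, (-1, 0, 1)))
        for i in range(n)
    ]


signal = [0.0, 1.0] * 4             # alternating pattern: the highest representable frequency
shifted = signal[1:] + signal[:1]   # the same pattern translated by one sample

# Without pre-filtering, a one-sample shift flips the output completely.
naive_a = subsample(signal)
naive_b = subsample(shifted)

# With the blur applied first, both versions give the same downsampled output.
smooth_a = subsample(blur_1d(signal))
smooth_b = subsample(blur_1d(shifted))
```

Here the unfiltered outputs are all zeros for one phase and all ones for the other, while the pre-filtered outputs agree at the local mean of 0.5 regardless of the shift.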
Pre-filtering with BlurPool ensures that outliers are attenuated before downsampling, preventing their recurrence as artifacts and enhancing structural consistency in the final reconstruction (Höck et al., 2022).
4. Integration of BlurPool in U-Net Architectures
In the N2V2 architecture, BlurPool is systematically inserted at every encoder downsampling location, replacing the standard $2 \times 2$ max-pool layers. The key architectural changes relative to the original N2V variant include:
- Removal of the long residual skip from input to output, reverting to a plain (non-residual) U-Net.
- Elimination of the topmost encoder-decoder skip connection, biasing the network toward low-frequency retention at the bottleneck.
- Substitution of every encoder max-pool step with a $2 \times 2$ max-blur-pool: first, a $3 \times 3$ binomial kernel is applied channel-wise, then a $2 \times 2$ max-pool with stride 2 is performed.
The encoder sequence becomes:
Conv($3 \times 3$) $\to$ ReLU $\to$ Conv($3 \times 3$) $\to$ ReLU $\to$ Max-Blur-Pool($2 \times 2$), repeated to the required depth. The decoder mirrors the convolutional blocks and uses bilinear upsampling followed by convolution and ReLU.
The only encoder-decoder skip connections retained are from the second-finest level downward; the finest-resolution (topmost) encoder output is not concatenated into the decoder (Höck et al., 2022).
5. Quantitative and Qualitative Effects
The introduction of BlurPool yields measurable improvements in both quantitative image quality metrics and qualitative artifact reduction. Across datasets spanning natural images (BSD68) and various fluorescence microscopy benchmarks (e.g., Mouse cell nuclei, Flywing membranes, Convallaria), PSNR gains of roughly 0.4 to 0.6 dB are reported over the baseline Noise2Void architecture.
Representative results:
| Dataset and Condition | Vanilla N2V PSNR | N2V2 (BlurPool, median) PSNR | Gain (dB) | Checkerboards Present |
|---|---|---|---|---|
| BSD68 | 27.70 | 28.32 | +0.62 | No (N2V2), Yes (N2V) |
| Mouse G20 | 34.12 | 34.74 | +0.62 | No (N2V2), Yes (N2V) |
| Flywing G70 | 25.20 | 25.57 | +0.37 | No (N2V2), Yes (N2V) |
Checkerboard patterns that are prominent in vanilla N2V outputs are essentially eliminated by the BlurPool-enhanced N2V2. Structural features such as corners, edges, and cell membranes display improved stability under input shifts, and bright noise outliers are more effectively suppressed before downsampling (Höck et al., 2022).
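To put the tabulated gains in context, the snippet below recomputes them from the reported PSNR values and converts a gain in decibels into the implied ratio of mean squared errors, using the standard definition PSNR = 10·log10(MAX² / MSE). The dictionary layout and the helper name `mse_ratio` are illustrative choices; the numbers are those from the table.

```python
# (vanilla N2V PSNR, N2V2 PSNR) in dB, copied from the table above.
results = {
    "BSD68": (27.70, 28.32),
    "Mouse G20": (34.12, 34.74),
    "Flywing G70": (25.20, 25.57),
}


def mse_ratio(gain_db):
    """Return MSE_new / MSE_old implied by a PSNR gain at a fixed peak value."""
    return 10 ** (-gain_db / 10)


# PSNR gain per dataset, rounded to the table's two-decimal precision.
gains = {name: round(n2v2 - n2v, 2) for name, (n2v, n2v2) in results.items()}

# Relative MSE reduction corresponding to each gain.
reductions = {name: 1 - mse_ratio(gain) for name, gain in gains.items()}
```

Under this definition a gain of about 0.6 dB corresponds to a mean squared error reduction on the order of 13%, which is modest in absolute terms but consistent across the listed datasets.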
6. Broader Significance and Limitations
BlurPool serves two central functions in the described context: (a) it enforces a compact, shift-invariant low-pass prior at every downsampling stage, combating aliasing artifacts; and (b) combined with updated blind-spot sampling strategies, it yields consistent PSNR improvements of up to approximately half a decibel, together with SSIM gains, without increasing training data requirements or imposing significant computational overhead. The method offers an effective architectural intervention for reducing aliasing-induced degradations in CNN-based image processing pipelines (Höck et al., 2022).