FLCPooling (FLC): Frequency-Domain Anti-Aliasing

Updated 27 July 2025
  • FLCPooling is a family of anti-aliasing pooling operations that applies FFT-based low-pass filtering to ensure alias-free spatial downsampling in CNNs.
  • By strictly removing frequencies above the Nyquist rate, FLCPooling improves model robustness and produces artifact-free explanation maps in safety-critical applications.
  • Practical implementations replace standard strided downsampling with a four-step process (FFT, low-pass filtering, IFFT, and subsampling) to mitigate aliasing and reduce spectral artifacts.

FLCPooling (FLC) is a family of anti-aliasing pooling operations designed to perform spatial downsampling in neural networks—particularly convolutional neural networks (CNNs)—in a manner that is provably free of aliasing, leveraging concepts from frequency-domain signal processing. The principal motivation for FLCPooling is to preserve the fidelity of feature maps during downsampling, which is critical for both robustness against common corruptions and the interpretability of network explanation maps, especially in safety-critical applications such as medical imaging.

1. Definition and Theoretical Principle

FLCPooling implements pooling by first transforming feature maps into the frequency domain using the Fast Fourier Transform (FFT), then performing a low-pass filtering operation (usually a box or Gaussian filter) that strictly removes frequencies above the Nyquist rate for the target resolution, and finally applying the inverse FFT (IFFT) before spatial downsampling. This guarantees that high-frequency (aliased) components are eliminated, satisfying the sampling theorem and preventing frequency folding artifacts.

In symbolic terms, for a 2D feature map $f$, FLCPooling can be abstractly represented as

$$\text{FLC}(f) = \text{Downsample}\left( \mathcal{F}^{-1}\left[ H \cdot \mathcal{F}(f) \right] \right)$$

where $\mathcal{F}$ and $\mathcal{F}^{-1}$ denote the FFT and IFFT, and $H$ is a frequency-domain low-pass filter: commonly a rectangular cutoff, $H(u,v) = 1$ for $|u|, |v| < f_\text{cutoff}$ and $0$ otherwise, or a Gaussian, $H(u,v) = \exp\left(-\frac{u^2 + v^2}{2\sigma^2}\right)$.
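As a concrete illustration of the Gaussian variant of $H$ (the rectangular variant appears in the implementation in Section 2), a minimal sketch, assuming an fftshift-centered spectrum and an illustrative function name:

import torch

def gaussian_lowpass_mask(h, w, sigma):
    # H(u, v) = exp(-(u^2 + v^2) / (2 * sigma^2)), with (u, v) measured
    # from the center of an fftshift-ed spectrum of shape (h, w).
    u = torch.arange(h, dtype=torch.float32).view(-1, 1) - h // 2
    v = torch.arange(w, dtype=torch.float32).view(1, -1) - w // 2
    return torch.exp(-(u ** 2 + v ** 2) / (2 * sigma ** 2))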

The theoretical result (Grabinski et al., 2023) is that applying such a low-pass filter in the frequency domain before downsampling guarantees the removal of all frequency components that would alias into the lower-resolution grid. This stands in contrast to traditional spatial pooling (e.g., max or average pooling, or strided convolutions), which may leave aliased energy in the representation.
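This is easy to verify numerically: strided subsampling folds a tone above the new Nyquist limit onto a low frequency, while low-pass filtering first removes it entirely. A minimal, self-contained sketch (the 1-D signal and sizes are illustrative):

import torch

n = 64
t = torch.arange(n, dtype=torch.float32)
x = torch.sin(2 * torch.pi * 24 * t / n)    # 24-cycle tone; Nyquist after 2x subsampling is 16

naive = x[::2]                              # strided subsampling: the tone folds to 32 - 24 = 8 cycles

X = torch.fft.fft(x)
X[16:49] = 0                                # zero every FFT bin with |frequency| >= 16
clean = torch.fft.ifft(X).real[::2]         # FLC-style: low-pass first, then subsample

print(torch.fft.fft(naive).abs().argmax())  # bin 8: an aliased low-frequency tone
print(torch.fft.fft(clean).abs().max())     # ~0: the tone was removed before subsampling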

2. Implementation Details

A standard FLCPooling layer replaces any downsampling otherwise performed via strides or pooling operations. Instead, the convolution (or other layer operation) is performed at full spatial resolution (stride 1), followed by a separate FLCPooling operation. The process entails:

  1. FFT Transform: Convert the spatial feature map to the frequency domain using 2D FFT.
  2. Frequency-Domain Filtering: Multiply by a low-pass filter HH that zeros out frequencies above the Nyquist limit corresponding to the new resolution.
  3. IFFT Back to Spatial: Transform back to the spatial domain using the IFFT.
  4. Downsampling: Subsample the spatial map at the desired stride.

In neural network frameworks, this process can be implemented as a custom pooling module. A prototypical PyTorch implementation (ignoring batch and channel dimensions) is:

import torch
import torch.fft

def flc_pooling(x, output_size, filter='box'):
    # x: (H, W) spatial feature map
    H, W = x.shape
    # 1. FFT, with the zero-frequency component shifted to the center
    X = torch.fft.fftshift(torch.fft.fft2(x))
    # 2. Centered ideal (box) low-pass mask: keep only frequencies
    #    below the Nyquist limit of the target resolution
    cutoff_h, cutoff_w = output_size[0] // 2, output_size[1] // 2
    mask = torch.zeros(H, W, device=x.device)
    mask[H // 2 - cutoff_h : H // 2 + cutoff_h,
         W // 2 - cutoff_w : W // 2 + cutoff_w] = 1.0
    if filter == 'gaussian':
        # Optionally use a soft Gaussian mask instead of the hard box
        pass
    X_filtered = X * mask
    # 3. Inverse shift and IFFT back to the spatial domain
    x_filtered = torch.fft.ifft2(torch.fft.ifftshift(X_filtered)).real
    # 4. Subsample the now band-limited map at the required stride
    stride_h, stride_w = H // output_size[0], W // output_size[1]
    return x_filtered[::stride_h, ::stride_w]

In practice, kernels are designed to match the target output resolution, and care is taken to avoid numerical artifacts from FFT implementation (such as zero-padding or multi-stage FFT shift misalignments), as detailed in (Grabinski et al., 2023).
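In a real network the pooling must also handle batch and channel dimensions. A hypothetical batched wrapper (the class name FLCPool2d and its interface are illustrative, not taken from the reference implementation) could apply the same four steps over the last two dimensions:

import torch
import torch.nn as nn

class FLCPool2d(nn.Module):
    # Hypothetical wrapper: FFT low-pass over the last two dimensions
    # of an (N, C, H, W) tensor, followed by strided subsampling.
    def __init__(self, stride=2):
        super().__init__()
        self.stride = stride

    def forward(self, x):
        _, _, H, W = x.shape
        out_h, out_w = H // self.stride, W // self.stride
        X = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
        mask = torch.zeros(H, W, device=x.device)
        mask[H // 2 - out_h // 2 : H // 2 + out_h // 2,
             W // 2 - out_w // 2 : W // 2 + out_w // 2] = 1.0
        x = torch.fft.ifft2(torch.fft.ifftshift(X * mask, dim=(-2, -1))).real
        return x[..., ::self.stride, ::self.stride]

For example, FLCPool2d(stride=2)(torch.randn(8, 64, 32, 32)) yields a tensor of shape (8, 64, 16, 16).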

3. Applications in Robustness and Interpretability

FLCPooling has been adopted to mitigate the sensitivity of neural networks to aliasing artifacts, which otherwise propagate through the layers and are amplified in the network's representations. Empirical work demonstrates its advantages along several axes:

  • Robustness to corruptions: FLCPooling preserves information across frequency bands and increases resistance to perturbations such as Gaussian noise, blur, or adversarial examples. Networks using FLCPooling yield higher accuracy on corrupted and adversarially perturbed datasets such as CIFAR-10 and ImageNet (Grabinski et al., 2023).
  • Faithful Explainer Maps: In architectures that provide inherently interpretable explanations (e.g., B-cos networks), standard strided downsampling may cause grid-structured distortions in attribution maps. Introducing FLCPooling produces artifact-free, smooth explanation maps that are crucial for clinical interpretability in medical imaging, as shown in multi-label chest X-ray diagnosis (Kleinmann et al., 22 Jul 2025).

The following table summarizes the effect of FLCPooling versus traditional methods:

Pooling Method    Aliasing Removal    Spectral Artifacts    Robustness Improvement
--------------    ----------------    ------------------    ----------------------
Strided/Avg       No                  Often present         Poor
BlurPool          Partial             Low                   Good
FLCPooling        Yes                 May have ringing      Strong
ASAP              Yes                 Eliminated            Strongest

In the FLCPooling approach, while aliasing is eliminated, sharp frequency cutoffs (rectangular filters) may introduce ringing artifacts near spatial edges. These can be addressed via improved windowing techniques (see ASAP below).

4. Limitations and Extensions

While FLCPooling guarantees alias-free downsampling, empirical analyses uncover secondary spectral artifacts (spectral leakage, ringing) due to the use of ideal rectangular filters. The spatial manifestation is localized oscillations around sharp features, which can be visually undesirable or even misleading for sensitive applications (Grabinski et al., 2023).

To resolve this, ASAP (“Aliasing and Spectral Artifact-free Pooling”) replaces the hard cutoff with a smooth Hamming window in frequency, thereby eradicating both aliasing and spectral leakage. Additionally, the alternation of FFT row/column order avoids cumulative misalignments. ASAP outperforms FLCPooling in both power spectrum consistency and task robustness, setting a new standard for artifact-free downsampling (Grabinski et al., 2023).
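As a sketch of the windowing idea (the exact window placement and shift handling in the published ASAP method may differ), the hard box can be replaced by a separable Hamming taper over the retained band:

import torch

def hamming_lowpass_mask(h, w, out_h, out_w):
    # Smooth separable Hamming taper over the retained frequency band,
    # suppressing the ringing (spectral leakage) of an ideal cutoff.
    win = torch.outer(torch.hamming_window(out_h, periodic=False),
                      torch.hamming_window(out_w, periodic=False))
    mask = torch.zeros(h, w)
    top, left = h // 2 - out_h // 2, w // 2 - out_w // 2
    mask[top:top + out_h, left:left + out_w] = win
    return mask  # drop-in replacement for the box mask in flc_pooling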

5. Integration into Network Architectures

In networks where interpretability and high-fidelity explanations are central (e.g., B-cos networks), FLCPooling is integrated by separating the convolution and pooling steps. The process, sketched in code after the list, is:

  1. Apply convolution at stride 1.
  2. Optionally apply an activation (e.g., MaxOut).
  3. Insert FLCPooling for anti-aliased downsampling.
  4. Continue the network pipeline.
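A minimal sketch of such a block, reusing the hypothetical FLCPool2d wrapper from Section 2 in place of a strided convolution:

import torch.nn as nn

# Conventional block:       nn.Conv2d(c_in, c_out, 3, stride=2, padding=1)
# Anti-aliased alternative: stride-1 convolution, then FLCPooling.
def flc_downsample_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=1, padding=1),
        nn.ReLU(),               # B-cos networks would use their MaxOut-style unit here
        FLCPool2d(stride=2),     # hypothetical wrapper sketched in Section 2
    )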

For multi-label classification tasks, such as chest X-ray abnormality detection, FLCPooling is combined with architectures that produce per-label explanation maps. Each output neuron is associated with an explanation map computed through the network; the FLCPooling layer ensures smooth, artifact-free contribution maps without loss of discriminative performance (Kleinmann et al., 22 Jul 2025).

FLCPooling can be integrated without degrading the main network's accuracy, while improving energy-based pointing game (EPG) metrics for localization interpretability.
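For reference, the EPG score measures the fraction of positive attribution energy falling inside the annotated region. A minimal sketch, assuming a binary ground-truth mask:

import torch

def epg_score(attribution, region_mask):
    # attribution: (H, W) explanation map; region_mask: (H, W) binary
    # mask of the annotated region. EPG = positive attribution energy
    # inside the region divided by total positive energy.
    pos = attribution.clamp(min=0)
    return (pos * region_mask).sum() / pos.sum().clamp(min=1e-8)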

6. Empirical Evaluation and Comparative Performance

Experimental data across multiple studies indicate:

  • FLCPooling achieves zero aliasing in the downsampling operation (as measured by explicit aliasing scores) (Grabinski et al., 2023).
  • On challenging datasets such as CIFAR-10 and ImageNet, networks using FLCPooling show improved robust accuracy (on the order of +3% under common corruptions, with similar gains under adversarial attack) compared to traditional downsampling.
  • In explanation-focused architectures, B-cos networks with FLCPooling (B-cos_FLC) yield explanation maps that are smoother and more clinically relevant than both baseline (strided) and BlurPool-based variants, improving the EPG score by up to 5 points on multi-label datasets (Kleinmann et al., 22 Jul 2025).

Furthermore, when compared to BlurPool (spatial low-pass filtering), FLCPooling provides a mathematically principled guarantee that aliased frequencies are removed exactly rather than merely attenuated, making it preferable when precise control over anti-aliasing is required.
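For contrast, a BlurPool-style operation applies a small spatial blur before striding; a minimal sketch with a 3x3 binomial kernel applied depthwise:

import torch
import torch.nn.functional as F

def blur_pool(x, stride=2):
    # x: (N, C, H, W). Depthwise 3x3 binomial blur, then strided
    # subsampling: attenuates, but does not eliminate, frequencies
    # above the new Nyquist limit.
    k = torch.tensor([1.0, 2.0, 1.0])
    k2d = torch.outer(k, k)
    k2d = (k2d / k2d.sum()).to(x.device)
    weight = k2d.view(1, 1, 3, 3).repeat(x.shape[1], 1, 1, 1)
    return F.conv2d(x, weight, stride=stride, padding=1, groups=x.shape[1])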

7. Practical Recommendations and Future Prospects

For modern high-stakes applications—where robustness, interpretability, and artifact-free network outputs are critical—integrating FLCPooling or its smoothly windowed variants (ASAP) into the neural network design is recommended, especially for layers responsible for spatial downsampling. The selection of the frequency-domain filter (hard cut or Hamming/other windows) should be guided by the desired trade-off between aliasing suppression and spatial artifact minimization.

Continued research explores learnable frequency cutoffs, channel-wise frequency processing, and the integration of cohomological invariants to capture more general deformation-equivariant pooling, connecting the mathematical framework of finite local complexity to practical machine learning systems (Julien et al., 2015). This suggests a rich theory-practice interface for future development of adaptive, mathematically grounded pooling layers.