SWM-AED: Sliding Mask Adversarial Detection

Updated 14 November 2025
  • The paper introduces a sliding window mask mechanism that computes Sliding Mask Confidence Entropy (SMCE) to capture prediction fluctuations between clean and adversarial images.
  • It employs a model-agnostic, retraining-free approach in which localized occlusions expose the markedly higher confidence variability of adversarial examples.
  • Empirical evaluations on CIFAR-10 show detection accuracies from 62.5% to 96%, outperforming several adversarial training methods and demonstrating robust generalizability.

Sliding Window Mask-based Adversarial Example Detection (SWM-AED) is a detection framework for adversarial examples that operates by quantifying the sensitivity of deep neural network (DNN) predictions to local occlusions via a sliding window mask. Motivated by empirical findings that adversarial images exhibit significantly higher classifier confidence fluctuation under localized masking compared to clean images, SWM-AED exploits this property to distinguish adversarial attacks across a wide array of image-based perturbation methods. The method introduces the concept of Sliding Mask Confidence Entropy (SMCE) as a detection statistic, offering a model-agnostic and retraining-free approach robust to many canonical attacks.

1. Sliding Window Mask Mechanism

SWM-AED applies a dark occlusion patch ("mask") of fixed size $m \times m$ to an input image $x \in \mathbb{R}^{H \times W \times C}$. The mask is systematically "slid" across all spatial locations $(i, j)$ where it fully fits within the image, using a stride $s$ (typically $s = 1$). At each position $(i, j)$, an occluded image $x \odot M_{i, j}$ is generated by setting the masked region's pixels to zero (or a designated fixed value).
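The masking step can be sketched with NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function name `sliding_masks` and the `fill` parameter are hypothetical, and the image is assumed to be an array of shape `(H, W, C)`.

```python
import numpy as np

def sliding_masks(x, m, s=1, fill=0.0):
    """Yield ((i, j), occluded copy of x) for every fully contained m x m mask."""
    H, W = x.shape[:2]
    for i in range(0, H - m + 1, s):
        for j in range(0, W - m + 1, s):
            occluded = x.copy()
            # Overwrite the masked patch in all channels with the fill value.
            occluded[i:i + m, j:j + m, ...] = fill
            yield (i, j), occluded

# A CIFAR-10-sized dummy image: 32 x 32 pixels, 3 channels.
x = np.ones((32, 32, 3), dtype=np.float32)
masked = list(sliding_masks(x, m=7, s=1))
num_positions = len(masked)  # (32 - 7 + 1) ** 2 mask positions
```

For a $32 \times 32$ image with $m = 7$ and $s = 1$ this gives $26 \times 26 = 676$ mask positions, so scoring one image costs 676 forward passes (which can be batched).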

Each occluded image is passed through a pretrained classifier $f$, producing a softmax confidence vector $p_{ij} = f(x \odot M_{i, j}) \in \mathbb{R}^m$ over $m$ classes (here $m$ counts classes and is distinct from the mask side length above). The characteristic property underlying the method is that clean images tend to be robust against such localized occlusion, so the classifier's top-1 confidence remains stable, while adversarial examples, which are typically positioned near decision boundaries, react with pronounced fluctuations or confidence collapse for certain mask placements.

2. Sliding Mask Confidence Entropy (SMCE)

The SMCE provides a scalar measure of the model's prediction volatility under sliding mask perturbations. Formally, for $n$ sliding positions, SMCE is defined as:

$$H_{\rm SMCE}(x) = \frac{1}{n} \sum_{i=1}^n \left[ - \sum_{j=1}^m p_{ij} \log_2 p_{ij} \right]$$

For each mask position:

  • Compute the post-mask confidence vector $p_i = (p_{i1}, \ldots, p_{im})$
  • Compute its entropy $h_i = - \sum_{j=1}^m p_{ij} \log_2 p_{ij}$
  • Average all $h_i$ to get an overall entropy score $H_{\rm SMCE}(x)$

Key properties:

  • $H_{\rm SMCE}(x) \geq 0$
  • $H_{\rm SMCE}(x) \leq \log_2 m$ (achieved under uniform confidence)

SMCE reflects the degree of confidence destabilization under occlusion. Clean samples yield low entropy, while adversarial samples—especially those near classification boundaries—result in much higher SMCE due to large softmax perturbations.
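The definition and its two bounds can be checked on toy confidence vectors. The sketch below uses only the standard library; the helper names `entropy_bits` and `smce` and the four-class example vectors are illustrative assumptions, not values from the paper.

```python
import math

def entropy_bits(p):
    """Shannon entropy in bits; terms with p_k = 0 contribute 0 by convention."""
    return -sum(pk * math.log2(pk) for pk in p if pk > 0.0)

def smce(confidences):
    """Average entropy over the confidence vectors from all mask positions."""
    return sum(entropy_bits(p) for p in confidences) / len(confidences)

m = 4  # number of classes in this toy example
one_hot = [1.0, 0.0, 0.0, 0.0]
uniform = [1.0 / m] * m

stable = [one_hot, one_hot, one_hot]    # confidence unaffected by masking
volatile = [one_hot, uniform, uniform]  # confidence collapses at two positions

smce_stable = smce(stable)      # lower bound: 0
smce_volatile = smce(volatile)  # (0 + 2 + 2) / 3 bits
```

A fully stable input attains the lower bound of 0, while each uniform vector contributes the per-position maximum of $\log_2 4 = 2$ bits, so the second example averages to $4/3$ bits.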

3. SWM-AED Algorithm

The SWM-AED detection decision is based on whether an image's SMCE exceeds a threshold $\tau$. The process is summarized below:

Input: image x, classifier f, mask size m, stride s, threshold τ
Output: is_adversarial ∈ {True, False}

1. H ← 0 ; positions ← all (i,j) where mask fits in x with stride s
2. For each (i,j) in positions:
     M ← zero mask of size m×m placed at (i,j)
     p ← f(x ⊙ M)             # softmax confidence over classes
     h ← −sum_k p[k] * log2(p[k])   # entropy at this position (note the minus sign)
     H ← H + h
3. H ← H / |positions|       # average entropy
4. Return (H > τ)

Typical parameter choices for CIFAR-10 include $m = 7$, $s = 1$, and $\tau \approx 0.1$. If SMCE exceeds $\tau$, the sample is flagged as adversarial. No model retraining is involved; the base classifier is used as-is.
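A runnable rendering of the procedure might look like the following. It is a sketch under stated assumptions, not the reference implementation: `smce_score`, `is_adversarial`, and `toy_classifier` are hypothetical names, the classifier is a random linear layer standing in for a real pretrained network, and the `eps` clipping is an added numerical safeguard that the source does not mention.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def smce_score(x, f, m=7, s=1, eps=1e-12):
    """Mean entropy (bits) of f's softmax output over all m x m zero-mask positions."""
    H, W = x.shape[:2]
    entropies = []
    for i in range(0, H - m + 1, s):
        for j in range(0, W - m + 1, s):
            occluded = x.copy()
            occluded[i:i + m, j:j + m, ...] = 0.0
            p = np.clip(f(occluded), eps, 1.0)  # assumption: clip to avoid log2(0)
            entropies.append(float(-(p * np.log2(p)).sum()))
    return sum(entropies) / len(entropies)

def is_adversarial(x, f, m=7, s=1, tau=0.1):
    """Flag the input when its SMCE exceeds the threshold tau."""
    return smce_score(x, f, m, s) > tau

# Hypothetical stand-in classifier: a fixed random linear layer plus softmax.
rng = np.random.default_rng(0)
weights = rng.normal(size=(10, 32 * 32 * 3))

def toy_classifier(img):
    return softmax(weights @ img.ravel())

x = rng.random((32, 32, 3))
score = smce_score(x, toy_classifier)
flag = is_adversarial(x, toy_classifier)
```

In practice `toy_classifier` would be replaced by a wrapper around the deployed model that returns its softmax output, and the occluded images would be evaluated as a batch rather than one at a time.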

4. Theoretical Underpinnings

The rationale for SWM-AED rests on the geometric properties of adversarial examples in deep feature space:

  • Adversarial images $x + r$ are constructed to reside close to a decision boundary. Minor perturbations, such as those introduced by small occlusions, are disproportionately likely to alter the classifier's output or diffuse the softmax distribution, resulting in increased entropy.
  • Clean images are typically embedded within "confidence basins" where local occlusions scarcely impact the classifier's decision. This translates into lower SMCE.
  • SMCE thus operationalizes the distinct local smoothness between clean and adversarial regions, serving as a metric for localized vulnerability.
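The boundary argument can be illustrated with a toy two-class model. Everything here is an assumption made for illustration: the classifier is reduced to a single logit margin passed through a sigmoid, and the effect of different mask positions is modeled as a fixed set of small shifts in that margin.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_entropy_bits(p):
    """Entropy in bits of a two-class distribution (p, 1 - p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def mean_entropy(margin, shifts):
    """Average entropy of sigmoid(margin + shift) over the perturbation set."""
    return sum(binary_entropy_bits(sigmoid(margin + d)) for d in shifts) / len(shifts)

# Stand-ins for the logit change caused by different mask positions.
shifts = [-1.0, -0.5, 0.0, 0.5, 1.0]

deep_in_basin = mean_entropy(margin=8.0, shifts=shifts)  # far from the boundary
near_boundary = mean_entropy(margin=0.5, shifts=shifts)  # close to the boundary
```

The same perturbations leave the large-margin point with a mean entropy well below 0.1 bits, while the small-margin point averages well above it, which mirrors the separation that the threshold on SMCE relies on.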

5. Experimental Validation

Evaluations were conducted on CIFAR-10, employing 1,800 clean and 1,800 adversarial images (200 per attack). Adversarial attacks included FGSM, PGD, BIM, JSMA, DeepFool, FFGSM, APGD, OnePixel, and PIFGSM-PP. Tested classifiers included ResNet-18 (≈96% accuracy), ResNet-50 (≈79%), and VGG-11 (≈81%). Masks of size $3 \times 3$, $7 \times 7$, and $9 \times 9$ were employed with stride 1. Thresholds were swept in $[0.05, 0.2]$, with $\tau \approx 0.1$ identified as robust.

Detection accuracy (ResNet-18, $7 \times 7$ mask, $\tau = 0.1$):

| Attack    | Accuracy (%) |
|-----------|--------------|
| JSMA      | 96.0         |
| DeepFool  | 90.0         |
| FGSM      | 85.5         |
| PGD       | 76.0         |
| BIM       | 62.5         |
| FFGSM     | 82.0         |
| APGD      | 75.0         |
| OnePixel  | 73.5         |
| Pixle     | 89.0         |
| PIFGSM-PP | 79.0         |

Average detection was ~80%. The method demonstrated robustness across attacks, with worst-case 62.5% (BIM) and best-case 96.5% (JSMA) detection rates. Smaller masks ($3 \times 3$) yielded slightly higher average performance, but $7 \times 7$ was optimal for CIFAR-10 images. Higher-capacity classifiers had greater SMCE separability between clean and adversarial inputs, enhancing detection.
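As a quick arithmetic check, the per-attack accuracies can be averaged directly. The dictionary below simply transcribes the tabulated values as given; the variable names are illustrative.

```python
# Per-attack detection accuracy (%) transcribed from the table above.
accuracy = {
    "JSMA": 96.0, "DeepFool": 90.0, "FGSM": 85.5, "PGD": 76.0, "BIM": 62.5,
    "FFGSM": 82.0, "APGD": 75.0, "OnePixel": 73.5, "Pixle": 89.0, "PIFGSM-PP": 79.0,
}

mean_accuracy = sum(accuracy.values()) / len(accuracy)  # 808.5 / 10
worst_attack = min(accuracy, key=accuracy.get)          # attack with the lowest accuracy
```

The tabulated values average to 80.85%, in line with the stated figure of roughly 80%, and BIM is the weakest case.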

Compared to adversarial training methods (PGD-AT, Fast-AT, Free-AT), which defend against roughly 47% of PGD examples, SWM-AED detected roughly 76% of PGD examples.

6. Comparison with Other Defenses and Deployment Practices

Unlike adversarial training and other defenses, SWM-AED:

  • Does not require retraining or network modification, precluding risks of catastrophic overfitting and obviating additional training-time computational load.
  • Generalizes across attack types due to reliance on the intrinsic property of occlusion sensitivity, rather than attack-specific signatures or gradient masking.
  • Operates in a model-agnostic fashion, functioning with any pretrained classifier ("plug-and-play") and requiring no extra detection head or reconstruction network. Detection performance benefits naturally from improvements in base model accuracy.

Recommended deployment guidelines:

  1. Use a deep, high-accuracy model (e.g., ResNet-18 or deeper) for SMCE computation.
  2. Select a mask size approximately one quarter of the image width (for CIFAR-10, $7 \times 7$).
  3. Employ stride 1 for comprehensive spatial sampling.
  4. Calibrate threshold $\tau$ on a mixed hold-out set of clean and adversarial data, beginning at ~0.1.
  5. Periodically monitor SMCE value distributions to detect overlap drift; adjust model capacity or mask size as necessary.
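The calibration step in the guidelines above could be sketched as a simple sweep over candidate thresholds. This is one possible procedure, not one prescribed by the source: balanced accuracy is an assumed selection criterion, `calibrate_threshold` is a hypothetical helper, and the SMCE values are made up for illustration.

```python
def calibrate_threshold(clean_scores, adv_scores, candidates):
    """Return (tau, balanced accuracy) for the first tau maximizing balanced accuracy."""
    best_tau, best_acc = None, -1.0
    for tau in candidates:
        tnr = sum(s <= tau for s in clean_scores) / len(clean_scores)  # clean inputs passed
        tpr = sum(s > tau for s in adv_scores) / len(adv_scores)       # adversarial inputs flagged
        acc = 0.5 * (tnr + tpr)
        if acc > best_acc:
            best_tau, best_acc = tau, acc
    return best_tau, best_acc

# Made-up hold-out SMCE values for illustration only; not results from the paper.
clean_scores = [0.01, 0.02, 0.03, 0.065, 0.16]
adv_scores = [0.09, 0.13, 0.30, 0.45, 0.60]

candidates = [k / 100 for k in range(5, 21)]  # 0.05, 0.06, ..., 0.20
tau, acc = calibrate_threshold(clean_scores, adv_scores, candidates)
```

Re-running the same sweep on fresh hold-out scores is also a straightforward way to notice when the clean and adversarial SMCE distributions start to overlap.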

7. Limitations, Implications, and Outlook

SWM-AED achieves strong empirical detection (62%–96.5%) across diverse attacks without model retraining or architectural changes. Its effectiveness arises from quantifying susceptibility to local occlusion—a property that distinguishes clean from adversarial samples for modern DNN classifiers. Potential limitations include the possibility of diminished separability if future attacks explicitly optimize for SMCE invariance or if the base classifier demonstrates low accuracy, thereby compressing the SMCE gap between clean and adversarial distributions.

A plausible implication is that continual increases in model capacity and accuracy will enhance SMCE separability, consolidating the method’s utility as a lightweight, scalable adversarial screening layer for deployed image classifiers.
