PatchCleanser: Robust Defense Against Patch Attacks
- PatchCleanser is a certifiably robust, model-agnostic defense that uses a novel two-round spatial masking protocol to guard against adversarial patch attacks.
- It systematically covers every potential patch placement by applying a grid of binary masks, ensuring instance-level robustness across diverse image classifiers.
- Demonstrated on large-scale datasets like ImageNet, PatchCleanser achieves near state-of-the-art clean accuracy and certified robust performance, despite an inference overhead that is quadratic in the number of masks.
PatchCleanser is a certifiably robust, model-agnostic defense against adversarial patch attacks in image classifiers. The main innovation is a two-round spatial masking protocol that provides instance-level certificates of robustness against arbitrary, localized (typically square) adversarial patches, while being applicable to any pretrained classifier with minimal degradation in clean accuracy. PatchCleanser represents the first architecture-agnostic patch certification protocol that achieves state-of-the-art accuracy and rigorous guarantees on large-scale datasets such as ImageNet (Xiang et al., 2021).
1. Threat Model and Certification Objective
PatchCleanser addresses the threat of adversarial patch attacks, in which an adversary replaces all pixel values within an unknown, contiguous square region (the "patch") of a given image $x$ with arbitrary values: $x' = (1 - \mathbf{p}) \odot x + \mathbf{p} \odot x''$, where $\mathbf{p} \in \{0,1\}^{H \times W}$ is a binary mask selecting the patch region, $x''$ holds the adversarial content, and $\odot$ denotes element-wise multiplication. The defender does not know the patch location, size (within a budget), or content.
The goal is to construct a defended classifier $\mathbb{F}$ such that for any patch-constrained adversarial input $x'$, $\mathbb{F}(x') = y$ (the true class), and to provide a valid certificate of this property for each input.
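The patch threat model above can be sketched as follows (a minimal illustration; `apply_patch` and its arguments are hypothetical names, not from the original work):

```python
import numpy as np

def apply_patch(x, patch, top, left):
    """Patch threat model: x' = (1 - p) * x + p * x'',
    with p the binary mask of the (top, left) patch region."""
    ph, pw = patch.shape[:2]
    xp = x.copy()  # leave the clean input untouched
    xp[top:top + ph, left:left + pw] = patch
    return xp
```

The defense must handle any `(top, left)` placement and any `patch` content within the size budget.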
2. The Two-Round Masking Protocol
PatchCleanser operates by evaluating the candidate (potentially attacked) input under numerous tightly localized binary masks. The method is organized as follows:
2.1. Mask Set Construction
A set of masks $\mathcal{M}$ is chosen such that for every possible patch placement, there exists at least one mask that fully covers (zeros out) the adversarial region (the "covering property"). In practice, a uniform grid of rectangular masks, each sized to the patch budget plus the grid stride, is used: for a square patch of side $p$ and mask stride $s$, masks of side $m \ge p + s - 1$ guarantee that some mask in the grid covers every patch placement.
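The covering construction can be sketched as below (a minimal sketch under the $m \ge p + s - 1$ rule stated above; function names are illustrative):

```python
import numpy as np

def mask_starts_1d(n, p, s):
    """Mask offsets along one axis: mask side m = p + s - 1,
    stride s, with the last mask clamped to the image edge."""
    m = p + s - 1
    starts = list(range(0, n - m + 1, s))
    if starts[-1] != n - m:
        starts.append(n - m)
    return m, starts

def build_masks(n, p, s):
    """2D covering mask set as boolean arrays (True = region zeroed)."""
    m, starts = mask_starts_1d(n, p, s)
    masks = []
    for u in starts:
        for v in starts:
            mk = np.zeros((n, n), dtype=bool)
            mk[u:u + m, v:v + m] = True
            masks.append(mk)
    return masks

def covers_all_patches(masks, n, p):
    """Verify the covering property: every p x p placement lies
    entirely inside at least one mask."""
    return all(
        any(mk[i:i + p, j:j + p].all() for mk in masks)
        for i in range(n - p + 1)
        for j in range(n - p + 1)
    )
```

For example, `build_masks(n=32, p=8, s=8)` yields a 4 x 4 grid of 15-pixel masks that covers every 8-pixel patch placement.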
2.2. Double-Masking Algorithm
The core inference pipeline, for input image $x$, mask set $\mathcal{M}$, and base classifier $F$, is:
- First Masking Pass (Round 1): Compute $\hat{y}_i = F(x \odot m_i)$ for all $m_i \in \mathcal{M}$. If all predictions agree on a label $\hat{y}$, output $\hat{y}$.
- Second Masking Pass (Round 2): For each mask $m_i$ whose round-1 prediction $\hat{y}_i$ disagrees with the round-1 majority, apply all masks again: compute $F(x \odot m_i \odot m_j)$ for every $m_j \in \mathcal{M}$. If, for some such $m_i$, all second-round predictions unanimously agree on $\hat{y}_i$, output $\hat{y}_i$.
- Majority Vote: In all other cases, output the label with highest frequency among round-1 predictions.
This protocol ensures that, provided the base model has two-mask correctness (i.e., $F(x \odot m_i \odot m_j) = y$ for any mask pair $m_i, m_j \in \mathcal{M}$), no patch within the specified budget can cause misclassification (Xiang et al., 2021, Lyu et al., 13 Nov 2025).
3. Theoretical Foundations and Certification Guarantee
Let $\mathcal{M}$ be an $\mathcal{R}$-covering mask set for the set $\mathcal{R}$ of all allowed patch placements. The certification property relies on the following:
- Effective Coverage: Along each axis, a mask of side $m$ placed at offset $u$ fully covers a patch of radius $r$ centered at $c$ iff $u \le c - r$ and $c + r \le u + m$; a 2D mask covers a 2D patch iff this holds on both axes.
- Certification Test: For two-mask correctness, check that $F(x \odot m_i \odot m_j) = y$ for all pairs $m_i, m_j \in \mathcal{M}$. This requires $O(|\mathcal{M}|^2)$ model evaluations.
The main theorem states that, under these conditions, DoubleMasking certifies robustness against any adversarial patch constrained to the patch budget (Xiang et al., 2021).
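The certification test can be sketched as follows (illustrative; `masked` and `classify` are the same assumed interfaces as in the inference sketch, and a certificate holds only if the true label survives every mask pair):

```python
import numpy as np

def masked(x, mk):
    """Zero out the masked region (mk is a boolean array)."""
    out = x.copy()
    out[mk] = 0.0
    return out

def certify_two_mask(x, y, masks, classify):
    """Return True iff the base classifier predicts the true label y
    under every pair of masks: O(|M|^2) forward passes."""
    for mi in masks:
        xi = masked(x, mi)
        for mj in masks:
            if classify(masked(xi, mj)) != y:
                return False
    return True
```

If this check passes on a clean labeled input, the double-masking output is provably correct for any patch the mask set covers.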
4. Empirical Properties, Complexity, and Limitations
4.1. Accuracy and Certified Robustness
On large-scale benchmarks (ImageNet, CIFAR-10/100, SVHN), PatchCleanser achieves:
- Nearly state-of-the-art clean accuracy (e.g., 83.9% top-1 on ImageNet with ViT-B/16, compared to 84.8% for vanilla ViT)
- Certified robust accuracy of 62.1% for 2%-area patches (ImageNet), doubling prior certifiable methods (Xiang et al., 2021).
Ablations show robustness degrades gracefully as the true patch size increases beyond the assumed budget; clean accuracy is nearly unaffected by mask count and protocol details.
4.2. Computational Complexity
PatchCleanser incurs inference cost that is quadratic in the number of masks $k = |\mathcal{M}|$ in the worst case. For practical ImageNet settings (e.g., $k = 36$ masks from a $6 \times 6$ grid), up to 72 forward passes per image are required in the worst case. Fast-exit variants reduce average-case cost.
The quadratic scaling in the number of masks makes real-time or high-resolution deployment infeasible. This challenge has directly motivated subsequent methods seeking $O(k)$ complexity, such as CertMask (Lyu et al., 13 Nov 2025).
4.3. Limitations
- Distributed and Subtle Patch Attacks: PatchCleanser is ineffective against distributed attacks (e.g., DorPatch) where the adversarial budget is dispersed into low-magnitude fragments below the detection threshold of any mask (Khalili et al., 22 May 2025). In such cases, robust accuracy drops to 0%, and certificates can be erroneously returned.
- High Masking/Low Signal: Two rounds of masking heavily occlude the input, potentially reducing model discriminability.
- Inference Overhead: $O(k^2)$ mask combinations result in high computational and latency requirements; not suitable for real-time use in embedded or large-scale systems.
- Certification Stochasticity: In practical variants, mask placement may be randomized, resulting in probabilistic, not absolute, certificates (Lyu et al., 13 Nov 2025).
5. Methodological Extensions and Variants
5.1. Training-Time Improvements
The performance of PatchCleanser heavily depends on the base model's invariance to masked inputs. While the original protocol recommends random "Cutout" augmentation during fine-tuning, improved certified robust accuracy is obtained by training with worst-case or greedy two-mask candidates (the Greedy Cutout procedure) (Saha et al., 2023). Greedy Cutout identifies approximate maximal-loss mask pairs efficiently, boosting certified accuracy by 3–8 points across datasets at modest computational cost.
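The greedy selection idea can be sketched as follows (an approximation of the Greedy Cutout mask-pair search under assumed interfaces; `loss` stands for any per-example training loss):

```python
import numpy as np

def masked(x, mk):
    """Zero out the masked region (mk is a boolean array)."""
    out = x.copy()
    out[mk] = 0.0
    return out

def greedy_two_mask(x, y, masks, loss):
    """Greedy stand-in for the worst-case mask pair: first pick the
    single mask with maximal loss, then, holding it fixed, the worst
    second mask. Costs 2k loss evaluations instead of k^2."""
    m1 = max(masks, key=lambda mk: loss(masked(x, mk), y))
    x1 = masked(x, m1)
    m2 = max(masks, key=lambda mk: loss(masked(x1, mk), y))
    return m1, m2  # fine-tune with Cutout at (m1, m2)
```

Training on these approximate worst-case pairs pushes the base model toward the two-mask correctness the certificate requires.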
5.2. Extensions to Multiple Patches and Arbitrary Shapes
PatchCleanser’s mask set can be constructed to provide coverage guarantees for multiple patches or rectangular patches of arbitrary shape, with a corresponding increase in mask set size and computational demands (Xiang et al., 2021).
6. Comparative Analysis and Evolution
PatchCleanser introduced the first architecture-agnostic, high-clean-accuracy certified defense for patch attacks, contrasting with prior works such as IBP, Clipped BagNet, and PatchGuard, which rely on small-receptive-field architectures and achieve lower accuracy. Its scaling and masking inefficiencies are directly addressed by subsequent works:
- CertMask: Achieves equivalent or improved certified accuracy with a single round of masking ($O(k)$ inference), via a theoretically optimal coverage tiling (Lyu et al., 13 Nov 2025).
- SuperPure, DiffPAD: Move beyond masking to iterative GAN or diffusion purification, which overcome PatchCleanser's vulnerability to distributed attacks and deliver lower latency (Khalili et al., 22 May 2025, Fu et al., 31 Oct 2024).
7. Application in Domain-Specific and Non-Adversarial Patch Removal
Beyond adversarial patch certification, PatchCleanser's pipeline has been adapted, e.g., for filtering out Martian surface image patches corrupted by atmospheric dust in HiRISE satellite imagery (Kasodekar, 8 May 2024). There, a modular pipeline encompassing classification (ResNet-50), denoising autoencoders, and pix2pix GAN refinement is integrated in a "PatchCleanser" system for automated scientific image triage, demonstrating the generality of the iterative masking-and-vote framework.
Key references: (Xiang et al., 2021) (original algorithm and theory), (Saha et al., 2023) (Greedy Cutout training), (Lyu et al., 13 Nov 2025) (theoretically optimal masking comparison), (Khalili et al., 22 May 2025) (limitations and distributed attacks), (Fu et al., 31 Oct 2024) (diffusion model extensions), (Kasodekar, 8 May 2024) (domain-specific deployment).