PatchCleanser: Certifiably Robust Defense against Adversarial Patches for Any Image Classifier (2108.09135v2)

Published 20 Aug 2021 in cs.CV and cs.CR

Abstract: The adversarial patch attack against image classification models aims to inject adversarially crafted pixels within a restricted image region (i.e., a patch) to induce model misclassification. This attack can be realized in the physical world by printing the patch and attaching it to the victim object; thus, it poses a real-world threat to computer vision systems. To counter this threat, we design PatchCleanser as a certifiably robust defense against adversarial patches. In PatchCleanser, we perform two rounds of pixel masking on the input image to neutralize the effect of the adversarial patch. Because this operation works purely in image space, PatchCleanser is compatible with any state-of-the-art image classifier and can therefore achieve high accuracy. Furthermore, we can prove that PatchCleanser will always predict the correct class labels on certain images against any adaptive white-box attacker within our threat model, achieving certified robustness. We extensively evaluate PatchCleanser on the ImageNet, ImageNette, CIFAR-10, CIFAR-100, SVHN, and Flowers-102 datasets and demonstrate that our defense achieves clean accuracy similar to that of state-of-the-art classification models and significantly improves certified robustness over prior work. Remarkably, PatchCleanser achieves 83.9% top-1 clean accuracy and 62.1% top-1 certified robust accuracy against a 2%-pixel square patch anywhere on the image for the 1000-class ImageNet dataset.
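The abstract states only that the defense applies two rounds of pixel masking in image space around an arbitrary classifier. The sketch below illustrates one way such a scheme could work, and it rests on assumptions the abstract does not spell out: a fixed grid of square masks large enough that at least one mask fully covers the patch, a first round that looks for disagreement among one-mask predictions, and a second round that re-masks each disagreeing image to check whether its label is stable. All names (`make_mask_corners`, `apply_mask`, `double_masking_predict`, `toy_classify`) are hypothetical, and the "classifier" is a toy threshold rule, not a real model.

```python
import numpy as np
from collections import Counter


def make_mask_corners(image_size, mask_size, stride):
    """Top-left corners of square masks on a grid that reaches the image border."""
    positions = list(range(0, image_size - mask_size + 1, stride))
    if positions[-1] != image_size - mask_size:
        positions.append(image_size - mask_size)
    return [(r, c) for r in positions for c in positions]


def apply_mask(image, corner, mask_size, fill=0.0):
    """Return a copy of the image with one square region overwritten by `fill`."""
    out = image.copy()
    r, c = corner
    out[r:r + mask_size, c:c + mask_size] = fill
    return out


def double_masking_predict(image, classify, corners, mask_size):
    """Assumed decision rule: two rounds of masking around a black-box classifier."""
    # Round 1: classify every one-mask version of the image.
    first = [classify(apply_mask(image, m, mask_size)) for m in corners]
    majority = Counter(first).most_common(1)[0][0]
    if all(label == majority for label in first):
        return majority  # unanimous agreement
    # Round 2: for each disagreeing mask, add a second mask and check stability.
    for m, label in zip(corners, first):
        if label == majority:
            continue
        once = apply_mask(image, m, mask_size)
        second = [classify(apply_mask(once, m2, mask_size)) for m2 in corners]
        if all(p == label for p in second):
            return label  # the first mask likely removed the patch
    return majority


def toy_classify(image):
    """Toy stand-in for a classifier: class 1 if any bright pixel is visible."""
    return int(image.max() > 0.5)


# Toy data: an 8x8 image of class 0, and a copy carrying a 2x2 bright "patch".
clean = np.full((8, 8), 0.1)
patched = clean.copy()
patched[3:5, 3:5] = 1.0

# Mask size 3 with stride 2 guarantees some mask fully covers any 2x2 patch.
corners = make_mask_corners(image_size=8, mask_size=3, stride=2)

print(toy_classify(patched))                                      # 1 (attack succeeds)
print(double_masking_predict(patched, toy_classify, corners, 3))  # 0 (label recovered)
```

In this toy run, only the mask that fully covers the patch yields class 0 in the first round; the second round confirms that this minority label stays stable under every additional mask, so it is returned instead of the attacked majority label.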

Authors (3)

Chong Xiang
Saeed Mahloujifar
Prateek Mittal

Citations (67)
