- The paper introduces a certifiable defense that replaces randomness with deterministic, structured smoothing to counter patch adversarial attacks.
- It employs structured ablation methods, such as block and band smoothing, which significantly improve certified accuracy (up to 57.6% on CIFAR-10) compared to earlier techniques.
- The approach offers practical insights for enhancing AI security in real-world applications, including autonomous systems under localized adversarial threats.
Certifiable Defense Against Patch Adversarial Attacks
Adversarial attacks on machine learning models, particularly attacks that manipulate input images, pose significant challenges to the reliability and security of such systems in real-world deployments. The paper by Alexander Levine and Soheil Feizi addresses a specific threat model known as the patch adversarial attack, in which an attacker may arbitrarily alter all pixels within a contiguous, localized region of the image while leaving the rest of the image untouched. This threat model serves as a useful quantitative proxy for physical adversarial attacks, such as a sticker placed on an object.
Main Contributions
Levine and Feizi propose a certifiable defense strategy that exploits the structured nature of patch attacks. It advances beyond classical randomized smoothing methods, which offer only probabilistic robustness certificates: by replacing random ablation with a deterministic enumeration of ablations, the authors obtain deterministic robustness certificates that are particularly well suited to patch attacks.
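The intuition behind such a deterministic certificate can be sketched as follows. Under band smoothing (described below), the final prediction is a majority vote over every possible band position, and a patch of width m can overlap at most m + s − 1 bands of width s. The function and variable names here are illustrative assumptions for exposition, not the paper's notation:

```python
# Sketch of a deterministic certificate for band smoothing (illustrative;
# names and tie-breaking details are simplified assumptions).
# A patch of width patch_width can intersect at most
# delta = patch_width + band_width - 1 band positions, so the attacker
# can change at most delta of the base classifier's votes.

def is_certified(vote_counts, band_width, patch_width):
    """Return True if no patch of the given width can flip the majority vote."""
    delta = patch_width + band_width - 1  # votes the patch can affect
    counts = sorted(vote_counts.values(), reverse=True)
    top = counts[0]
    runner_up = counts[1] if len(counts) > 1 else 0
    # Worst case: delta votes move from the top class to the runner-up,
    # so the prediction is certified when the margin exceeds 2 * delta.
    return top - runner_up > 2 * delta

# Example: 32 band positions on CIFAR-10, band width 4, 5-pixel-wide patch
votes = {"cat": 25, "dog": 5, "frog": 2}
print(is_certified(votes, band_width=4, patch_width=5))  # True: 25 - 5 > 2 * 8
```

Because the vote over all band positions is enumerated exactly rather than sampled, this check holds with certainty, with no probability of certification error.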
One of the paper's central contributions is the development of structured ablation methods, notably block and band smoothing, which are tailored to the geometry of patch attacks. This departs from defenses against general L0 attacks, where the retained pixels are selected independently. By constraining the retained pixels to contiguous blocks or bands, these methods sharply limit how many ablations an adversarial patch can intersect, leading to higher certified accuracy.
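A minimal sketch of band smoothing makes the mechanism concrete: classify every column-band ablation of the image and take a majority vote. The toy base classifier below is purely an assumption for demonstration; the paper trains a neural network on ablated inputs:

```python
# Sketch of band smoothing: keep one vertical band of columns, zero out
# the rest, classify each ablated copy, and take a majority vote.
import numpy as np

def band_ablations(image, band_width):
    """Yield copies of the image with all but one vertical band zeroed out."""
    h, w = image.shape[:2]
    for start in range(w):  # one ablation per possible band position
        cols = [(start + i) % w for i in range(band_width)]  # band wraps around
        ablated = np.zeros_like(image)
        ablated[:, cols] = image[:, cols]
        yield ablated

def smoothed_predict(image, band_width, base_classifier):
    """Majority vote of the base classifier over all band positions."""
    votes = {}
    for ablated in band_ablations(image, band_width):
        label = base_classifier(ablated)
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get), votes

# Toy base classifier (illustrative assumption): bright band -> 1, dark -> 0
toy = lambda x: int(x.sum() > 0)

img = np.ones((8, 8))
pred, votes = smoothed_predict(img, band_width=2, base_classifier=toy)
print(pred, votes)  # 1 {1: 8}: all 8 band positions vote for class 1
```

Since only a band of columns is visible to the base classifier, a patch corrupts a vote only when it overlaps that band, which is exactly what bounds the number of votes an attacker can control.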
Results and Implications
The paper presents strong numerical results showing that the method achieves superior certified accuracy compared to previously established defenses. For example, structured ablation attains up to 57.6% certified accuracy against 5×5 patch attacks on CIFAR-10, substantially higher than the 30.3% provided by earlier methodologies. Notably, the defense scales to complex datasets such as ImageNet with 42×42 patches, achieving certified accuracy comparable to practical defense models, although with reduced clean accuracy.
The implications of these findings extend to the broader domain of AI security, suggesting the possibility of enhancing model defenses against realistic physical threats such as tampered stop signs in autonomous driving. Thus, while this paper focuses on patch attacks, the structured approach may offer insights into enhancing defenses against other spatially localized attacks in neural networks.
Future Directions
The establishment of deterministic certificates and structured ablation opens several avenues for future research in the adversarial defense domain. One direction is optimizing base classifiers to further improve clean and certified accuracy, especially on datasets more complex than CIFAR-10. Additionally, integrating these techniques into real-time applications, such as surveillance systems or autonomous vehicles, could substantially raise robustness against adversarial manipulation.
In summary, Levine and Feizi's paper provides a significant enhancement to the mechanisms of certifiable adversarial defenses, specifically tailored to patch attacks. By innovating upon traditional randomized methods with structured and deterministic approaches, their contributions lay foundational groundwork for fortifying AI systems against spatially localized adversarial threats.