- The paper introduces a certified defense that extends interval bound propagation to verify robustness against adversarial patch attacks.
- It critiques existing methods like LGS and DW, demonstrating their vulnerability when facing sophisticated white-box attacks using BPDA.
- It proposes scalable training strategies, such as Random and Guided Patch Certificate Training, validated on MNIST and CIFAR-10 datasets.
Certified Defenses for Adversarial Patches
The paper "Certified Defenses for Adversarial Patches" tackles the increasingly significant problem of adversarial patch attacks on computer vision systems. Such attacks pose a practical threat by allowing an adversary to apply a localized perturbation, often in the form of conspicuous patches that can subvert image classifiers, object detectors, and facial recognition systems. Existing defenses, generally focused on whole-image perturbations, inadequately address the unique challenges of localized adversarial patches.
Key Contributions
- Critique of Existing Defenses: The paper thoroughly evaluates pre-existing defense mechanisms, such as Local Gradient Smoothing (LGS) and Digital Watermarking (DW), revealing their vulnerabilities. Despite their claims of effectiveness, these defenses can be circumvented by sophisticated white-box adversaries. Employing Backward Pass Differentiable Approximation (BPDA) to circumvent the defenses' gradient obfuscation, the authors demonstrate a dramatic drop in adversarial accuracy when the models are tested against these stronger attacks (a BPDA attack sketch follows this list).
- Proposition of a Certified Defense: In response to these shortcomings, a novel certified defense is introduced. It builds on interval bound propagation (IBP) to provide verifiable robustness against adversarial patches: IBP is extended to bound the network's outputs when the pixels inside a candidate patch location are allowed to vary arbitrarily, and, combined with suitable training strategies, the resulting certificates yield a provable lower bound on adversarial accuracy, referred to as certified accuracy (a toy certification sketch appears after this list).
- Training Strategy Innovations: To address the computational cost of certifying every possible patch location during training, the paper proposes scalable training strategies, namely Random Patch and Guided Patch Certificate Training. Rather than computing certificates for all locations, as exhaustive all-patch training does, these strategies certify only a small subset of randomly sampled or heuristically chosen locations per example, substantially reducing overhead while maintaining robust performance and enabling practical application to larger models and datasets (a training-loss sketch follows this list).
- Experimental Validation and Insights: Comprehensive experiments on MNIST and CIFAR-10 demonstrate the method's efficacy against both patch and sparse attacks. While the empirical accuracy of existing defenses degrades severely under strong attacks, the proposed method retains non-trivial certified accuracy, and its robustness generalizes to different patch shapes and locations. Notably, guided-patch training slightly outperforms random-patch training, suggesting further avenues for optimization.
- Transferability to Non-Square Patches: A notable finding is that models trained to be certifiably robust against square patches remain robust to other geometric shapes, such as lines or diamonds, indicating that the certified defense extends beyond the patch configurations used during training.
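To make the BPDA critique concrete, here is a minimal sketch of how a gradient-based patch attack can be run through a non-differentiable preprocessing defense such as LGS or DW. This is an illustration under assumptions, not the paper's attack code; `model`, `defense_fn`, and `mask` are hypothetical stand-ins for the classifier, the defense's preprocessing step, and a binary patch-location mask.

```python
import torch
import torch.nn.functional as F

class BPDAIdentity(torch.autograd.Function):
    """Forward: run the (possibly non-differentiable) defense, e.g. LGS or DW.
    Backward: treat the defense as the identity so gradients pass straight
    through -- the Backward Pass Differentiable Approximation (BPDA) trick."""

    @staticmethod
    def forward(ctx, x, defense_fn):
        return defense_fn(x.detach())

    @staticmethod
    def backward(ctx, grad_output):
        # Identity gradient w.r.t. x; no gradient for defense_fn.
        return grad_output, None

def bpda_patch_attack(model, defense_fn, x, y, mask, steps=200, step_size=0.05):
    """Sketch of a white-box patch attack: optimise only the pixels where
    `mask` == 1, differentiating through the defended model via BPDA.
    `x` is a batch of images with values in [0, 1], `y` the true labels."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = model(BPDAIdentity.apply(x_adv, defense_fn))
        loss = F.cross_entropy(logits, y)
        (grad,) = torch.autograd.grad(loss, x_adv)
        # Ascend the loss inside the patch region only; keep pixel values valid.
        x_adv = (x_adv + step_size * grad.sign() * mask).clamp(0.0, 1.0).detach()
    return x_adv
```

Because the backward pass pretends the defense is the identity, the attacker still receives informative gradients, which is why defenses that rely on obfuscating gradients can be circumvented this way.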
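The certification step can be illustrated with a toy interval bound propagation routine. The sketch below is a simplified reconstruction, not the authors' implementation: it assumes `model` is an `nn.Sequential` of Linear and ReLU layers acting on a flattened image, and certifies an example by letting the pixels inside each candidate patch location range over the full [0, 1] interval.

```python
import torch
import torch.nn as nn

def interval_bounds(layer, lb, ub):
    """Propagate elementwise lower/upper bounds through one layer (IBP)."""
    if isinstance(layer, nn.Linear):
        mid, rad = (ub + lb) / 2, (ub - lb) / 2
        mid = layer(mid)                        # centre goes through the affine map
        rad = rad @ layer.weight.abs().t()      # radius goes through |W| (no bias)
        return mid - rad, mid + rad
    if isinstance(layer, nn.ReLU):
        return layer(lb), layer(ub)             # ReLU is monotone
    raise NotImplementedError(type(layer))

def certify_patch(model, x, y, patch=5):
    """Return True if no `patch` x `patch` adversarial patch, at any location and
    with arbitrary pixel values in [0, 1], can change the prediction for image
    `x` (shape C x H x W, values in [0, 1]) with integer true label `y`."""
    c, h, w = x.shape
    for i in range(h - patch + 1):
        for j in range(w - patch + 1):
            lb, ub = x.clone(), x.clone()
            lb[:, i:i + patch, j:j + patch] = 0.0   # patch pixels may drop to 0 ...
            ub[:, i:i + patch, j:j + patch] = 1.0   # ... or rise to 1
            lb, ub = lb.flatten(), ub.flatten()
            for layer in model:
                lb, ub = interval_bounds(layer, lb, ub)
            worst_true = lb[y]                      # worst-case true-class logit
            best_other = ub.clone()
            best_other[y] = float("-inf")
            if worst_true <= best_other.max():      # some other class could win
                return False
    return True
```

At evaluation time the certificate must hold at every location, which is why the loops above are exhaustive; the training strategies avoid paying that cost at training time, as the next sketch shows.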
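Finally, a sketch of the random-patch training objective, reusing the `interval_bounds` helper above. Sampling locations uniformly and taking the maximum bounded loss over the sampled locations are illustrative assumptions rather than the paper's exact recipe; a guided variant would instead pick locations expected to produce the largest certified loss.

```python
import random
import torch
import torch.nn.functional as F

def random_patch_certified_loss(model, x, y, patch=5, n_locations=1):
    """Certified training loss for a single image `x` (C x H x W, values in
    [0, 1]) with integer label `y`: bound the logits at a few random patch
    locations and take the worst bounded cross-entropy among them."""
    c, h, w = x.shape
    losses = []
    for _ in range(n_locations):
        i = random.randint(0, h - patch)            # random top-left corner
        j = random.randint(0, w - patch)
        lb, ub = x.clone(), x.clone()
        lb[:, i:i + patch, j:j + patch] = 0.0
        ub[:, i:i + patch, j:j + patch] = 1.0
        lb, ub = lb.flatten(), ub.flatten()
        for layer in model:
            lb, ub = interval_bounds(layer, lb, ub)
        # Worst-case logits: lower bound for the true class, upper bound elsewhere.
        worst_logits = ub.clone()
        worst_logits[y] = lb[y]
        losses.append(F.cross_entropy(worst_logits.unsqueeze(0), torch.tensor([y])))
    return torch.stack(losses).max()
```

In contrast to exhaustive all-patch training, the cost per example here is fixed by `n_locations` rather than by the number of possible patch positions, which is what makes the approach practical for larger images and models.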
Implications and Future Directions
- Theoretical Contributions: The certified defense contributes to the theoretical understanding of model robustness, particularly against localized adversarial attacks. It offers insights into the operation of IBP in patch contexts, potentially influencing the design of future verification techniques.
- Practical Impact on AI Systems: The proposed methodology could significantly influence the deployment of neural networks in real-world scenarios, enhancing the resilience of systems against physical adversarial threats. With a verified lower bound on adversarial robustness, stakeholders can be more confident deploying neural networks in safety-critical applications.
- Scalability and Application: Future research may focus on improving the scalability of certified defenses to manage higher dimensions and more complex network architectures efficiently. Additionally, expanding this work to consider a broader range of adversarial models, including more varied spatial constraints and perturbation strategies, can further solidify its utility in diverse real-world applications.
In summary, the paper makes significant advancements in certified defenses for adversarial patches, addressing crucial gaps left by earlier approaches. Through robust theoretical and empirical analyses, it lays a foundation for the future development of secure and reliable computer vision models in adversarial contexts.