Defending Against Physically Realizable Attacks on Image Classification: A Technical Analysis
In the paper "Defending Against Physically Realizable Attacks on Image Classification," the authors address the problem of making deep neural network image classifiers robust to physically realizable adversarial attacks, that is, attacks implemented by modifying objects in the physical world rather than by perturbing pixels digitally. Such attacks pose significant risks to classifiers deployed in real-world systems, and the paper's central focus is developing an effective defense against them.
Empirical Evaluation of Conventional Approaches
The authors begin by evaluating the two most prominent existing defenses: adversarial training with Projected Gradient Descent (PGD) and randomized smoothing. Both techniques are effective against digital l∞ and l2 attacks, but their efficacy drops sharply against physically realizable attacks such as adversarial eyeglass frames designed to fool facial recognition systems and sticker perturbations that cause stop signs to be misclassified. In the authors' experiments, these conventional defenses provide only limited protection, with robust accuracy falling to low levels under the stronger variants of the physical attacks.
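Of the two baselines, PGD adversarial training is the more directly comparable to the paper's own defense. The following is a minimal PyTorch sketch of a generic l∞ PGD attack and one adversarial training step; the epsilon, step size, and iteration count are illustrative assumptions, not the authors' exact settings.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Generic l_inf PGD attack: iterated signed-gradient ascent on the loss,
    projected back into an eps-ball around the clean input."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """One step of PGD adversarial training: fit the model to adversarial
    examples generated on the fly instead of clean inputs."""
    model.eval()                      # craft attacks with fixed BatchNorm statistics
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```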
Introduction of Rectangular Occlusion Attacks (ROA)
To bridge this gap, the authors propose a new threat model, the Rectangular Occlusion Attack (ROA), as an abstraction of physical attacks. In ROA, the adversary places an adversarially crafted rectangle on the image, choosing both its location and its contents; pixels inside the rectangle may be perturbed arbitrarily (effectively an unbounded l∞ perturbation confined to a small region), while the rest of the image is left untouched. This captures the essential structure of physically realizable attacks, which tend to occlude a contiguous part of the object, and offers a more realistic adversarial framework than norm-bounded pixel perturbations over the whole image. A sketch of the attack appears below.
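The following is a hedged sketch of the ROA idea under the same assumptions as the previous snippet: exhaustively search rectangle placements on a coarse grid for the position that most increases the loss, then run PGD on the pixels inside the rectangle only. The rectangle size, search stride, and PGD settings are illustrative, and the faster gradient-guided location search the paper also considers is omitted here.

```python
import torch
import torch.nn.functional as F

def roa_attack(model, x, y, rect_h=11, rect_w=11, stride=5, pgd_steps=30, alpha=8/255):
    """Sketch of a Rectangular Occlusion Attack: pick one rectangle position per
    batch (per-image search is a straightforward extension), then optimize the
    rectangle's contents with PGD."""
    _, _, H, W = x.shape

    # 1) Location search: try each grid position with a gray placeholder rectangle
    #    and keep the position that maximizes the classification loss.
    best_loss, best_pos = -float("inf"), (0, 0)
    with torch.no_grad():
        for i in range(0, H - rect_h + 1, stride):
            for j in range(0, W - rect_w + 1, stride):
                x_occ = x.clone()
                x_occ[:, :, i:i + rect_h, j:j + rect_w] = 0.5
                loss = F.cross_entropy(model(x_occ), y).item()
                if loss > best_loss:
                    best_loss, best_pos = loss, (i, j)

    # 2) Content optimization: PGD restricted to the rectangle; inside it, pixels
    #    may take any value in [0, 1] (an unbounded l_inf perturbation of the region).
    i, j = best_pos
    x_adv = x.clone().detach()
    x_adv[:, :, i:i + rect_h, j:j + rect_w] = 0.5
    for _ in range(pgd_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv[:, :, i:i + rect_h, j:j + rect_w] += (
                alpha * grad[:, :, i:i + rect_h, j:j + rect_w].sign()
            )
            x_adv.clamp_(0, 1)
        x_adv = x_adv.detach()
    return x_adv
```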
Defense Strategy: Adversarial Training Using ROA
Building on ROA, the paper introduces Defense against Occlusion Attacks (DOA): adversarial training in which the training-time adversary is the ROA attack rather than a norm-bounded PGD attack. DOA-trained models achieve high robust accuracy against both the eyeglass frame attack and the stop sign sticker attack studied in the paper, and the same defense also transfers to adversarial patch attacks. Because ROA is agnostic to the specific shape or semantics of any one physical attack, a single DOA-trained model remains robust across these diverse attack patterns; a minimal training loop is sketched below.
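The following is a minimal sketch of DOA-style training under the same assumptions, reusing the roa_attack sketch above as the training-time adversary; the attack is passed as a parameter so that different occlusion configurations can be swapped in.

```python
import torch
import torch.nn.functional as F

def doa_training_epoch(model, optimizer, loader, attack=roa_attack, device="cuda"):
    """One epoch of DOA-style adversarial training: at each step, generate
    worst-case rectangular occlusions and fit the model to them."""
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        model.eval()                  # craft occlusions with fixed BatchNorm statistics
        x_roa = attack(model, x, y)
        model.train()
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_roa), y)
        loss.backward()
        optimizer.step()
```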
Numerical Findings and Implications
Through comprehensive experimental evaluation, the paper reports robust accuracy against physical attacks that often exceeds 90% for DOA-trained models in settings where the conventional defenses largely fail. The authors also tune the DOA configuration, in particular the rectangle size and the number of PGD iterations used to craft the occlusion, to optimize robust performance. These findings indicate that the ROA-based defense substantially improves the reliability of neural network classifiers in practical applications where physical adversarial attacks are a realistic threat.
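As a purely illustrative sketch of how such a configuration sweep might be scripted, the snippet below trains one DOA model per (rectangle size, PGD iterations) pair and measures robust accuracy on a set of pre-attacked images; build_model_and_optimizer, num_epochs, train_loader, and attacked_test_loader are hypothetical placeholders, and the candidate values are not the paper's reported settings.

```python
import itertools
from functools import partial

import torch

def robust_accuracy(model, attacked_loader, device="cuda"):
    """Fraction of pre-attacked examples the model still classifies correctly."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x_adv, y in attacked_loader:
            x_adv, y = x_adv.to(device), y.to(device)
            correct += (model(x_adv).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total

# Hypothetical sweep over the two DOA knobs discussed above.
for size, steps in itertools.product((7, 11), (30, 50)):
    model, optimizer = build_model_and_optimizer()            # placeholder factory
    attack = partial(roa_attack, rect_h=size, rect_w=size, pgd_steps=steps)
    for _ in range(num_epochs):                               # num_epochs: placeholder
        doa_training_epoch(model, optimizer, train_loader, attack=attack)
    acc = robust_accuracy(model, attacked_test_loader)
    print(f"rect={size}px, pgd_steps={steps}: robust accuracy {acc:.1%}")
```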
Future Directions
This research suggests that occlusion-based adversarial training could be pivotal for securing neural networks deployed in real-world contexts such as autonomous vehicles and surveillance systems. The paper also opens avenues for future work, including certifying robustness against ROA and investigating alternative occlusion shapes that may further strengthen defenses against sophisticated physical attacks.
In conclusion, the paper offers valuable insights and a practical methodology for defending against physically realizable attacks, contributing to foundational advances in robust machine learning. The introduction of ROA and DOA marks a promising shift from purely digital threat models toward physically grounded ones, paving the way for more secure and reliable deployment of neural networks across industries.