- The paper demonstrates that both Pixel Deflection and HGD collapse under white-box attacks, reducing ImageNet accuracy to 0% under ℓ∞ perturbations bounded by 4/255.
- It uses BPDA and standard PGD to expose the defenses' weaknesses, achieving near-total success rates for targeted adversarial examples.
- The study challenges traditional threat models and highlights the urgent need for more robust adversarial defenses in neural networks.
An Evaluation of Adversarial Example Defenses from CVPR 2018
The paper "On the Robustness of the CVPR 2018 White-Box Adversarial Example Defenses" by Anish Athalye and Nicholas Carlini provides a critical assessment of two adversarial defense mechanisms presented at CVPR 2018: Pixel Deflection and High-Level Representation Guided Denoiser (HGD). The authors focus on examining the robustness of these defenses against white-box adversarial attacks, which assume full knowledge of the model architecture and parameters.
Overview
Adversarial examples pose a significant challenge for neural networks, particularly in image classification tasks. These examples are slightly perturbed versions of legitimate inputs, crafted to deceive models into making incorrect predictions. The paper scrutinizes two defenses intended to mitigate this vulnerability and demonstrates that neither provides robust security once the attacker has full, white-box knowledge of the defense.
The evaluation centers on adversarial examples bounded by an ℓ∞ perturbation of $4/255$, a tighter budget than the one used in the original defense evaluations, which makes the attack setting strictly harder for the adversary. The analysis is conducted on the ImageNet dataset, and both defenses are completely compromised, with model accuracy dropping to 0% under attack.
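This threat model is straightforward to state in code. The sketch below, which assumes images scaled to $[0, 1]$ and is not taken from the paper's implementation, shows how a candidate adversarial image is projected back into the ℓ∞ ball of radius $4/255$ around the original input:

```python
import torch

EPS = 4 / 255  # maximum per-pixel perturbation used in the paper's evaluation

def project_linf(x_adv: torch.Tensor, x_orig: torch.Tensor, eps: float = EPS) -> torch.Tensor:
    """Clip x_adv so that ||x_adv - x_orig||_inf <= eps and pixels remain valid."""
    delta = torch.clamp(x_adv - x_orig, -eps, eps)   # enforce the l_inf bound
    return torch.clamp(x_orig + delta, 0.0, 1.0)     # enforce the [0, 1] pixel range
```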
Detailed Analysis
Pixel Deflection
Pixel Deflection is a defense strategy that applies a randomized, non-differentiable preprocessing step to input images: it replaces a subset of pixels with randomly chosen nearby pixels and then applies a denoising operation to smooth out the resulting noise. The paper shows that this mechanism is circumvented with BPDA (Backward Pass Differentiable Approximation), which substitutes an approximate gradient for the non-differentiable transform on the backward pass. Under this attack, accuracy drops to 0%, and targeted attacks succeed 97% of the time, clearly indicating that the defense does not hold up against a white-box adversary. A sketch of the BPDA idea follows.
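The following is a minimal, hedged sketch of BPDA as it applies to a preprocessing defense of this kind. The `pixel_deflection` function here is a crude, hypothetical stand-in for the defense (the real defense also applies wavelet denoising and uses different sampling); the straight-through identity approximation in the backward pass is the core of the attack:

```python
import torch

def pixel_deflection(x: torch.Tensor, n_deflections: int = 100) -> torch.Tensor:
    """Hypothetical stand-in for the defense: replace random pixels with nearby pixels.
    The actual Pixel Deflection defense also applies wavelet denoising afterwards."""
    x = x.clone()
    _, _, h, w = x.shape
    for _ in range(n_deflections):
        i = torch.randint(1, h - 1, (1,)).item()
        j = torch.randint(1, w - 1, (1,)).item()
        di = torch.randint(-1, 2, (1,)).item()
        dj = torch.randint(-1, 2, (1,)).item()
        x[:, :, i, j] = x[:, :, i + di, j + dj]
    return x

class BPDAIdentity(torch.autograd.Function):
    """Forward: run the non-differentiable defense. Backward: treat it as the identity."""

    @staticmethod
    def forward(ctx, x):
        return pixel_deflection(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Approximate d(defense)/dx by the identity so gradients flow straight through.
        return grad_output

def defended_logits(classifier, x):
    # Gradient-based attacks can now differentiate "through" the preprocessing step.
    return classifier(BPDAIdentity.apply(x))
```

Because the preprocessing is randomized, gradient estimates are typically also averaged over several random draws of the transform (expectation over transformation); the sketch above omits this.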
High-Level Representation Guided Denoiser
This defense denoises inputs with a pre-trained neural network before classification. The approach is entirely differentiable and non-randomized, so the authors attacked it with Projected Gradient Descent (PGD) applied without any modifications, reducing classifier accuracy to 0%. The defense was not only completely ineffective against untargeted attacks but also allowed a 100% success rate for targeted adversarial examples, further demonstrating that it offers no practical security under a white-box adversary.
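Since the HGD pipeline is differentiable end to end, no gradient approximation is needed. The sketch below is a generic untargeted ℓ∞ PGD loop against a denoiser-plus-classifier pipeline; the `denoiser` and `classifier` callables, step size, and iteration count are illustrative assumptions rather than the paper's exact configuration:

```python
import torch
import torch.nn.functional as F

def pgd_attack(denoiser, classifier, x, labels, eps=4 / 255, alpha=1 / 255, steps=100):
    """Untargeted l_inf PGD: maximize the loss of the defended (denoised) classifier."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = classifier(denoiser(x_adv))      # attack the full defended pipeline
        loss = F.cross_entropy(logits, labels)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                # gradient ascent step
            x_adv = x + torch.clamp(x_adv - x, -eps, eps)      # project to the l_inf ball
            x_adv = torch.clamp(x_adv, 0.0, 1.0)               # keep pixels in range
    return x_adv.detach()
```

A targeted variant uses the same loop but descends the loss toward an attacker-chosen label instead of ascending it.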
Implications and Future Work
The findings in this paper carry significant implications for adversarial machine learning. They suggest that the threat models commonly used to evaluate adversarial defenses may be insufficient, particularly the "oblivious attacker" model, which assumes the attacker is unaware of the defense. Evaluating under a white-box threat model, by contrast, yields a more realistic and stringent assessment of security.
Moving forward, the research underscores the need for defenses that can withstand knowledgeable adversaries. Future work could explore strategies that either build adversarial robustness directly into neural network design or employ mechanisms explicitly evaluated against adaptive attackers. As adversarial attack techniques continue to evolve, developing resilient defenses remains a top priority for the machine learning community.
In conclusion, the paper by Athalye and Carlini provides a thorough evaluation of purportedly robust adversarial defenses, contributing to an important discussion on the efficacy of existing approaches and pointing toward the development of truly secure and reliable AI systems.