
Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack (1907.02044v2)

Published 3 Jul 2019 in cs.LG, cs.CR, cs.CV, and stat.ML

Abstract: The evaluation of robustness against adversarial manipulation of neural networks-based classifiers is mainly tested with empirical attacks as methods for the exact computation, even when available, do not scale to large networks. We propose in this paper a new white-box adversarial attack wrt the $l_p$-norms for $p \in \{1,2,\infty\}$ aiming at finding the minimal perturbation necessary to change the class of a given input. It has an intuitive geometric meaning, yields quickly high quality results, minimizes the size of the perturbation (so that it returns the robust accuracy at every threshold with a single run). It performs better or similar to state-of-the-art attacks which are partially specialized to one $l_p$-norm, and is robust to the phenomenon of gradient masking.

Citations (445)

Summary

  • The paper introduces the FAB attack that minimizes adversarial perturbations by projecting inputs onto adaptive decision boundaries.
  • It demonstrates versatility across multiple $l_p$-norms and matches or outperforms state-of-the-art methods while being invariant to classifier scaling and requiring no step-size tuning.
  • The approach effectively prevents gradient masking, offering robust benchmarks to advance model defense strategies.

An Analysis of the Fast Adaptive Boundary Attack for Minimally Distorted Adversarial Examples

The paper under review presents an innovative approach to generating minimally distorted adversarial examples using a method referred to as the Fast Adaptive Boundary (FAB) attack. The FAB attack stands out for its ability to efficiently find adversarial perturbations of minimal size under several $l_p$-norms, namely $l_1$, $l_2$, and $l_\infty$, while circumventing the problem of gradient masking, a known challenge in adversarial machine learning.

Core Contributions

The FAB attack introduced in the paper makes several notable contributions:

  1. White-box Adversarial Attack Across Multiple Norms: The method applies to the $l_1$-, $l_2$-, and $l_\infty$-norms, showcasing its versatility. Unlike many attacks that are tailored to a specific norm, FAB performs consistently across all three metrics.
  2. Minimization of Adversarial Perturbations: A primary goal of the FAB attack is to find the smallest perturbation necessary to change the classification of an input, while retaining a geometric interpretation that aids in understanding the decision boundary. Because the attack reports minimal perturbations, a single run yields the robust accuracy at every threshold (see the sketch after this list).
  3. Resilience Against Gradient Masking: FAB is robust to gradient masking, a common pitfall that causes many adversarial attacks to fail and thereby gives a false impression of model robustness. This is achieved by relying on the geometry of the decision boundary rather than on gradient magnitudes alone.
  4. Scaling and Step Size Invariance: The attack is invariant to rescaling of the classifier's output and does not require a step size parameter, which often demands careful tuning in methods such as PGD.
  5. Numerical Superiority: Through extensive experiments, FAB is shown to achieve better, or at least comparable, results relative to state-of-the-art attacks in its class, with robust accuracy estimates that are often tighter than those obtained by existing methods.
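
Because FAB searches for minimal perturbations rather than perturbations of a fixed budget, robust accuracy at any threshold can be read off from a single run. The sketch below illustrates this post-processing step; the arrays min_pert_norms and clean_correct are hypothetical placeholders for the per-example minimal perturbation norms returned by the attack and the correctness of the clean predictions.

```python
import numpy as np

# Hypothetical per-example outputs of a minimal-perturbation attack:
# the norm of the smallest adversarial perturbation found (np.inf if none)
# and whether the clean prediction was correct in the first place.
min_pert_norms = np.array([0.5, 2.1, np.inf, 0.9, 3.3])
clean_correct = np.array([True, True, True, False, True])

def robust_accuracy(eps: float) -> float:
    """Fraction of points that are classified correctly and whose minimal
    adversarial perturbation exceeds the threshold eps."""
    return float(np.mean(clean_correct & (min_pert_norms > eps)))

# One attack run yields the whole robustness curve, one value per threshold.
for eps in [0.25, 0.5, 1.0, 2.0]:
    print(f"eps = {eps}: robust accuracy = {robust_accuracy(eps):.2f}")
```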

Methodological Strengths

The FAB attack rests on several methodological strengths:

  • Adaptive Linearization: By linearizing the classifier at each iteration (a first-order Taylor expansion), FAB projects the current iterate onto an approximation of the decision boundary, allowing a focused search for adversarial examples with minimal distortion (see the sketch after this list).
  • Projection Operator Efficiency: The projection onto the approximating hyperplane, intersected with the box constraints of the input domain, can be computed efficiently, keeping the per-iteration cost low even in high-dimensional spaces.
  • Iterative Biasing Towards Minimal Perturbation: Each step combines the projection of the current iterate with a projection of the original point, biasing the search towards the original input so that perturbations remain small throughout the iterations.
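
The geometric step underlying these points is a projection onto the hyperplane obtained by linearizing the classifier, followed by an update biased towards the original input. The minimal sketch below, with illustrative names (project_l2, project_linf, fab_like_step), shows the closed-form hyperplane projections for the $l_2$ and $l_\infty$ norms and a simplified biased step; the paper's box-constrained projection, the $l_1$ case, and the adaptive choice of the bias weight are omitted.

```python
import numpy as np

def project_l2(x, w, b):
    """Minimal l2 move of vector x onto the hyperplane {z : <w, z> + b = 0}."""
    return x - (np.dot(w, x) + b) / (np.dot(w, w) + 1e-12) * w

def project_linf(x, w, b):
    """Minimal l_inf move onto the same hyperplane: the correction is spread
    over all coordinates in proportion to sign(w)."""
    return x - (np.dot(w, x) + b) / (np.abs(w).sum() + 1e-12) * np.sign(w)

def fab_like_step(x_curr, x_orig, w, b, alpha=0.1, eta=1.05):
    """One simplified step in the spirit of FAB (l2 case): project both the
    current iterate and the original input onto the linearized decision
    boundary, overshoot slightly by eta, and take a convex combination
    weighted by alpha so the iterate stays biased towards the original point.
    The clip keeps the result inside the [0, 1] image box; the paper instead
    projects onto the intersection of the hyperplane and the box."""
    d_curr = project_l2(x_curr, w, b) - x_curr
    d_orig = project_l2(x_orig, w, b) - x_orig
    x_next = (1 - alpha) * (x_curr + eta * d_curr) + alpha * (x_orig + eta * d_orig)
    return np.clip(x_next, 0.0, 1.0)
```

Here w and b stand for the gradient and offset of the linearized difference between the score of the current class and that of a competing class, recomputed at every iteration as described in the adaptive-linearization point above.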

Implications and Future Directions

The implications of adopting the FAB attack extend beyond mere evaluation exercises:

  • Robustness Evaluation Enhancement: Its application is crucial in benchmarking machine learning models intended for critical applications, especially where safety and security are paramount.
  • Research into Defense Mechanisms: FAB's robust evaluation capability could aid in designing defenses that don't capitulate to gradient masking, pushing the frontier towards genuinely robust models.
  • Broader Applicability Across Norms: Future research may extend FAB's principles to other distance metrics, enhancing its applicability across diverse adversarial settings.
  • Scalability to Larger Datasets: Scalability remains a pertinent topic, particularly in handling large datasets like ImageNet. The targeted version of the attack proposed indicates a potential avenue for addressing computational concerns in large-class scenarios.

Ultimately, the FAB attack offers a substantial advance in the generation and evaluation of adversarial examples, providing a method that is technically grounded and empirically validated. It points towards robustness evaluations, and in turn defenses, that are both more reliable and more transparent.
