- The paper introduces GeoDA, a method that leverages low curvature decision boundaries to craft efficient adversarial examples in black-box settings.
- It employs an iterative algorithm to estimate normal vectors, achieving minimal ℓ2-norm perturbations with theoretical convergence guarantees.
- Empirical evaluation shows that GeoDA finds smaller perturbations with fewer queries than state-of-the-art decision-based attacks while maintaining high success rates.
GeoDA: A Geometric Framework for Black-Box Adversarial Attacks
Adversarial robustness is a critical concern for neural-network classifiers in machine learning and computer vision. The paper "GeoDA: a geometric framework for black-box adversarial attacks" by Ali Rahmati et al. presents a method for generating adversarial examples in the challenging black-box setting, where the adversary's interaction with the model is restricted to a limited number of top-1 label queries, with no access to the model's parameters or gradients.
Core Methodology
The authors introduce a geometric framework named GeoDA (Geometric Decision-based Attack) that estimates and exploits the geometry of the decision boundary near data samples. The key observation is that deep networks tend to have decision boundaries with low mean curvature in the vicinity of data samples; a locally flat boundary is characterized by its normal vector, which GeoDA estimates from label queries alone (see the sketch below).
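To make the geometric idea concrete, the following is a minimal sketch of how a boundary normal can be estimated from top-1 label queries in the spirit of GeoDA's estimator. The `is_adversarial` callable, the query count, and the probe radius `sigma` are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def estimate_normal(is_adversarial, x_boundary, n_queries=100, sigma=0.02, rng=None):
    """Monte Carlo estimate of the decision-boundary normal at x_boundary.

    is_adversarial(x) -> bool wraps a top-1 label query: True if the label
    differs from the original class. Random probes around a boundary point
    fall on one side or the other; averaging the signed probe directions
    recovers the normal of a locally flat boundary.
    (Hypothetical helper names; a sketch, not the paper's code.)
    """
    rng = np.random.default_rng(0) if rng is None else rng
    acc = np.zeros_like(x_boundary)
    for _ in range(n_queries):
        eta = rng.standard_normal(x_boundary.shape)
        eta /= np.linalg.norm(eta)                      # unit-norm probe direction
        sign = 1.0 if is_adversarial(x_boundary + sigma * eta) else -1.0
        acc += sign * eta                               # vote toward the adversarial side
    return acc / (np.linalg.norm(acc) + 1e-12)          # unit normal estimate
```

Because the boundary is assumed locally flat, this averaged direction concentrates around the true normal as the number of probes grows, which is what makes small query budgets viable.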
GeoDA is characterized by its:
- Iterative Algorithm: The generation of adversarial perturbations relies on iteratively estimating the normal vector to the decision boundary, a challenging task in black-box settings. The framework locally linearizes the boundary and uses the estimated normal to guide an iterative search for adversarial perturbations with minimal ℓp norm.
- Convergence Guarantees: For p = 2, convergence to the minimal ℓ2-norm perturbation is shown theoretically. The guarantee is contingent on the bounded curvature of the decision boundary; under this assumption, the perturbations found approach the optimal solution as iterations progress.
- Query Optimization: The authors derive the optimal distribution of queries over the iterations, making the most effective use of a limited query budget, a common constraint in realistic black-box scenarios. A sketch combining these three ingredients follows this list.
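The sketch below assembles the pieces described above into an ℓ2 loop: bisect to the boundary, estimate the normal there (using `estimate_normal` from the earlier sketch), then search from the clean input along the normal for a closer adversarial point, with the query budget spread across iterations. The helper names, the geometric budget split, and the step sizes are illustrative assumptions rather than the paper's exact algorithm.

```python
import numpy as np

def bisect_to_boundary(is_adversarial, x, x_adv, tol=1e-3):
    """Binary search on the segment from clean x to adversarial x_adv,
    returning a point just on the adversarial side of the boundary."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_adversarial(x + mid * (x_adv - x)):
            hi = mid
        else:
            lo = mid
    return x + hi * (x_adv - x)

def geoda_l2_sketch(is_adversarial, x, x_adv_init, budget=1000, n_iters=5, ratio=1.5):
    """Iterative boundary linearization toward a minimal-l2 perturbation.

    Each iteration: (1) bisect to the boundary, (2) estimate the normal there,
    (3) step from the clean input x along the normal until the label flips.
    Later iterations operate closer to the optimum, so they receive
    geometrically more queries (an illustrative allocation, not the
    paper's derived optimum).
    """
    weights = np.array([ratio ** t for t in range(n_iters)])
    queries = (budget * weights / weights.sum()).astype(int)
    x_adv = x_adv_init
    for n_t in queries:
        x_b = bisect_to_boundary(is_adversarial, x, x_adv)
        w = estimate_normal(is_adversarial, x_b, n_queries=max(int(n_t), 1))
        # Line search from the clean input along the estimated normal:
        # grow the step until the label flips, then keep the flip if closer.
        step = 1e-3
        while not is_adversarial(x + step * w) and step < 1e3:
            step *= 1.3
        candidate = x + step * w
        if is_adversarial(candidate) and \
                np.linalg.norm(candidate - x) < np.linalg.norm(x_adv - x):
            x_adv = candidate
    return x_adv
```

The loop relies on the low-curvature assumption twice: the linearization step is only accurate when the boundary is nearly flat at the bisected point, and the normal estimate is only meaningful at that same local scale.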
Empirical Evaluation
Experimental results underscore GeoDA's efficiency in generating smaller perturbations than state-of-the-art attacks such as the Boundary Attack and HopSkipJump. For instance, under a constrained query budget, GeoDA consistently yields adversarial examples with fewer queries while maintaining or exceeding the attack success rate. Visualizations of the perturbations further show that GeoDA's modifications to input data are subtle yet sufficient to induce classifier errors.
Implications and Future Directions
GeoDA's methodology offers both practical and theoretical advancements in adversarial attack strategies under black-box access models. Its efficient query strategy can potentially be further refined or adapted across different machine learning applications, especially where threat models assume limited interaction capabilities with target systems.
This work lays the foundation for additional techniques that could optimize or build on geometric assumptions about decision boundaries, possibly leading to extensions that address:
- Diverse Model Architectures: While the paper demonstrates performance on deep image classifiers, exploring robustness across varied architectures might yield insights into architectural vulnerability patterns.
- Transferability of Perturbations: Leveraging transferability might lead to more effective attacks even under stricter query limitations, potentially inspiring defensive approaches.
- Extended Norms and Constraints: Examining other norms or more complex constraints could refine the understanding of decision boundary behavior under alternative conditions.
Conclusion
GeoDA provides a powerful and theoretically sound approach to crafting adversarial examples in black-box settings, balancing query efficiency with attack effectiveness. Its geometric perspective on decision boundaries offers a new lens through which adversarial attacks can be understood, evaluated, and ultimately countered through improved defense mechanisms. As adversarial research advances, methodologies like GeoDA are crucial for developing robust, secure, and reliable machine learning systems.