- The paper presents a novel hard-label adversarial attack that leverages single-query sign estimation of directional derivatives.
- It demonstrates a drastic reduction in query cost, requiring roughly 5–10 times fewer queries than previous methods on MNIST, CIFAR-10, and ImageNet.
- The findings highlight potential broader applications of sign-based optimization for model robustness evaluation and other zeroth-order optimization tasks.
Overview of "Sign-OPT: A Query-Efficient Hard-label Adversarial Attack"
The paper "Sign-OPT: A Query-Efficient Hard-label Adversarial Attack" presents a novel approach for constructing adversarial examples in the most challenging scenario of hard-label black-box attacks. This context involves adversarial attacks where attackers not only have limited query access to the model but also receive only decision outcomes without probabilities. The challenge here is to craft adversarial examples using minimal queries to a model without access to its internal gradients or probability outputs.
Contribution and Methodology
The authors critique existing algorithms for hard-label black-box attacks, which typically require an impractically large number of queries, often exceeding 20,000 per example. The previous state-of-the-art method by Cheng et al. (2018) reformulated the attack as a continuous optimization problem over search directions, where the objective is the distance from the input to the decision boundary along a given direction, and each evaluation of that objective is carried out with a binary search over model queries. Because every objective and gradient evaluation requires such a full binary search, the method remains query-intensive.
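To make the cost of these full evaluations concrete, the sketch below shows how the boundary-distance objective along a direction could be estimated with a binary search. It assumes a hypothetical hard-label oracle `predict(x)` that returns only the predicted class, and the parameter names are illustrative, not taken from the paper; the point is that every comparison inside the loop is one model query.

```python
import numpy as np

def boundary_distance(predict, x0, y0, theta, hi=10.0, tol=1e-3):
    """Estimate g(theta): the smallest step along unit direction theta
    at which the hard-label model stops predicting the original label y0.
    Every call to `predict` costs one query to the black-box model."""
    theta = theta / np.linalg.norm(theta)
    # Grow the upper bound until an adversarial point is found along theta.
    while predict(x0 + hi * theta) == y0:
        hi *= 2.0
    lo = 0.0
    # Binary search for the decision boundary between lo and hi.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if predict(x0 + mid * theta) == y0:
            lo = mid   # still the original label: boundary is farther out
        else:
            hi = mid   # already adversarial: boundary is closer
    return hi
```

Because an optimization routine must evaluate this distance (and finite-difference gradients built from it) many times, the total query count quickly becomes very large.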
The key innovation of Sign-OPT is to bypass full evaluation of the directional derivative. Instead of estimating its magnitude, the approach estimates only its sign, which, remarkably, requires a single query rather than the many queries a binary-search-based magnitude estimate consumes. This single-query sign estimate is what drives the drastic gain in query efficiency.
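A minimal sketch of this single-query test, reusing the hypothetical `predict` oracle above and assuming the current boundary distance `g_theta` is already known, might look like this; the function name and defaults are illustrative:

```python
import numpy as np

def directional_sign(predict, x0, y0, theta, g_theta, u, eps=1e-3):
    """Single-query estimate of sign(g(theta + eps*u) - g(theta)).
    If the point at distance g(theta) along the perturbed direction is
    still classified as y0, the boundary along that direction lies
    farther away, so the directional derivative is taken as positive."""
    new_dir = theta + eps * u
    new_dir = new_dir / np.linalg.norm(new_dir)
    probe = x0 + g_theta * new_dir      # exactly one model query
    return 1.0 if predict(probe) == y0 else -1.0
```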
Algorithmic Implementation and Results
In each iteration, Sign-OPT samples random directions and, with one query per direction, determines whether perturbing the current search direction toward each sample increases or decreases the distance to the decision boundary. Averaging these signed directions yields an update vector for the search direction, as sketched below. A theoretical analysis shows that, despite using only sign information rather than full gradients, the algorithm converges at a rate comparable to standard zeroth-order methods. Experimental results demonstrate that Sign-OPT requires 5–10 times fewer queries than previous methods across MNIST, CIFAR-10, and ImageNet, while producing adversarial examples with smaller perturbations.
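As an illustration only, one such update could be sketched as follows, reusing the `directional_sign` helper above. The parameter names (`num_dirs`, `eps`, `lr`) are hypothetical, and the full algorithm additionally re-estimates the boundary distance and tunes the step size, which this sketch omits.

```python
import numpy as np

def sign_opt_step(predict, x0, y0, theta, g_theta,
                  num_dirs=20, eps=1e-3, lr=0.05):
    """One sign-based update of the search direction theta:
    average single-query sign estimates over random Gaussian
    directions, then take a small descent step."""
    grad_est = np.zeros_like(theta)
    for _ in range(num_dirs):
        u = np.random.randn(*theta.shape)
        s = directional_sign(predict, x0, y0, theta, g_theta, u, eps)
        grad_est += s * u               # sign-weighted random direction
    grad_est /= num_dirs
    theta_new = theta - lr * grad_est   # move to shrink the boundary distance
    return theta_new / np.linalg.norm(theta_new)
```

With `num_dirs` queries per iteration instead of several binary searches, the per-iteration query cost drops sharply, which is the source of the reported 5–10x savings.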
Implications and Future Directions
Practically, this advancement presents a potent tool for evaluating model robustness against hard-label black-box attacks, offering significant efficiency gains. Theoretically, it underscores the potential of using alternative directional information, such as sign-oriented updates, in optimization tasks beyond adversarial attacks, possibly impacting broader domains of zeroth-order optimization.
Looking ahead, this method opens pathways to refine sign-based approaches in other forms of black-box optimization and adversarial settings. Further work could integrate adaptive query budgeting or combine hard-label sign information with other signal sources to improve both efficiency and the evaluation of models hardened by defenses.
The Sign-OPT approach marks a meaningful step forward in adversarial attack research, underscoring the importance of query efficiency in practical attack scenarios, while its algorithmic principles may prove relevant to a broader range of AI-driven optimization problems.