- The paper reformulates hard-label attacks as a real-valued optimization problem, enabling effective use of zeroth order optimization methods.
- The approach cuts query counts by up to a factor of three compared to prior decision-based attacks on datasets such as MNIST, CIFAR-10, and ImageNet.
- The method extends to non-differentiable models such as Gradient Boosting Decision Trees, offering new insights into model vulnerability and security.
Query-Efficient Hard-label Black-box Attack: An Optimization-based Approach
The field of adversarial attacks on machine learning models has progressed significantly, and this paper by Cheng et al. addresses a notably challenging scenario: hard-label black-box attacks. In this setting, the attacker has no access to model information beyond the predicted label returned for each input query. This complicates traditional attack strategies because the gradient information normally used for optimization is unavailable, so adversarial examples must be crafted from label queries alone and query efficiency becomes the central difficulty.
Core Contributions
Cheng et al. propose a novel reformulation of the hard-label attack problem as a real-valued optimization problem. Unlike prior approaches that rely on random walks along the decision boundary, which incur heavy query costs and offer no convergence guarantees, this formulation makes zeroth order optimization methods applicable, in particular the Randomized Gradient-Free (RGF) technique. The objective being minimized is the distance from the original input to the decision boundary along a candidate search direction; this quantity is continuous (and in most cases smooth) and can be evaluated to any desired precision using only hard-label queries, as sketched below.
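To make the reformulation concrete, here is a minimal sketch of how the boundary-distance objective can be evaluated with hard-label queries alone, via a coarse search followed by a binary search. This is not the authors' reference implementation; `predict`, `init_step`, and `tol` are illustrative placeholders.

```python
import numpy as np

def boundary_distance(predict, x0, y0, theta, init_step=0.05, tol=1e-3):
    """Distance from x0 to the decision boundary along direction theta,
    measured with hard-label queries only. `predict(x)` is assumed to return
    the model's top-1 label; step sizes and tolerance are illustrative."""
    d = theta / np.linalg.norm(theta)           # unit search direction

    # Coarse search: grow lam until the prediction flips away from y0.
    lam = init_step
    while predict(x0 + lam * d) == y0:
        lam *= 2.0
        if lam > 1e3:                           # this direction never crosses the boundary
            return np.inf

    # Binary search: shrink the bracket [lo, hi] around the boundary crossing.
    lo, hi = 0.0, lam
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if predict(x0 + mid * d) == y0:
            lo = mid                            # still the original label: boundary is farther
        else:
            hi = mid                            # already adversarial: boundary is nearer
    return hi                                   # approximate value of the objective g(theta)
```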
The authors also articulate the weaknesses of existing methods, emphasizing that the discrete, combinatorial nature of hard-label feedback makes direct search over inputs prohibitively expensive in both search space and query count. Their reformulation sidesteps these issues: the gradient of the boundary-distance objective is estimated from function evaluations along Gaussian-sampled directions, yielding an efficient procedure for finding adversarial examples (see the sketch below).
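The gradient estimation step can be sketched in the same spirit. The snippet below shows a standard randomized gradient-free (finite-difference) estimate of the objective's gradient from Gaussian probe directions; the smoothing parameter `beta`, the number of probes, and the step size are illustrative assumptions rather than the paper's tuned settings.

```python
import numpy as np

def rgf_gradient_estimate(g, theta, beta=0.01, num_probes=20, rng=None):
    """Randomized gradient-free estimate of the gradient of g at theta.

    `g` is a boundary-distance objective such as the sketch above; averaging
    over several Gaussian probe directions is one common variant of RGF.
    """
    rng = rng if rng is not None else np.random.default_rng()
    g0 = g(theta)
    grad = np.zeros_like(theta, dtype=float)
    for _ in range(num_probes):
        u = rng.standard_normal(theta.shape)    # Gaussian probe direction
        # Finite-difference slope along u approximates the directional derivative.
        grad += (g(theta + beta * u) - g0) / beta * u
    return grad / num_probes

def attack_step(g, theta, lr=0.2, **kwargs):
    """One zeroth-order descent step on the boundary-distance objective."""
    return theta - lr * rgf_gradient_estimate(g, theta, **kwargs)
```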
Experimental Insights
Cheng et al. demonstrate the effectiveness of their approach on convolutional neural networks trained on MNIST, CIFAR-10, and ImageNet. The results show substantial reductions in query counts compared with existing decision-based attacks: on average, the proposed method needs up to a factor of three fewer queries while producing adversarial examples with comparable or smaller perturbations.
Interestingly, the paper extends the method beyond neural networks to non-differentiable models such as Gradient Boosting Decision Trees (GBDT). This is particularly noteworthy because tree ensembles, whose decision functions are piecewise constant, have long resisted gradient-dependent attack strategies. Cheng et al.'s method therefore represents a significant advance in probing the robustness of such models against adversarial perturbations: since it needs nothing more than hard-label queries, any model that exposes a prediction interface can serve as the target, as illustrated below.
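As an illustration of why this works, the sketch below wraps a scikit-learn GBDT behind a hard-label oracle: the attack only ever calls `predict`, never gradients or class probabilities. The dataset, model settings, and helper names are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import GradientBoostingClassifier

# Train a small GBDT purely for illustration; any model with a hard-label
# predict() could stand in as the black-box target.
X, y = load_digits(return_X_y=True)
gbdt = GradientBoostingClassifier(n_estimators=50).fit(X, y)

def hard_label_oracle(x):
    """Black-box query: returns only the predicted class for one example."""
    return gbdt.predict(x.reshape(1, -1))[0]

x0, y0 = X[0], y[0]
theta0 = np.random.randn(*x0.shape)     # random initial search direction
# The oracle plugs directly into the boundary-distance sketch given earlier:
# dist = boundary_distance(hard_label_oracle, x0, y0, theta0)
print(hard_label_oracle(x0))            # a single hard-label query
```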
Theoretical Underpinnings
The authors provide a theoretical analysis supporting the convergence of their algorithm. By bounding the error incurred when the objective is evaluated only approximately (each evaluation is itself a binary search over queries) and by controlling that numerical accuracy, they establish convergence to stationary points under the assumption of a sufficiently smooth decision boundary. This fills a notable gap, since earlier hard-label attacks came with no such efficiency or convergence guarantees; a schematic of the kind of guarantee involved follows.
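The schematic below conveys only the shape of such a guarantee. The notation and the split into an optimization term and an evaluation-error term are assumptions meant to illustrate the structure of the argument, not the paper's exact statement.

```latex
% Schematic only: g(\theta) is the boundary-distance objective, \theta_t the
% iterates, T the number of iterations, d the input dimension, and \beta the
% smoothing parameter. Zeroth-order analyses for smooth non-convex objectives
% typically bound an average stationarity measure of the form
\[
  \frac{1}{T}\sum_{t=1}^{T} \mathbb{E}\,\bigl\|\nabla g(\theta_t)\bigr\|^{2}
  \;\le\;
  \underbrace{\epsilon_{\mathrm{opt}}(d, T, \beta)}_{\text{shrinks as } T \text{ grows}}
  \;+\;
  \underbrace{\epsilon_{\mathrm{eval}}}_{\text{binary-search error}} ,
\]
% so the iterates approach a stationary point provided each evaluation of
% g(\theta) is carried out to sufficient numerical accuracy.
```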
Implications and Future Work
This research has noteworthy implications for the security of machine learning systems: models that expose only hard labels, often presumed safe because they disclose so little, remain vulnerable. Practically, the methods outlined could inform defense strategies by preemptively identifying weaknesses through hard-label attack simulation.
Moving forward, promising directions include further improving query efficiency, exploring other gradient-free techniques that could make the attack stealthier and cheaper, evaluating the method on larger and more diverse model architectures, and extending it to adaptive, real-time adversarial scenarios in line with the evolving landscape of adversarial machine learning. The consequences of such attacks for model interpretability and fairness also warrant rigorous evaluation.