- The paper introduces the P-RGF method that leverages a transfer-based prior from a surrogate model, enhancing gradient estimation in black-box adversarial attacks.
- It achieves marked improvements in query efficiency and success rates on models such as Inception-v3, VGG-16, and ResNet-50.
- The work establishes a strong baseline for query-efficient black-box attacks under constrained query budgets and outlines future directions, including extensions to adaptive models and richer priors.
Improving Black-box Adversarial Attacks with a Transfer-based Prior
The paper "Improving Black-box Adversarial Attacks with a Transfer-based Prior" by Shuyu Cheng et al. presents a novel methodology to enhance the efficacy of black-box adversarial attacks through the integration of transfer-based priors. The paper addresses the challenges inherent in the black-box adversarial setting, where an adversary aims to create adversarial perturbations without direct access to the target model's gradients.
Methodology Overview
The authors introduce the prior-guided random gradient-free (P-RGF) method, which leverages a transfer-based prior: the gradient of a white-box surrogate model. This design targets the two main weaknesses of earlier black-box strategies, namely the low success rates of attacks that rely solely on surrogate-model gradients (transfer-based attacks) and the high query cost of estimating gradients purely from query feedback.
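To make the query-feedback baseline concrete, here is a minimal sketch of a plain random gradient-free (RGF) estimator of the kind P-RGF refines. The names `loss_fn`, `num_queries`, and `sigma` are illustrative assumptions rather than the authors' interface; the black-box model is accessed only through the scalar loss returned by `loss_fn`.

```python
import numpy as np

def rgf_gradient_estimate(loss_fn, x, num_queries=50, sigma=1e-4):
    """Plain RGF sketch: estimate a black-box gradient via finite differences.

    loss_fn -- hypothetical wrapper that queries the target model and returns
               a scalar loss for a flattened input
    x       -- current input, shape (D,)
    """
    D = x.size
    f0 = loss_fn(x)                      # one baseline query
    grad_est = np.zeros(D)
    for _ in range(num_queries):
        u = np.random.randn(D)
        u /= np.linalg.norm(u)           # random unit direction
        # Finite-difference estimate of the directional derivative along u.
        grad_est += (loss_fn(x + sigma * u) - f0) / sigma * u
    return grad_est / num_queries
```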
P-RGF operates within a gradient-estimation framework: the target model's gradient is estimated from finite-difference queries along randomly sampled directions, and those directions are biased toward the transfer gradient. A key contribution is the derivation of an optimal coefficient that dictates how strongly the transfer gradient should influence the sampling, based on how well it aligns with the true gradient. The method is also flexible enough to incorporate alternative priors, such as data-dependent priors.
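The sketch below illustrates, under the same assumptions as the RGF snippet above, how a transfer prior can bias the sampled directions. Here `lam` stands in for the paper's coefficient but is treated as a fixed hyperparameter for brevity; the paper instead derives an optimal value from an estimate of the alignment between the transfer gradient and the true gradient.

```python
import numpy as np

def prgf_gradient_estimate(loss_fn, x, transfer_grad,
                           num_queries=50, sigma=1e-4, lam=0.5):
    """Prior-guided RGF sketch: bias query directions toward a transfer gradient.

    transfer_grad -- gradient of a white-box surrogate model, shape (D,)
    lam           -- in [0, 1]; how strongly the prior shapes the sampling
                     (larger values trust the transfer gradient more)
    """
    v = transfer_grad / np.linalg.norm(transfer_grad)   # normalized transfer prior
    f0 = loss_fn(x)                                      # one baseline query
    grad_est = np.zeros(x.size)
    for _ in range(num_queries):
        xi = np.random.randn(x.size)
        xi -= np.dot(xi, v) * v                          # drop the component along the prior
        xi /= np.linalg.norm(xi)
        # Each unit query direction mixes the prior with a random orthogonal direction.
        u = np.sqrt(lam) * v + np.sqrt(1.0 - lam) * xi
        grad_est += (loss_fn(x + sigma * u) - f0) / sigma * u
    return grad_est / num_queries
```

Replacing the Gaussian draw of `xi` with structured, lower-dimensional noise is one way the same skeleton could accommodate the data-dependent priors mentioned above.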
Strong Numerical Results and Claims
The empirical evaluation shows that P-RGF requires substantially fewer queries and attains higher attack success rates than prior state-of-the-art black-box attacks. These gains hold against a range of models, including Inception-v3, VGG-16, and ResNet-50, in both normally trained and defended configurations, demonstrating the robustness and practicality of the proposed algorithm.
Implications and Future Directions
Integrating transfer-based priors into black-box adversarial attacks is a compelling advance in adversarial machine learning and could serve as a baseline for more query-efficient black-box attack algorithms. On the theoretical side, the paper opens new avenues for refining gradient estimation in high-dimensional spaces under limited-information constraints.
Practically, the research outlines a systematic approach to enhancing adversarial attacks against models where access to gradient information is restrictive, paving the way for future developments in security assessments of machine learning systems. Future research may explore the extension of this framework to more complex settings, such as those involving dynamic and adaptive learning models, as well as the integration of diverse and more sophisticated priors.
In conclusion, the "Improving Black-box Adversarial Attacks with a Transfer-based Prior" paper makes a substantive contribution to the black-box attack literature, offering a principled way to improve attack performance under tight query budgets, with broad ramifications for AI safety and robustness research.