- The paper introduces a novel method using Natural Evolutionary Strategies (NES) for generating query-efficient black-box adversarial examples against neural networks, significantly reducing the required queries compared to previous techniques.
- It demonstrates successful adversarial attacks in a challenging partial-information setting where only limited outputs are available, culminating in the first targeted attack on the Google Cloud Vision API.
- The NES approach achieves high success rates (e.g., 99.6% on CIFAR-10 with ~4,910 queries) and generates robust adversarial examples, highlighting the practical feasibility and speed of black-box attacks.
Query-efficient Black-box Adversarial Examples: A Technical Overview
The paper examines the generation of adversarial examples against neural network image classifiers in the restrictive black-box setting, where the attacker can only query the model for its outputs and has no access to gradients or internal parameters. It introduces an approach based on Natural Evolutionary Strategies (NES) that is far more query-efficient than existing methods, and it further addresses the "partial-information setting," in which attacks must be mounted with access to only a limited set of class outputs, with significant implications for real-world systems.
Methodology and Contributions
The paper advances adversarial example generation through three main contributions:
- Introduction of NES for Black-box Attacks: Drawing an analogy between NES and finite-difference methods, the paper presents a way to estimate gradients using only query access, by sampling Gaussian search directions around the current image. This theoretical grounding allows NES to produce adversarial examples with three orders of magnitude fewer queries than previous techniques (a minimal sketch of the estimator follows this list).
- Partial-information Setting Attacks: The paper describes a method for generating targeted adversarial examples when only the top-k output classes and their scores are available, as is the case for commercial systems such as the Google Cloud Vision API. The attack starts from an image of the target class and alternates between nudging the image toward the original input and maximizing the target class's score, keeping the target class within the visible top-k throughout (sketched after this list).
- Practical Application to the Google Cloud Vision API: The research demonstrates the first targeted adversarial attack on the Google Cloud Vision API, showing that the approach works against a deployed commercial system. Both untargeted and targeted attacks were executed successfully, underscoring the practical viability of the proposed method.
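A minimal sketch of the NES gradient estimator described in the first contribution, written with NumPy; the function name, hyperparameter values, and the antithetic-sampling loop are illustrative assumptions rather than the authors' exact implementation:

```python
import numpy as np

def nes_gradient_estimate(loss_fn, x, sigma=0.001, n_samples=50):
    """Estimate the gradient of a black-box loss via NES.

    loss_fn:   callable that queries the classifier and returns a scalar
               score (e.g. the probability of the target class) for an image.
    x:         current image as a float NumPy array.
    sigma:     search variance (std. dev. of the Gaussian perturbations).
    n_samples: number of antithetic pairs, i.e. 2 * n_samples queries total.
    """
    grad = np.zeros_like(x)
    for _ in range(n_samples):
        delta = np.random.randn(*x.shape)   # Gaussian search direction
        # Antithetic sampling: evaluate at +delta and -delta to reduce variance.
        grad += delta * loss_fn(x + sigma * delta)
        grad -= delta * loss_fn(x - sigma * delta)
    return grad / (2 * n_samples * sigma)
```

In the attack loop this estimate stands in for the true gradient in projected gradient descent, e.g. `x = np.clip(x + lr * np.sign(grad), x_orig - eps, x_orig + eps)` for an L-infinity constraint.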
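The partial-information attack in the second contribution can be sketched as follows. This is an illustrative simplification under stated assumptions: the callables `topk_score` and `nes_grad`, the fixed epsilon-decay schedule, and the hyperparameter values are placeholders, and the paper's procedure additionally backtracks when the target class drops out of the top-k:

```python
import numpy as np

def partial_info_attack(topk_score, nes_grad, x_orig, x_target_class,
                        eps_start=0.5, eps_goal=0.05, eps_decay=0.001,
                        lr=0.01, max_iters=10000):
    """Targeted attack when only the top-k classes and scores are visible.

    topk_score: queries the classifier; returns the target class's score
                if it appears in the top-k output, else None.
    nes_grad:   NES estimate of the gradient of the target-class score
                (e.g. the estimator sketched above).
    Starts from an image of the target class (so the target is initially
    in the top-k) and gradually shrinks the perturbation bound toward
    x_orig while keeping the target class visible.
    """
    eps = eps_start
    x = np.clip(x_target_class, x_orig - eps, x_orig + eps)
    for _ in range(max_iters):
        # Ascend the estimated gradient to boost the target class's score.
        x = np.clip(x + lr * np.sign(nes_grad(x)), x_orig - eps, x_orig + eps)
        # Only tighten the perturbation bound if the target is still top-k.
        if topk_score(x) is not None:
            eps = max(eps - eps_decay, eps_goal)
            x = np.clip(x, x_orig - eps, x_orig + eps)
        if eps <= eps_goal and topk_score(x) is not None:
            return x  # adversarial example within the desired distance
    return x
```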
Results and Implications
The NES-based approach attains a 99.6% success rate in generating adversarial examples against CIFAR-10 classifiers with an average of 4,910 queries, and a 99.2% success rate on ImageNet with an average of 24,780 queries. The authors also generate robust adversarial examples that remain adversarial under transformations, a first in the black-box setting. This query efficiency substantially lowers the computational and time costs of an attack, making real-world black-box attacks more feasible.
The paper also applies Expectation over Transformation (EOT) to produce transformation-tolerant adversarial examples, a promising direction for future work. This capability matters wherever adversarially perturbed images must remain adversarial under varying conditions, such as different viewing angles or lighting in physical deployments.
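A minimal sketch of how EOT combines with the black-box setup above: the loss queried by the NES estimator is replaced by an average over randomly sampled transformations. The `transform` sampler and the number of samples are assumptions for illustration:

```python
import numpy as np

def eot_loss(loss_fn, transform, x, n_transforms=10):
    """Expectation over Transformation: average the black-box loss over
    randomly sampled transformations of the candidate image.

    loss_fn:   queries the classifier and returns a scalar score for an image.
    transform: returns a randomly transformed copy of x
               (e.g. a random rotation, crop, or brightness shift).
    """
    return float(np.mean([loss_fn(transform(x)) for _ in range(n_transforms)]))
```

Feeding this averaged loss into the NES estimator yields perturbations that tend to stay adversarial across the sampled transformations rather than only on the unmodified image.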
Future Directions
This work opens multiple avenues for future research. The interpretation of NES as a gradient estimator suggests that further refinement could improve both its efficiency and effectiveness. The results also highlight the need for defensive mechanisms in commercial systems that remain robust against adversaries operating under strict query and information limits. Follow-up work could examine how black-box attacks fare against adaptive defenses and inform stronger security measures for deployed neural networks.
In conclusion, the paper advances the state of the art in generating black-box adversarial examples while underscoring the need for security strategies that keep pace with increasingly sophisticated adversarial threats, both in research and in industry applications.