ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models
The paper presents a black-box attack on deep neural networks (DNNs) based on zeroth order optimization (ZOO), departing from the traditional approach of training substitute models. By estimating gradients purely from model queries, the method generates adversarial examples without internal access to the target DNN, avoiding both backpropagation through the target and the training of any auxiliary model.
Overview of the Paper
The authors introduce an attack strategy that relies solely on the input-output behavior of the target DNN to construct adversarial examples, circumventing the need for a substitute model. The ZOO-based attack estimates the gradient directly with zeroth order methods, approximating each partial derivative by a symmetric finite difference of the model's output, so that no backpropagation through the target network is required.
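To make the finite-difference idea concrete, here is a minimal sketch of coordinate-wise gradient estimation via the symmetric difference quotient. The function name `loss_fn` and the default step size are illustrative assumptions, not taken from the paper's code; the paper uses a small smoothing constant on the order of 1e-4.

```python
import numpy as np

def estimate_partial(loss_fn, x, i, h=1e-4):
    """Estimate the i-th partial derivative of the scalar loss at x using
    only two black-box queries (symmetric difference quotient):
        g_i ~ (f(x + h*e_i) - f(x - h*e_i)) / (2h)
    No access to model internals or backpropagation is needed."""
    e = np.zeros_like(x)
    e.flat[i] = 1.0
    return (loss_fn(x + h * e) - loss_fn(x - h * e)) / (2.0 * h)
```

Each coordinate estimate costs two queries, so a full gradient over a p-dimensional image would cost 2p queries per iteration; this query cost is what motivates the coordinate descent, dimension reduction, and sampling techniques described below.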
Attack Formulation and Techniques
The paper casts the attack in the framework of the Carlini & Wagner (C&W) attack but modifies the loss function to suit the black-box scenario: the hinge-like loss is reformulated to depend solely on the log of the output probabilities returned by the model, rather than on the internal logits. The black-box attack combines the following key techniques:
- Gradient Approximation: Estimating the gradient (and, for the Newton-style variant, the coordinate-wise second derivative) with respect to the input image by finite differences, as sketched above.
- Stochastic Coordinate Descent: Instead of estimating and applying the full gradient at each step, which is prohibitively query-expensive for high-dimensional inputs, the method updates one coordinate (or a small batch of coordinates) at a time; see the sketch after this list.
- Attack-space Dimension Reduction: Optimizing the perturbation in a lower-dimensional space and upscaling it to the input dimension (e.g., by bilinear interpolation), making the attack computationally feasible on large images.
- Hierarchical Attacks: Starting the optimization in a coarse attack space and gradually increasing its dimensionality to refine the adversarial example progressively.
- Importance Sampling: Sampling the coordinates to update non-uniformly, prioritizing pixel regions that contribute most to the adversarial perturbation.
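To tie the loss reformulation and the coordinate updates together, below is a minimal sketch of an untargeted ZOO-style step, assuming a black-box `query_probs(x)` that returns the model's output probability vector (all names and defaults here are illustrative, not from the paper's code). The hinge loss uses log-probabilities with a confidence margin kappa; the update shown is plain coordinate-wise gradient descent, whereas the paper's ZOO-ADAM variant additionally maintains per-coordinate ADAM moment estimates.

```python
import numpy as np

def untargeted_loss(query_probs, x, orig_label, kappa=0.0):
    """Hinge-like C&W-style loss built only from output probabilities:
    positive while the original class still has the highest log-probability."""
    log_p = np.log(query_probs(x) + 1e-30)            # epsilon guards log(0)
    best_other = np.max(np.delete(log_p, orig_label)) # best competing class
    return max(log_p[orig_label] - best_other, -kappa)

def zoo_step(loss_fn, x, lr=0.01, h=1e-4):
    """One stochastic coordinate descent step: pick a random coordinate,
    estimate its partial derivative with two queries, and descend."""
    i = np.random.randint(x.size)
    e = np.zeros_like(x)
    e.flat[i] = 1.0
    g_i = (loss_fn(x + h * e) - loss_fn(x - h * e)) / (2.0 * h)
    return x - lr * g_i * e
```

In the full method this attack loss is combined with an L2 distortion penalty weighted by a constant c, as in the C&W formulation; the perturbation is optimized in the reduced attack space and upscaled before querying the model, and coordinates are drawn from the importance-sampling distribution rather than uniformly.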
Performance Evaluation
The efficiency and effectiveness of the ZOO-based attacks were evaluated on three datasets, MNIST, CIFAR-10, and ImageNet, against state-of-the-art DNN models for each. The results showed:
- MNIST and CIFAR-10: On these datasets, ZOO achieved nearly 100% success rates in both targeted and untargeted attacks, with L2 distortion comparable to that of the white-box C&W attack. Notably, the method significantly outperformed substitute-model-based attacks in both success rate and distortion.
- ImageNet with Inception-v3: In this large-scale setting, ZOO achieved a high success rate (approximately 90%) for untargeted attacks within limited computation time. For targeted attacks, even in a challenging case, ZOO steered the model's output to the desired label with barely perceptible distortion of the image.
Implications and Future Directions
The ZOO attack advances adversarial attack methodology by eliminating the need for substitute models, simplifying the attack pipeline and reducing dependence on model-specific knowledge. The result is a more general and practical attack strategy applicable across a variety of DNN architectures and applications.
From a theoretical perspective, the use of zeroth order optimization opens new avenues for adversarial research, particularly for understanding DNN vulnerabilities through an optimization lens. Practically, the techniques discussed can be used to stress-test DNNs in real-world applications and to verify that robust defense mechanisms are in place.
Future research can explore several extensions:
- Efficiency Improvements: Further reducing the query and computation cost of ZOO so it scales to even larger datasets and models.
- Adversarial Training: Integrating ZOO into adversarial training routines to enhance the robustness of deep learning models.
- Cross-Domain Attacks: Applying the ZOO framework to other types of data and models, including those in natural language processing and time-series analysis, to verify its versatility.
In conclusion, the ZOO-based black-box attack method represents a significant step forward in adversarial machine learning, providing a robust, scalable, and effective means of generating adversarial examples without relying on substitute models. This approach will likely contribute substantially to both understanding and improving the robustness of DNNs against adversarial threats.