ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models
The paper presents a black-box attack on deep neural networks (DNNs) based on zeroth order optimization (ZOO), departing from the traditional approach of training substitute models. By estimating gradients purely from model queries, the method generates adversarial examples without internal access to the target DNN, avoiding both backpropagation through the target and the training of any auxiliary model.
Overview of the Paper
The authors introduce an attack strategy that relies solely on the input-output behavior of the target DNN to construct adversarial examples, circumventing the need for a substitute model. The ZOO-based attack estimates the gradient directly with zeroth order methods, approximating each partial derivative by a symmetric finite difference of the model's output, so that no backpropagation through the target network is required.
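To make the finite-difference idea concrete, here is a minimal sketch of coordinate-wise gradient estimation via the symmetric difference quotient. The function name `loss_fn` and the default step size are illustrative assumptions, not taken from the paper's code; the paper uses a small smoothing constant on the order of 1e-4.

```python
import numpy as np

def estimate_partial(loss_fn, x, i, h=1e-4):
    """Estimate the i-th partial derivative of the scalar loss at x using
    only two black-box queries (symmetric difference quotient):
        g_i ~ (f(x + h*e_i) - f(x - h*e_i)) / (2h)
    No access to model internals or backpropagation is needed."""
    e = np.zeros_like(x)
    e.flat[i] = 1.0
    return (loss_fn(x + h * e) - loss_fn(x - h * e)) / (2.0 * h)
```

Each coordinate estimate costs two queries, so a full gradient over a p-dimensional image would cost 2p queries per iteration; this query cost is what motivates the coordinate descent, dimension reduction, and sampling techniques described below.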
Attack Formulation and Techniques
The paper casts the attack in the framework of the Carlini & Wagner (C&W) attack but modifies the loss function to suit the black-box scenario: the hinge-like loss is reformulated to depend solely on the log of the output probabilities returned by the model, rather than on the internal logits. The black-box attack combines the following key techniques:
- Gradient Approximation: Estimating the gradient (and, for the Newton-style variant, the coordinate-wise second derivative) with respect to the input image by finite differences, as sketched above.
- Stochastic Coordinate Descent: Instead of estimating and applying the full gradient at each step, which is prohibitively query-expensive for high-dimensional inputs, the method updates one coordinate (or a small batch of coordinates) at a time; see the sketch after this list.
- Attack-space Dimension Reduction: Optimizing the perturbation in a lower-dimensional space and upscaling it to the input dimension (e.g., by bilinear interpolation), making the attack computationally feasible on large images.
- Hierarchical Attacks: Starting the optimization in a coarse attack space and gradually increasing its dimensionality to refine the adversarial example progressively.
- Importance Sampling: Sampling the coordinates to update non-uniformly, prioritizing pixel regions that contribute most to the adversarial perturbation.
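To tie the loss reformulation and the coordinate updates together, below is a minimal sketch of an untargeted ZOO-style step, assuming a black-box `query_probs(x)` that returns the model's output probability vector (all names and defaults here are illustrative, not from the paper's code). The hinge loss uses log-probabilities with a confidence margin kappa; the update shown is plain coordinate-wise gradient descent, whereas the paper's ZOO-ADAM variant additionally maintains per-coordinate ADAM moment estimates.

```python
import numpy as np

def untargeted_loss(query_probs, x, orig_label, kappa=0.0):
    """Hinge-like C&W-style loss built only from output probabilities:
    positive while the original class still has the highest log-probability."""
    log_p = np.log(query_probs(x) + 1e-30)            # epsilon guards log(0)
    best_other = np.max(np.delete(log_p, orig_label)) # best competing class
    return max(log_p[orig_label] - best_other, -kappa)

def zoo_step(loss_fn, x, lr=0.01, h=1e-4):
    """One stochastic coordinate descent step: pick a random coordinate,
    estimate its partial derivative with two queries, and descend."""
    i = np.random.randint(x.size)
    e = np.zeros_like(x)
    e.flat[i] = 1.0
    g_i = (loss_fn(x + h * e) - loss_fn(x - h * e)) / (2.0 * h)
    return x - lr * g_i * e
```

In the full method this attack loss is combined with an L2 distortion penalty weighted by a constant c, as in the C&W formulation; the perturbation is optimized in the reduced attack space and upscaled before querying the model, and coordinates are drawn from the importance-sampling distribution rather than uniformly.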
Performance Evaluation
The efficiency and effectiveness of the ZOO-based attacks were evaluated on three datasets, MNIST, CIFAR-10, and ImageNet, against state-of-the-art DNN models for each. The results showed:
- MNIST and CIFAR-10: On these datasets, ZOO achieved nearly 100% success rates in both targeted and untargeted attacks, with L2 distortion comparable to that of the white-box C&W attack. Notably, the method significantly outperformed substitute-model-based attacks in both success rate and distortion.
- ImageNet with Inception-v3: In this large-scale setting, ZOO achieved a high success rate (approximately 90%) for untargeted attacks within limited computation time. For targeted attacks, even in a challenging case, ZOO steered the model's output to the desired label with barely perceptible distortion of the image.
Implications and Future Directions
The ZOO attack advances adversarial attack methodology by eliminating the need for substitute models, simplifying the attack pipeline and reducing dependence on model-specific knowledge. The result is a more general and practical attack strategy applicable across a variety of DNN architectures and applications.
From a theoretical perspective, the use of zeroth order optimization opens new avenues for adversarial research, particularly for understanding DNN vulnerabilities through an optimization lens. Practically, the techniques discussed can be used to stress-test DNNs in real-world applications and to verify that robust defense mechanisms are in place.
Future research can explore several extensions:
- Efficiency Improvements: Further reducing the query and computation cost of ZOO so it scales to even larger datasets and models.
- Adversarial Training: Integrating ZOO into adversarial training routines to enhance the robustness of deep learning models.
- Cross-Domain Attacks: Applying the ZOO framework to other types of data and models, including those in natural language processing and time-series analysis, to verify its versatility.
In conclusion, the ZOO-based black-box attack method represents a significant step forward in adversarial machine learning, providing a robust, scalable, and effective means of generating adversarial examples without relying on substitute models. This approach will likely contribute substantially to both understanding and improving the robustness of DNNs against adversarial threats.