
NATTACK: Learning the Distributions of Adversarial Examples for an Improved Black-Box Attack on Deep Neural Networks

Published 1 May 2019 in cs.LG, cs.CR, cs.CV, and stat.ML | (1905.00441v3)

Abstract: Powerful adversarial attack methods are vital for understanding how to construct robust deep neural networks (DNNs) and for thoroughly testing defense techniques. In this paper, we propose a black-box adversarial attack algorithm that can defeat both vanilla DNNs and those generated by various defense techniques developed recently. Instead of searching for an "optimal" adversarial example for a benign input to a targeted DNN, our algorithm finds a probability density distribution over a small region centered around the input, such that a sample drawn from this distribution is likely an adversarial example, without the need to access the DNN's internal layers or weights. Our approach is universal as it can successfully attack different neural networks by a single algorithm. It is also strong; according to the testing against 2 vanilla DNNs and 13 defended ones, it outperforms state-of-the-art black-box or white-box attack methods for most test cases. Additionally, our results reveal that adversarial training remains one of the best defense techniques, and that adversarial examples are not as transferable across defended DNNs as they are across vanilla DNNs.

Citations (237)

Summary

  • The paper presents NATTACK, which uses probabilistic modeling to generate adversarial examples in black-box settings.
  • It employs a constrained natural evolution strategy (NES) that achieves 100% attack success rates against vanilla DNNs and many recently defended ones.
  • The approach enhances computational efficiency and establishes a robust benchmark for future adversarial and defense research in deep learning.

Analysis of NATTACK: Learning the Distributions of Adversarial Examples for an Improved Black-Box Attack on Deep Neural Networks

The research paper "NATTACK: Learning the Distributions of Adversarial Examples for an Improved Black-Box Attack on Deep Neural Networks" introduces a new approach to black-box adversarial attacks on deep neural networks (DNNs). Its primary contribution is a black-box attack that, rather than searching for a single optimal adversarial instance, learns a probability density distribution over a small region around the input from which adversarial examples can be sampled. The approach is notable for its universality: it defeats both vanilla DNNs and those fortified by recent defense techniques.
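
In symbols, and with notation that is illustrative rather than lifted verbatim from the paper, the attack can be read as minimizing the expected attack loss over the parameters θ of a distribution π_S supported on the small region S around the benign input, with the gradient estimated from samples via the log-likelihood (NES) trick:

```latex
\min_{\theta}\; J(\theta) \;=\; \mathbb{E}_{x' \sim \pi_S(\cdot \mid \theta)}\big[\, f(x') \,\big],
\qquad
\nabla_{\theta} J(\theta) \;=\; \mathbb{E}_{x' \sim \pi_S(\cdot \mid \theta)}\big[\, f(x')\, \nabla_{\theta} \log \pi_S(x' \mid \theta) \,\big]
```

Here f(x') is an attack loss that is small (or negative) exactly when x' fools the classifier, so a well-fit π_S yields adversarial examples simply by sampling.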

Key Contributions

The method, termed NATTACK, leverages a constrained natural evolution strategy (NES) to recast the black-box attack as a distribution-learning problem. Unlike conventional approaches that rely on access to the model's gradients or internal weights, NATTACK optimizes the parameters of a perturbation distribution using only the model's outputs, making it a genuinely black-box method. The algorithm learns a distribution from which samples are likely to be adversarial, and it remains effective against both vanilla and specially defended DNN architectures.
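
As a concrete illustration, here is a minimal NES-style sketch of such a distributional black-box attack in NumPy. The helper name `query_logits`, the hyperparameter values, and the margin loss are assumptions made for this example rather than the paper's exact implementation; the point is that only the model's outputs are queried while the mean of a Gaussian over perturbations is updated with evolution-strategy gradient estimates.

```python
import numpy as np

def nes_blackbox_attack(x, y, query_logits, eps=0.031, sigma=0.1,
                        lr=0.02, pop_size=300, iters=500):
    """Minimal sketch of an NES-style distributional black-box attack.

    x            : clean image as an (H, W, C) float array in [0, 1]
    y            : true class index (untargeted attack)
    query_logits : assumed helper returning the target model's logits for a
                   batch of images; only outputs are used, never gradients.
    """
    mu = np.zeros_like(x)  # mean of the Gaussian over perturbations
    for _ in range(iters):
        # Antithetic (mirrored) sampling from N(mu, sigma^2 I) to reduce variance.
        noise = np.random.randn(pop_size // 2, *x.shape)
        noise = np.concatenate([noise, -noise], axis=0)
        pert = np.clip(mu + sigma * noise, -eps, eps)      # project into the l_inf ball
        candidates = np.clip(x + pert, 0.0, 1.0)

        logits = query_logits(candidates)
        true_logit = logits[:, y]
        other_logit = np.max(np.delete(logits, y, axis=1), axis=1)
        losses = true_logit - other_logit                  # negative => misclassified

        if np.any(losses < 0):                             # a sample already fools the model
            return candidates[int(np.argmin(losses))]

        # NES gradient estimate from standardized losses, then a descent step on mu.
        z = (losses - losses.mean()) / (losses.std() + 1e-8)
        grad = (z[:, None, None, None] * noise).mean(axis=0) / sigma
        mu -= lr * grad
    return np.clip(x + np.clip(mu, -eps, eps), 0.0, 1.0)
```

In practice, `sigma`, the learning rate, and the population size trade query budget against convergence speed; keeping the learned mean `mu` also lets further adversarial samples be drawn from the final distribution rather than recomputing the attack from scratch.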

Empirical Insights

The paper documents extensive empirical evaluations of NATTACK, comparing its performance against state-of-the-art black-box and white-box attack techniques. Tested on two vanilla DNNs and thirteen defended models, NATTACK achieved superior success rates in most scenarios, including 100% attack success against several defenses, comparing favorably with contemporary methods. The results also show that adversarial training remains among the strongest defenses and that adversarial examples transfer far less readily across defended DNNs than across vanilla ones.

Methodological Advances

NATTACK also improves computational efficiency by defining its search distribution in a lower-dimensional parametric space rather than operating directly in the high-dimensional input space, as gradient-based techniques do. This reduction in dimensionality accelerates the attack and makes it practical to generate adversarial examples without excessive computational overhead. In addition, the distribution's initialization is produced by a regression neural network, which substantially reduces runtime and further widens NATTACK's efficiency advantage over competing strategies.
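
To make the dimensionality-reduction idea concrete, the sketch below draws candidate adversarial images from a Gaussian defined on a reduced grid, then upsamples and squashes each sample into the eps-ball around the clean input. The grid size, the tanh squashing, and the helper names are illustrative assumptions rather than the paper's exact transformation; they only show how the search can move to a lower-dimensional space before being mapped back to image resolution.

```python
import numpy as np
from scipy.ndimage import zoom

def sample_from_low_dim(x, mu_small, sigma=0.1, eps=0.031, n_samples=4):
    """Hedged sketch: sample perturbations on a small grid, then upsample.

    x        : clean image, (H, W, C) float array in [0, 1]
    mu_small : distribution mean on a reduced grid, e.g. shape (32, 32, C)
    """
    h, w, _ = x.shape
    out = []
    for _ in range(n_samples):
        z = mu_small + sigma * np.random.randn(*mu_small.shape)
        # Bilinear upsampling from the reduced grid to the input resolution.
        full = zoom(z, (h / z.shape[0], w / z.shape[1], 1), order=1)
        # Squash to [-1, 1] and scale into the l_inf eps-ball around x.
        pert = eps * np.tanh(full)
        out.append(np.clip(x + pert, 0.0, 1.0))
    return np.stack(out)
```

Because the distribution's parameters live on the reduced grid, each update touches far fewer variables than the full input dimensionality, which helps explain the efficiency gains reported in the paper alongside the regression-network initialization of the distribution's mean.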

Implications and Future Directions

The implications of NATTACK are substantial: it provides a more general adversarial benchmark that can guide the development of future defense mechanisms. By circumventing obfuscated gradients and adopting a probabilistic model of attack, the study paves the way for improving the robustness of DNNs deployed in adversarial environments. Future research could explore richer or combined distribution families to better model adversarial populations, and could refine adversarial training by drawing examples from these learned distributions, potentially improving DNN robustness against adversarial intrusions.

The paper makes a compelling case for further exploration of the adversarial landscape, with NATTACK marking a shift from deterministic adversarial crafting to probabilistic adversarial modeling in black-box settings. Such directions promise a better understanding of, and greater robustness to, adversarial challenges, and constitute a significant contribution to deep learning security.
