- The paper introduces a novel framework that leverages Wasserstein Gradient Flows to optimize adversarial data distributions for increased generalization risk.
- It proposes two optimization methods, the Lagrangian blob method and the discrete-gradient-flow method, to efficiently generate adversarial examples.
- Empirical results demonstrate that the framework significantly degrades the accuracy of robustly trained models on both MNIST and ImageNet datasets.
An Examination of Distributionally Adversarial Attack Framework
The paper "Distributionally Adversarial Attack" by Tianhang Zheng, Changyou Chen, and Kui Ren presents a novel framework for adversarial attacks in machine learning, focusing on the generation and optimization of adversarial samples. The paper introduces the concept of distributionally adversarial attack (DAA), aiming to leverage a distributional approach to generate adversarial examples that can substantially reduce the generalization accuracy of robustly trained models.
Overview of Key Contributions
The central thesis of the paper is that standard adversarial sample generation techniques, such as Projected Gradient Descent (PGD), perturb each sample independently and therefore do not directly optimize the data distribution toward maximal generalization risk. The proposed DAA approach addresses this by operating in the space of data distributions, producing adversarial samples that are explicitly tied to increasing generalization risk.
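To make the contrast concrete, a minimal PyTorch sketch of the l∞ PGD baseline is given below: it maximizes the per-sample loss independently for each input, which is the per-sample behavior that DAA generalizes to a distributional objective. The hyperparameters (eps, alpha, steps) are illustrative placeholders, not the settings used in the paper.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.3, alpha=0.01, steps=40):
    """Minimal l-infinity PGD: maximize the loss of each sample independently."""
    # Random start inside the eps-ball, clipped to the valid pixel range.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # ascent step on the loss
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project back into the eps-ball
            x_adv = x_adv.clamp(0, 1)                              # keep valid pixel values
    return x_adv.detach()
```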
The authors formulate DAA in terms of Wasserstein Gradient Flows (WGFs), using energy functionals defined on the space of adversarial data distributions. Two optimization methods are explored for this purpose: the Lagrangian blob method and the discrete-gradient-flow method, each aiming to evolve the distribution of adversarial samples while remaining computationally feasible. The framework's effectiveness is demonstrated on state-of-the-art defense models, where DAA consistently outperforms PGD in degrading model accuracy.
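As a rough illustration of the particle ("blob") viewpoint, the sketch below treats a batch of adversarial examples as interacting particles: each update combines the per-sample loss gradient with a kernel-interaction term that loosely stands in for the log-density gradient arising in a Wasserstein gradient flow. This is a simplified reading under stated assumptions, not the paper's exact energy functional or update rule; the function names (daa_blob_sketch, rbf_kernel), the RBF kernel choice, and all hyperparameters are illustrative.

```python
import torch
import torch.nn.functional as F

def rbf_kernel(z, h=1.0):
    """Pairwise RBF kernel values and their gradients w.r.t. the first argument."""
    diff = z.unsqueeze(1) - z.unsqueeze(0)           # (n, n, d) pairwise differences
    dist2 = (diff ** 2).sum(-1)                      # (n, n) squared distances
    k = torch.exp(-dist2 / (2 * h ** 2))             # (n, n) kernel matrix
    grad_k = -diff / (h ** 2) * k.unsqueeze(-1)      # (n, n, d) kernel gradients
    return k, grad_k

def daa_blob_sketch(model, x, y, eps=0.3, alpha=0.01, steps=40, c=1.0, h=1.0):
    """Hypothetical particle-style distributional attack sketch (not the paper's algorithm).

    Each adversarial example is a particle of the adversarial distribution; the
    update couples the per-sample loss gradient with a kernel-interaction term
    that crudely approximates the log-density gradient in a Wasserstein flow.
    """
    n, shape = x.size(0), x.shape[1:]
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y, reduction="sum")
        grad = torch.autograd.grad(loss, x_adv)[0]               # per-particle loss gradients
        with torch.no_grad():
            z = x_adv.view(n, -1)
            k, grad_k = rbf_kernel(z, h)
            # Kernel-smoothed interaction term coupling the particles.
            interaction = (grad_k.sum(1) / k.sum(1, keepdim=True)).view(n, *shape)
            step = grad + c * interaction                        # coupled distributional update
            x_adv = x_adv + alpha * step.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```

The key design point the sketch tries to convey is that, unlike PGD, the update for each sample depends on the other samples in the batch, so the attack shapes the adversarial distribution rather than each point in isolation.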
Numerical Results and Claims
The paper places significant emphasis on the empirical validation of the proposed methods. Strong numerical results are highlighted, particularly for attacks on models trained with robust techniques such as PGD adversarial training. For instance, DAA-DGF and DAA-BLOB reduce the accuracy of MadryLab's secret MNIST model to 88.79% under l∞ perturbations of ϵ=0.3, a lower accuracy (and thus a stronger attack) than existing methods achieve. Additionally, when applied to the ensemble adversarially trained ImageNet model, DAA reduces model accuracy to 16.43% with modest l∞ perturbations.
Implications and Future Directions
The paper's contributions have practical implications for understanding and improving adversarial attack methodologies, particularly in the context of robust model training. The theoretical exposition provides a new perspective on adversarial attacks through the lens of distribution optimization, encouraging further exploration in this area.
Future developments could extend this framework, potentially incorporating higher-order information for more nuanced adversarial attacks. Additionally, consideration of alternative perturbation metrics, such as l1 or l2, may inspire new directions for probing and circumventing the robustness of adversarially trained models.
In conclusion, the Distributionally Adversarial Attack framework proposed by Zheng, Chen, and Ren represents an important step forward in the field of adversarial machine learning. By emphasizing the optimization of data distributions, this paper opens avenues for more effective adversarial attack strategies and contributes to the broader discourse on AI security and model robustness.