- The paper introduces a novel FIA method that significantly improves adversarial example transferability by targeting object-aware features.
- FIA employs an aggregate gradient approach to identify and perturb invariant features critical for model decisions.
- Empirical tests show average success-rate gains of 9.5% against normally trained models and 12.8% against defense models, highlighting its value for robust model evaluation.
Feature Importance-aware Transferable Adversarial Attacks
The paper "Feature Importance-aware Transferable Adversarial Attacks" addresses a critical challenge in the field of adversarial machine learning: enhancing the transferability of adversarial examples. These adversarial examples are slight modifications to inputs that can mislead deep neural networks (DNNs), and their transferability is essential when attacking models without direct access, often referred to as black-box attacks.
The main contribution of this paper is the Feature Importance-aware Attack (FIA), a novel approach that significantly improves the transferability of adversarial examples. Traditional methods often perturb input features indiscriminately to drive down accuracy on a source model, as in the basic iterative baseline sketched below. The resulting adversarial examples tend to be highly specific to the model they were generated on, which limits their effectiveness against other models; the authors argue that such methods fall into model-specific local optima that impair transferability.
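For context, here is a minimal PyTorch-style sketch of that kind of indiscriminate, source-model-only baseline. The names `model`, `x`, `label`, and the budget `eps` are illustrative placeholders rather than the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def baseline_ifgsm(model, x, label, eps=16 / 255, steps=10):
    """Minimal sketch of an 'indiscriminate' gradient-based baseline:
    every pixel is nudged to raise the source model's loss, with no
    notion of which features matter (budget/steps are illustrative)."""
    alpha = eps / steps
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), label)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss wherever the gradient points, feature-agnostically.
        x_adv = (x_adv + alpha * grad.sign()).detach()
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0, 1)
    return x_adv
```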
FIA takes a different approach by disrupting critical object-aware features that are consistent across models. Instead of treating all features equally, FIA uses an "aggregate gradient" to identify and perturb the features that most influence model decisions. The aggregate gradient is obtained by averaging gradients with respect to an intermediate feature map over randomly pixel-dropped (masked) copies of the original image. This averaging emphasizes object-related features that remain invariant across models while damping model-specific ones.
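Assuming a PyTorch model in which an intermediate layer can be hooked, the aggregate-gradient step could be sketched roughly as follows; the drop probability, ensemble size, and normalization here are illustrative choices, not a reproduction of the paper's exact hyperparameters.

```python
import torch

def aggregate_gradient(model, feature_layer, x, label,
                       n_ensemble=30, drop_prob=0.3):
    """Sketch of the aggregate-gradient idea: average the gradient of the
    true-class logit w.r.t. an intermediate feature map over randomly
    pixel-dropped copies of the input. Layer choice, n_ensemble, and
    drop_prob are illustrative, not the paper's exact settings."""
    captured = {}
    handle = feature_layer.register_forward_hook(
        lambda module, inp, out: captured.update(feat=out))
    agg = 0.0
    for _ in range(n_ensemble):
        # Randomly drop pixels so model-specific evidence averages out
        # while object-related features persist.
        mask = torch.bernoulli(torch.full_like(x, 1.0 - drop_prob))
        logits = model(x * mask)
        true_logit = logits.gather(1, label.view(-1, 1)).sum()
        agg = agg + torch.autograd.grad(true_logit, captured["feat"])[0]
    handle.remove()
    return agg / agg.norm()  # normalized feature-importance weights
```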
The empirical results presented in the paper demonstrate the effectiveness of the FIA method. On average, FIA improved the success rate by 9.5% against normally trained models and 12.8% against defense models compared to current state-of-the-art approaches. These results were achieved across a diverse set of classification models, including Inception-V3, ResNet-50, and VGG-16, both in their standard and adversarially trained forms.
The potential implications of this research are substantial. On a practical level, the enhanced transferability of adversarial examples could be employed in evaluating and strengthening the robustness of deep learning models, especially in security-critical applications like autonomous driving and medical imaging. On a theoretical level, understanding why certain features are more transferable could provide insights into the internal dynamics of neural networks and the nature of adversarial vulnerabilities.
The authors also discuss speculative future developments that might benefit from this research, such as training regimes that explicitly account for feature importance to build more robust models. Moreover, integrating FIA with other adversarial strategies, such as momentum-based or diversity-inducing methods, could further improve transferability, as sketched below.
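As a hedged illustration of that last point, a momentum-style (MI-FGSM-like) update could be wrapped around the feature-importance objective along the following lines. The `weights` argument is the normalized aggregate gradient from the earlier sketch, and the step sizes and budget are illustrative rather than the paper's exact algorithm.

```python
import torch

def fia_with_momentum(model, feature_layer, x, label, weights,
                      eps=16 / 255, steps=10, mu=1.0):
    """Sketch of pairing the feature-importance objective with a momentum
    (MI-FGSM-style) update; not a verbatim reproduction of the paper."""
    captured = {}
    handle = feature_layer.register_forward_hook(
        lambda module, inp, out: captured.update(feat=out))
    alpha = eps / steps
    x_adv = x.clone().detach()
    momentum = torch.zeros_like(x)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        model(x_adv)  # forward pass populates captured["feat"] via the hook
        # Weighted feature loss: large where important features are active.
        loss = (weights * captured["feat"]).sum()
        grad = torch.autograd.grad(loss, x_adv)[0]
        momentum = mu * momentum + grad / grad.abs().mean()
        # Descend to suppress the features the models agree are important.
        x_adv = (x_adv - alpha * momentum.sign()).detach()
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0, 1)
    handle.remove()
    return x_adv
```

The descent direction here reflects the intuition in the paper: rather than maximizing classification loss directly, the attack suppresses the features the aggregate gradient marks as important, which is what makes the perturbation less tied to any single source model.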
In conclusion, the Feature Importance-aware Transferable Adversarial Attacks paper presents a focused and empirically validated approach to enhancing the transferability of adversarial examples. By aligning the perturbation process with the intrinsic object-aware features shared across different models, the FIA method positions itself as a valuable tool for probing and improving model robustness in deep learning.