Enhancing Adversarial Attack Transferability with Admix
The paper "Admix: Enhancing the Transferability of Adversarial Attacks" introduces a novel technique aimed at improving the transferability of adversarial examples across different deep neural network (DNN) architectures. Transferability is a crucial aspect of adversarial attacks in the black-box setting, where the attacker has no knowledge of the target model. This research contributes to the existing body of knowledge by presenting a new input transformation method termed Admix, which significantly enhances adversarial transferability.
Background and Motivation
Adversarial examples are inputs with subtle perturbations that induce incorrect classifications by deep neural networks. While highly effective in white-box settings, where the model's parameters and gradients are known, these attacks often transfer poorly to unseen models, making them less effective in realistic black-box scenarios. Previous work has addressed this challenge with momentum-based methods, ensemble attacks, and input transformations. The last of these modifies the input before gradient computation so that the resulting perturbations generalize better across models.
Input transformations such as resizing, padding, and scaling enhance transferability to some extent, but they traditionally operate on the single input image alone, which limits the diversity they can introduce. Admix aims to overcome this limitation by leveraging information from images of other categories.
Methodology
Admix mixes the original input image with minor contributions from images randomly sampled from other categories. Specifically, it adds a small portion of each auxiliary image to the input while keeping the original label unchanged. This diversifies the inputs used for gradient calculation, which in turn promotes adversarial transferability.
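Concretely, for a clean input $x$ with label $y$, each admixed copy takes the form $\tilde{x} = \gamma \cdot (x + \eta \cdot x')$, where $x'$ is a sampled auxiliary image, $\eta$ is a small mixing weight (around 0.2 in the paper), and $\gamma = 1/2^i$ reuses the scale copies of the Scale-Invariant Method. The attack then averages gradients over the $m_2$ sampled images and $m_1$ scales:

$$\bar{g} = \frac{1}{m_1 \cdot m_2} \sum_{x' \in X'} \sum_{i=0}^{m_1 - 1} \nabla_{x} J\!\left(\frac{x + \eta \cdot x'}{2^{i}},\; y;\; \theta\right)$$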
The proposed method involves the following steps:
- Select a set of images from other categories.
- Admix the original input with each of these images by adding a small proportion of the auxiliary image.
- Compute gradients on the admixed copies, keeping the original label unchanged, and average them (see the sketch below).
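A minimal PyTorch sketch of this averaged-gradient step, assuming a classifier `model` and cross-entropy loss; the function name `admix_gradient` and the tensor shapes are illustrative, not the authors' reference code:

```python
import torch
import torch.nn.functional as F

def admix_gradient(model, x, y, aux_images, eta=0.2, num_scales=5):
    """Average gradients over admixed, scaled copies of x.

    x:          clean or current adversarial input, shape (B, C, H, W)
    y:          original labels, shape (B,) -- unchanged by the admix step
    aux_images: tensor of images sampled from other categories, (m2, C, H, W)
    eta:        admix strength (a small constant; around 0.2 in the paper)
    num_scales: number of scale copies gamma = 1 / 2**i, borrowed from SIM
    """
    grad_sum = torch.zeros_like(x)
    for x_aux in aux_images:                        # loop over m2 auxiliary images
        for i in range(num_scales):                 # loop over m1 scale factors
            x_in = x.clone().detach().requires_grad_(True)
            x_tilde = (x_in + eta * x_aux) / 2 ** i  # admix, then scale
            loss = F.cross_entropy(model(x_tilde), y)
            grad_sum += torch.autograd.grad(loss, x_in)[0]
    return grad_sum / (len(aux_images) * num_scales)
```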
The Admix method integrates seamlessly with existing attack strategies, including the Fast Gradient Sign Method (FGSM), its iterative variant (I-FGSM), and the Momentum Iterative Fast Gradient Sign Method (MI-FGSM). It can also be combined with other input transformation techniques to boost adversarial success rates further.
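As one example of this integration, here is a sketch of MI-FGSM whose raw gradient is replaced by the Admix average from the function above; `eps`, `steps`, and `mu` follow common settings rather than a prescribed configuration:

```python
def admix_mi_fgsm(model, x, y, aux_images, eps=16 / 255, steps=10, mu=1.0):
    """MI-FGSM attack using the Admix-averaged gradient at each step."""
    alpha = eps / steps                      # per-step size
    g = torch.zeros_like(x)                  # accumulated momentum
    x_adv = x.clone().detach()
    for _ in range(steps):
        grad = admix_gradient(model, x_adv, y, aux_images)
        # MI-FGSM: accumulate momentum with an L1-normalized gradient
        g = mu * g + grad / grad.abs().mean(dim=(1, 2, 3), keepdim=True)
        x_adv = x_adv + alpha * g.sign()
        # project back into the eps-ball and the valid pixel range
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv
```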
Experimental Results
Empirical evaluations on the ImageNet dataset demonstrate Admix's efficacy in both single-model and ensemble-model settings. When compared to state-of-the-art input transformations like Diverse Input Method (DIM), Translation-Invariant Method (TIM), and Scale-Invariant Method (SIM), Admix consistently outperforms these techniques, achieving notably higher attack success rates across various models, including robust adversarially trained networks.
For instance, when attacking models such as Inception-v3, Inception-v4, and Inception-ResNet-v2, Admix achieves higher transferability than previous methods by a clear margin. Combining Admix with other techniques such as DIM and TIM yields further improvements, and in ensemble attacks Admix remains effective even against advanced defense models, underscoring the robustness of the approach.
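To illustrate how such combinations work, the sketch below adds DIM's random resize-and-pad transform in front of each forward pass. The 299-to-330 range follows the DIM paper's usual setup, and resizing back to the original resolution is a simplification so that any fixed-input model can consume the result:

```python
import random
import torch.nn.functional as F

def dim_transform(x, low=299, high=330, p=0.5):
    """DIM: with probability p, randomly resize and zero-pad the input."""
    if random.random() > p:
        return x
    rnd = random.randint(low, high - 1)
    resized = F.interpolate(x, size=(rnd, rnd), mode='nearest')
    pad = high - rnd
    top, left = random.randint(0, pad), random.randint(0, pad)
    padded = F.pad(resized, (left, pad - left, top, pad - top), value=0)
    # simplification: return to the original resolution for fixed-input models
    return F.interpolate(padded, size=x.shape[-2:], mode='nearest')
```

Composing the two methods then amounts to calling `model(dim_transform(x_tilde))` instead of `model(x_tilde)` inside the gradient loop.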
Implications and Future Directions
The introduction of Admix represents a significant step towards crafting more robust adversarial examples with enhanced transferability. From a practical perspective, this has implications for the development of secure machine learning models that are resilient to adversarial attacks. The adaptability of Admix also allows it to be integrated with existing methodologies, potentially leading to the development of even more potent adversarial strategies.
Future work could explore the application of Admix in domains beyond computer vision, such as natural language processing and reinforcement learning. Additionally, investigating its implications for adversarial training and defense mechanisms can contribute to a more comprehensive understanding of adversarial robustness in DNNs. As DNNs are applied in increasingly diverse and critical applications, understanding and mitigating the risks presented by adversarial attacks remain a paramount concern in the field of artificial intelligence.