Improving Transferability of Adversarial Examples with Input Diversity (1803.06978v4)

Published 19 Mar 2018 in cs.CV, cs.LG, and stat.ML

Abstract: Though CNNs have achieved the state-of-the-art performance on various vision tasks, they are vulnerable to adversarial examples --- crafted by adding human-imperceptible perturbations to clean images. However, most of the existing adversarial attacks only achieve relatively low success rates under the challenging black-box setting, where the attackers have no knowledge of the model structure and parameters. To this end, we propose to improve the transferability of adversarial examples by creating diverse input patterns. Instead of only using the original images to generate adversarial examples, our method applies random transformations to the input images at each iteration. Extensive experiments on ImageNet show that the proposed attack method can generate adversarial examples that transfer much better to different networks than existing baselines. By evaluating our method against top defense solutions and official baselines from NIPS 2017 adversarial competition, the enhanced attack reaches an average success rate of 73.0%, which outperforms the top-1 attack submission in the NIPS competition by a large margin of 6.6%. We hope that our proposed attack strategy can serve as a strong benchmark baseline for evaluating the robustness of networks to adversaries and the effectiveness of different defense methods in the future. Code is available at https://github.com/cihangxie/DI-2-FGSM.

Authors (7)
  1. Cihang Xie (91 papers)
  2. Zhishuai Zhang (27 papers)
  3. Yuyin Zhou (92 papers)
  4. Song Bai (87 papers)
  5. Jianyu Wang (84 papers)
  6. Zhou Ren (17 papers)
  7. Alan Yuille (294 papers)
Citations (1,008)

Summary

Improving Transferability of Adversarial Examples with Input Diversity

The paper "Improving Transferability of Adversarial Examples with Input Diversity" addresses the vulnerability of Convolutional Neural Networks (CNNs) to adversarial examples, particularly under the challenging black-box setting. The authors propose a novel method to enhance the transferability of such adversarial examples by utilizing input diversity during the attack process.

Overview

Adversarial attacks exploit small, human-imperceptible perturbations to input images that can lead CNNs to make incorrect predictions. These attacks typically fall into two categories: single-step attacks, such as the Fast Gradient Sign Method (FGSM), and iterative attacks like the Iterative Fast Gradient Sign Method (I-FGSM). While iterative attacks tend to achieve higher success rates in the white-box setting due to their ability to exploit network-specific gradients, they often overfit to the attacked model, resulting in lower transferability to other networks in the black-box setting. Conversely, single-step attacks, although less successful in white-box scenarios, exhibit better transferability due to their underfitting nature.
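For reference, the two attack families can be written as the following standard updates (notation is conventional rather than quoted from the paper: $J$ is the classification loss, $y^{\text{true}}$ the ground-truth label, $\epsilon$ the perturbation budget, $\alpha$ the per-step size, and $\mathrm{Clip}_X^{\epsilon}$ keeps each iterate within the $\epsilon$-ball around the clean image $X$):

$$X^{adv} = X + \epsilon \cdot \mathrm{sign}\big(\nabla_X J(X, y^{\text{true}})\big) \quad \text{(FGSM)}$$

$$X^{adv}_0 = X, \qquad X^{adv}_{t+1} = \mathrm{Clip}_X^{\epsilon}\Big\{X^{adv}_t + \alpha \cdot \mathrm{sign}\big(\nabla_X J(X^{adv}_t, y^{\text{true}})\big)\Big\} \quad \text{(I-FGSM)}$$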

This paper introduces the Diverse Inputs Iterative Fast Gradient Sign Method (DI²-FGSM), which integrates input diversity into the iterative attack process to balance this trade-off. Specifically, DI²-FGSM applies random transformations, such as resizing and padding, to the input images at each attack iteration, thereby generating more transferable adversarial examples.
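The following PyTorch sketch illustrates the core idea; it is not the authors' released implementation. The 299-to-330 resize range, the transformation probability p, and the step sizes mirror the paper's ImageNet setting but should be treated as illustrative assumptions, and the model is assumed to accept 330×330 inputs, as Inception-style networks do.

```python
import torch
import torch.nn.functional as F

def input_diversity(x, low=299, high=330, p=0.5):
    """Randomly resize a batch (N, C, H, W) and zero-pad it back to high x high.

    With probability 1 - p the input is returned unchanged, so the attack
    mixes original and transformed views across iterations.
    """
    if torch.rand(1).item() >= p:
        return x
    rnd = int(torch.randint(low, high, (1,)).item())          # random target size
    resized = F.interpolate(x, size=(rnd, rnd), mode="nearest")
    pad = high - rnd
    pad_left = int(torch.randint(0, pad + 1, (1,)).item())    # random offset
    pad_top = int(torch.randint(0, pad + 1, (1,)).item())
    return F.pad(resized, [pad_left, pad - pad_left, pad_top, pad - pad_top])

def di2_fgsm(model, x, y, eps=16 / 255, alpha=2 / 255, steps=10, p=0.5):
    """Minimal DI²-FGSM loop: I-FGSM, except the gradient at every iteration
    is taken through the randomly transformed input."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(input_diversity(x_adv, p=p)), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back into the eps-ball around x and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv
```

Because the resize and padding operations are differentiable, the gradient of the loss on the transformed input flows back to the adversarial image itself, so a single perturbation is refined across many randomized views rather than overfitting to one fixed input.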

Experimental Results

Extensive experiments on the ImageNet dataset demonstrate that the proposed DI²-FGSM significantly improves black-box attack success rates while maintaining high white-box success rates. On average, DI²-FGSM improves black-box success rates over I-FGSM across different network architectures, highlighting its effectiveness.

Moreover, the authors propose the Momentum Diverse Inputs Iterative Fast Gradient Sign Method (M-DI²-FGSM), which combines the input diversity strategy with the momentum term from MI-FGSM. This further enhances transferability, as evidenced by the experiments. For example, when attacking Inception-v3, M-DI²-FGSM yielded a 63.9% black-box success rate on Inception-v4, up from 49.6% for FGSM, 14.8% for I-FGSM, and 45.9% for MI-FGSM.
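Concretely, the momentum variant accumulates an L1-normalized gradient of the loss evaluated on the transformed input $T(\cdot\,; p)$ before each sign step (a standard formulation combining the MI-FGSM momentum update with the diverse-input transform; $\mu$ is the momentum decay factor):

$$g_{t+1} = \mu \cdot g_t + \frac{\nabla_X J\big(T(X^{adv}_t; p), y^{\text{true}}\big)}{\big\|\nabla_X J\big(T(X^{adv}_t; p), y^{\text{true}}\big)\big\|_1}, \qquad X^{adv}_{t+1} = \mathrm{Clip}_X^{\epsilon}\Big\{X^{adv}_t + \alpha \cdot \mathrm{sign}(g_{t+1})\Big\}$$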

NIPS 2017 Adversarial Competition

The effectiveness of these techniques was verified through comparisons against top defense solutions and official baselines from the NIPS 2017 adversarial competition. DI²-FGSM and M-DI²-FGSM showed substantial improvements, with M-DI²-FGSM achieving a 73.0% average success rate, outperforming the top-1 attack submission by 6.6%.

Implications and Future Directions

The proposed input diversity strategy demonstrates a significant enhancement in generating transferable adversarial examples. This has important implications for evaluating the robustness of neural networks and the effectiveness of defense mechanisms. Practically, it underscores the need for defenses that can generalize well across varied input transformations, thereby mitigating the effectiveness of diverse-input attacks.

Theoretically, these findings suggest that different neural networks may share similar decision boundaries, not just similar architectures, making them susceptible to adversarial examples generated with input diversity. Future research could further explore the underlying reasons for this phenomenon, potentially leading to improved understanding and new approaches for both attack and defense methods in artificial intelligence.

In terms of future developments in AI, the paper suggests that more robust training algorithms and defense mechanisms should anticipate diverse attack strategies. Adversarial training incorporating diverse inputs could be a promising direction to explore. Moreover, this research could catalyze the development of more sophisticated defense frameworks that offer resilience against highly transferable adversarial examples.

In summary, this paper provides a comprehensive and rigorous approach to improving the transferability of adversarial examples through input diversity. The results highlight its practical utility and theoretical significance, offering a strong benchmark for future research in adversarial robustness.
