Patch-wise Attack for Fooling Deep Neural Network (2007.06765v3)

Published 14 Jul 2020 in cs.CV

Abstract: By adding human-imperceptible noise to clean images, the resultant adversarial examples can fool other unknown models. Features of a pixel extracted by deep neural networks (DNNs) are influenced by its surrounding regions, and different DNNs generally focus on different discriminative regions in recognition. Motivated by this, we propose a patch-wise iterative algorithm -- a black-box attack towards mainstream normally trained and defense models, which differs from the existing attack methods manipulating pixel-wise noise. In this way, without sacrificing the performance of white-box attack, our adversarial examples can have strong transferability. Specifically, we introduce an amplification factor to the step size in each iteration, and one pixel's overall gradient overflowing the $\epsilon$-constraint is properly assigned to its surrounding regions by a project kernel. Our method can be generally integrated to any gradient-based attack methods. Compared with the current state-of-the-art attacks, we significantly improve the success rate by 9.2\% for defense models and 3.7\% for normally trained models on average. Our code is available at \url{https://github.com/qilong-zhang/Patch-wise-iterative-attack}

Citations (117)

Summary

  • The paper's main contribution is PI-FGSM, a patch-wise iterative attack that enhances the transferability of adversarial examples across DNNs.
  • The method applies an amplification factor to the step size at each iteration and redistributes gradient overflow to surrounding regions via a project kernel, producing patch-wise rather than pixel-wise noise.
  • Experiments on ImageNet show average success-rate improvements of 9.2% against defense models and 3.7% against normally trained models over prior state-of-the-art attacks.

Patch-wise Attack for Fooling Deep Neural Networks

The paper "Patch-wise Attack for Fooling Deep Neural Network" authored by Lianli Gao, Qilong Zhang, Jingkuan Song, Xianglong Liu, and Heng Tao Shen presents a novel adversarial attack methodology targeting Deep Neural Networks (DNNs). In contrast to traditional pixel-wise adversarial noise perturbations, the paper introduces a patch-wise attack strategy that significantly enhances the transferability of adversarial examples in black-box settings.

Overview of Methodology

The proposed approach is named the Patch-wise Iterative Fast Gradient Sign Method (PI-FGSM). It exploits spatial dependency in images: the features a DNN extracts for a pixel are influenced by its surrounding region rather than by the pixel alone. The central insight is that adversarial noise crafted patch by patch aligns more closely with the discriminative regions that DNNs attend to. Because different DNN models focus on different discriminative regions of an image, patch-wise noise can improve both the transferability and the effectiveness of adversarial attacks.

PI-FGSM departs from standard iterative gradient methods in two ways: it applies an amplification factor to the step size, scaling the gradient update at each iteration, and it redistributes the portion of a pixel's accumulated noise that overflows the ε-constraint to the pixel's surrounding region through a project kernel. The amplification factor governs the trade-off between white-box attack performance and the likelihood that the resulting adversarial examples transfer to other models.
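
The following PyTorch-style sketch illustrates one such iteration, reconstructed from the description above and the abstract rather than from the authors' released code; the function name `pi_fgsm_step`, the hyperparameter defaults (`amp_factor`, `project_factor`, `kernel_size`), and the depthwise uniform kernel with a zeroed centre are assumptions made for illustration only.

```python
import torch
import torch.nn.functional as F

def pi_fgsm_step(model, x_adv, x_clean, y, accum_noise,
                 eps=16 / 255, num_iters=10,
                 amp_factor=10.0, project_factor=16.0, kernel_size=3):
    """One patch-wise iteration (illustrative sketch, not the authors' code)."""
    alpha = eps / num_iters          # basic step size
    beta = amp_factor * alpha        # amplified step size
    gamma = project_factor * alpha   # projection step size

    # Uniform "project kernel" with a zeroed centre, applied per channel.
    c = x_adv.shape[1]
    w_p = torch.ones(c, 1, kernel_size, kernel_size,
                     device=x_adv.device, dtype=x_adv.dtype)
    w_p[:, :, kernel_size // 2, kernel_size // 2] = 0
    w_p = w_p / (kernel_size ** 2 - 1)

    x_adv = x_adv.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]

    # Amplified update accumulated into the overall noise.
    accum_noise = accum_noise + beta * grad.sign()

    # Noise that overflows the eps-constraint ("cut noise").
    cut_noise = torch.clamp(accum_noise.abs() - eps, min=0) * accum_noise.sign()

    # Redistribute the overflow to surrounding pixels via the project kernel.
    projection = gamma * F.conv2d(cut_noise, w_p,
                                  padding=kernel_size // 2, groups=c).sign()
    accum_noise = accum_noise + projection

    # Apply the step and clip back into the eps-ball around the clean image.
    x_new = x_adv.detach() + beta * grad.sign() + projection
    x_new = torch.min(torch.max(x_new, x_clean - eps), x_clean + eps).clamp(0, 1)
    return x_new, accum_noise
```

Running the attack would then amount to initializing `accum_noise` to zeros, setting `x_adv = x_clean`, and calling this step for `num_iters` iterations.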

Key Contributions and Results

In a series of experiments conducted on the ImageNet dataset, PI-FGSM demonstrated superior performance over existing state-of-the-art attacks, showing an average improvement in success rate of 9.2% against defense models and 3.7% against normally trained models in black-box scenarios. Notable points include:

  • Amplification Factor: With a well-chosen amplification factor, the attack's success rate increases markedly, especially when adversarial examples are transferred to defense models.
  • Robustness Across Models: The patch-wise attack remains effective across varied architectures such as Inception V3 and ResNet, supporting the hypothesis that different models attend to different discriminative regions.
  • Extensibility to Other Methods: PI-FGSM can be integrated with other gradient-based attack methods, improving their transferability without sacrificing white-box performance (a sketch of one such combination follows this list).
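
As one illustration of such a combination, a momentum-accumulated gradient in the style of MI-FGSM could replace the raw gradient inside the patch-wise step sketched earlier. The helper below is a hedged sketch of that idea; the name `momentum_gradient`, the decay factor `mu`, and the per-example L1-style normalization are typical defaults chosen for illustration, not values taken from the paper.

```python
def momentum_gradient(model, x_adv, y, prev_grad, mu=1.0):
    """Momentum-accumulated gradient in the style of MI-FGSM (illustrative)."""
    x_adv = x_adv.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    # Normalize per example (L1-style) and accumulate with decay factor mu.
    grad = grad / grad.abs().mean(dim=(1, 2, 3), keepdim=True)
    return mu * prev_grad + grad
```

Feeding the sign of this accumulated gradient into the amplified update in place of `grad.sign()` would yield a momentum-augmented variant of the patch-wise attack.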

Implications and Future Outlook

This research shows that adversarial noise structured in a spatially coherent manner can have a profound impact under black-box constraints in AI security settings. It challenges the prevailing view that iterative methods transfer worse than single-step attacks, demonstrating that careful parameter tuning and noise structuring can close that gap.

For the broader AI field, this work opens an avenue for adversarial attack and defense mechanisms that account for the intrinsic spatial structure of features in neural networks. The insights provided could inform both more robust DNN architectures and, conversely, more potent adversarial attacks. Future efforts may explore optimizing project kernels and amplification factors across diverse datasets and model types to further validate the generality of the approach.

In summary, this paper contributes significantly to the body of knowledge in adversarial machine learning, offering both theoretical insights and practical advancements in crafting transferable adversarial examples. It opens avenues for new research in understanding model vulnerabilities and developing robust AI systems capable of resisting advanced adversarial strategies.