Nesterov Accelerated Gradient and Scale Invariance for Adversarial Attacks (1908.06281v5)

Published 17 Aug 2019 in cs.LG, cs.CR, and stat.ML

Abstract: Deep learning models are vulnerable to adversarial examples crafted by applying human-imperceptible perturbations on benign inputs. However, under the black-box setting, most existing adversaries often have a poor transferability to attack other defense models. In this work, from the perspective of regarding the adversarial example generation as an optimization process, we propose two new methods to improve the transferability of adversarial examples, namely Nesterov Iterative Fast Gradient Sign Method (NI-FGSM) and Scale-Invariant attack Method (SIM). NI-FGSM aims to adapt Nesterov accelerated gradient into the iterative attacks so as to effectively look ahead and improve the transferability of adversarial examples. While SIM is based on our discovery on the scale-invariant property of deep learning models, for which we leverage to optimize the adversarial perturbations over the scale copies of the input images so as to avoid "overfitting" on the white-box model being attacked and generate more transferable adversarial examples. NI-FGSM and SIM can be naturally integrated to build a robust gradient-based attack to generate more transferable adversarial examples against the defense models. Empirical results on ImageNet dataset demonstrate that our attack methods exhibit higher transferability and achieve higher attack success rates than state-of-the-art gradient-based attacks.

Authors: Jiadong Lin, Chuanbiao Song, Kun He, Liwei Wang, John E. Hopcroft

Summary

The paper introduces two methods that improve the transferability of adversarial examples in the black-box setting: the Nesterov Iterative Fast Gradient Sign Method (NI-FGSM) and the Scale-Invariant attack Method (SIM). Both target a known weakness of existing gradient-based attacks, which often fail against defense models when the attacker has no access to the target model's parameters or architecture.

Proposed Methods

NI-FGSM: This approach brings the Nesterov accelerated gradient, a look-ahead variant of momentum, into the iterative adversarial attack framework. At each iteration the gradient is evaluated at a point one momentum step ahead of the current adversarial example, which stabilizes and corrects the update direction and helps the attack escape poor local maxima more quickly. The result is greater transferability of the adversarial examples.
SIM: This method exploits an empirically observed scale-invariant property of deep neural networks: on a given model, the loss of an image stays close to the loss of copies of that image whose pixel values have been scaled down. By optimizing the adversarial perturbation over several scaled copies of the input image, SIM reduces overfitting to the white-box model being attacked and improves transferability to other black-box models.
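
The two ideas can be written as a single update rule. The block below is a sketch in notation that the summary itself does not define, so every symbol is an assumption made for illustration: $J$ is the classification loss, $y$ the true label, $\alpha$ the step size, $\mu$ the momentum decay factor, $\epsilon$ the perturbation budget, $g_t$ the accumulated gradient, $m$ the number of scale copies, and $S_i(x) = x / 2^i$ the scaling applied to copy $i$ (indexed here from $0$ to $m-1$).

```latex
\begin{aligned}
x_t^{\mathrm{nes}} &= x_t^{\mathrm{adv}} + \alpha \mu g_t
  && \text{(look-ahead point)} \\
\bar{g}_t &= \frac{1}{m} \sum_{i=0}^{m-1}
  \nabla_x J\!\left(S_i\!\left(x_t^{\mathrm{nes}}\right), y\right),
  \qquad S_i(x) = \frac{x}{2^i}
  && \text{(average over scale copies)} \\
g_{t+1} &= \mu g_t + \frac{\bar{g}_t}{\lVert \bar{g}_t \rVert_1}
  && \text{(momentum accumulation)} \\
x_{t+1}^{\mathrm{adv}} &= \mathrm{Clip}_x^{\epsilon}\!\left\{
  x_t^{\mathrm{adv}} + \alpha \,\mathrm{sign}\!\left(g_{t+1}\right)\right\}
  && \text{(signed step, projected onto the } \epsilon\text{-ball)}
\end{aligned}
```

Under these assumptions, setting $m = 1$ leaves only the Nesterov look-ahead (NI-FGSM), and evaluating the gradient at $x_t^{\mathrm{adv}}$ instead of $x_t^{\mathrm{nes}}$ gives an ordinary momentum-based iterative attack.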

Experimental Validation

Experiments on the ImageNet dataset show higher attack success rates than state-of-the-art gradient-based attacks. The combination of NI-FGSM and SIM is denoted SI-NI-FGSM. The SI-NI-TI-DIM variant, which integrates SI-NI-FGSM with the translation-invariant and diverse input methods, reports a 93.5% success rate against adversarially trained models in the black-box setting.
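
To make the combined attack concrete, here is a minimal NumPy sketch of an SI-NI-FGSM-style loop. It is not the authors' implementation: the white-box model is replaced by a toy linear softmax classifier so that the input gradient can be written analytically, and the names `loss_and_grad` and `si_ni_fgsm` as well as the default values of `eps`, `steps`, `mu`, and `m` are hypothetical choices for illustration. Clipping to a valid pixel range is omitted for brevity.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D vector of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def loss_and_grad(W, x, y):
    """Cross-entropy loss of a toy linear softmax classifier (logits = W @ x)
    and the gradient of that loss with respect to the input x."""
    p = softmax(W @ x)
    loss = -np.log(p[y] + 1e-12)
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    grad = W.T @ (p - onehot)
    return loss, grad


def si_ni_fgsm(W, x, y, eps=0.1, steps=10, mu=1.0, m=5):
    """Sketch of an SI-NI-FGSM-style attack under an L-infinity budget eps.

    All hyperparameter defaults are illustrative assumptions.
    """
    alpha = eps / steps            # per-iteration step size
    x_adv = x.copy()
    g = np.zeros_like(x)           # accumulated (momentum) gradient
    for _ in range(steps):
        # Nesterov look-ahead: evaluate gradients one momentum step ahead.
        x_nes = x_adv + alpha * mu * g
        # Scale invariance: average gradients over m scaled copies x_nes / 2**i.
        grad = np.zeros_like(x)
        for i in range(m):
            scale = 1.0 / 2 ** i
            _, gi = loss_and_grad(W, scale * x_nes, y)
            grad += scale * gi     # chain rule through the scaling
        grad /= m
        # Momentum update with an L1-normalized gradient.
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)
        # Signed ascent step, then projection back onto the eps-ball around x.
        x_adv = x_adv + alpha * np.sign(g)
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(3, 8))    # toy 3-class model on 8 input features
    x = rng.uniform(size=8)
    y = int(np.argmax(W @ x))      # treat the model's own prediction as the label
    x_adv = si_ni_fgsm(W, x, y)
    print("loss before:", loss_and_grad(W, x, y)[0])
    print("loss after: ", loss_and_grad(W, x_adv, y)[0])
    print("max |delta|:", np.abs(x_adv - x).max())
```

With a real network, `loss_and_grad` would be replaced by a forward and backward pass in an automatic differentiation framework; the loop structure would stay the same. Translation-invariant and diverse-input variants would add further transformations around the gradient computation and are not shown here.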

Theoretical and Practical Implications

Theoretically, the work shows that attack strength can be improved both by borrowing established optimization techniques and by exploiting inherent properties of neural networks. It also frames scaling as a form of model augmentation: applying loss-preserving transformations to the input yields additional "models" derived from a single trained network, which avoids the computational cost of training multiple models.

Practically, the findings raise security concerns for deep learning defenses, particularly those that rely on adversarial training. Because the resulting adversarial examples transfer so readily, defenses need to be more robust, and possibly new in kind, to withstand attacks of this type.

Future Directions

Future work could explore further enhancements to optimization-based attacks by integrating other gradient-based optimization techniques, such as Adam. Additionally, a deeper exploration into why the scale-invariant property exists, potentially tied to batch normalization effects, may unlock new avenues for adversarial attack methodologies.

In conclusion, the paper refines gradient-based adversarial attack strategies and underlines the need for stronger defensive measures in AI systems. It advances the understanding of attack transferability and motivates further study of adversarial robustness in deep learning.
