- The paper introduces variance tuning, a technique that stabilizes gradient update directions to enhance the transferability of adversarial attacks in black-box settings.
- Empirical results on ImageNet show an average success rate of 90.1% against nine advanced defense models, a 6.6% improvement over previous approaches.
- The method underscores practical challenges in model defense and suggests the need for robust strategies against diverse, transferable adversarial examples.
Enhancing the Transferability of Adversarial Attacks through Variance Tuning
The paper "Enhancing the Transferability of Adversarial Attacks through Variance Tuning" authored by Xiaosen Wang and Kun He introduces a novel method aimed at improving the transferability of adversarial attacks in black-box settings, specifically addressing the challenge of attacking defended models. The proposed method, variance tuning, is designed to enhance iterative gradient-based attack methods like MI-FGSM and NI-FGSM by stabilizing the update direction and escaping poor local optima.
Summary of the Method
In gradient-based adversarial attacks, transferability often presents a challenge, particularly when attacking models equipped with robust defense mechanisms. Traditional approaches to improving transferability have focused on modifying the gradient calculation, attacking multiple models, or employing input transformation techniques. This paper, however, proposes a variance tuning strategy integrated into the iterative attack process. At each iteration, the approach samples points in the neighborhood of the current data point, uses their gradients to estimate the gradient variance, and combines that variance with the current gradient to tune the update direction, thereby promoting better transferability to black-box models.
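A minimal PyTorch-style sketch of how such an iteration might look, assuming a classifier `model` that returns logits, integer labels `y`, and inputs scaled to [0, 1]; the function name `variance_tuned_attack` and the hyperparameter names are illustrative rather than taken from the authors' code:

```python
import torch

def variance_tuned_attack(model, x, y, eps=16 / 255, steps=10, mu=1.0,
                          n_samples=20, beta=1.5):
    """Sketch of a variance-tuned, MI-FGSM-style iterative attack."""
    alpha = eps / steps                      # per-step size
    x_adv = x.clone().detach()
    g = torch.zeros_like(x)                  # accumulated momentum
    v = torch.zeros_like(x)                  # gradient variance term
    loss_fn = torch.nn.CrossEntropyLoss()

    def grad_of(inp):
        inp = inp.clone().detach().requires_grad_(True)
        loss = loss_fn(model(inp), y)
        return torch.autograd.grad(loss, inp)[0]

    for _ in range(steps):
        current_grad = grad_of(x_adv)

        # Tune the current gradient with the variance from the previous
        # iteration, then accumulate momentum (stabilizes the direction).
        tuned = current_grad + v
        g = mu * g + tuned / tuned.abs().sum()

        # Estimate the gradient variance by sampling points uniformly
        # within a (beta * eps)-ball around the current point.
        neighbor_grad = torch.zeros_like(x)
        for _ in range(n_samples):
            r = torch.empty_like(x).uniform_(-beta * eps, beta * eps)
            neighbor_grad += grad_of(x_adv + r)
        v = neighbor_grad / n_samples - current_grad

        # Take a signed step and project back into the eps-ball / valid range.
        x_adv = x_adv + alpha * g.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)

    return x_adv.detach()
```

The key design choice is that the variance computed around the current point is applied at the *next* iteration, so the extra gradient queries do not change the cost of the current step while still damping oscillations in the update direction.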
Empirical Results
Empirical evaluations on the ImageNet dataset validate the effectiveness of the proposed variance tuning approach. The paper reports especially strong attack performance against advanced defense mechanisms: integrated with input transformations, the method achieves an average success rate of 90.1% against nine sophisticated defense models, outperforming existing methods by a margin of 6.6%.
Theoretical and Practical Implications
The introduction of variance tuning presents both theoretical and practical implications. Theoretically, the approach suggests that examining variance in local gradient information can stabilize the adversarial attack process and enhance generalization across models. Practically, this enhancement provides attackers with a more potent tool that remains effective against robustly defended models, raising security concerns for the practical deployment of deep neural networks (DNNs).
Insights into Future AI Developments
The implications of this work extend into future developments in AI safety and robustness. The ability to craft highly transferable adversarial examples indicates a need for developing defenses that are not just model-specific but also effective against diverse attack strategies. Moreover, variance tuning could inspire similar variance-based methodologies in other domains of AI, potentially contributing to optimization techniques or general machine learning workflows.
In conclusion, variance tuning represents a substantial advance in the domain of adversarial attacks, highlighting the vulnerabilities of current defense techniques and offering a new lens through which to evaluate and enhance model robustness. Continued evaluation and refinement of such methods is essential as AI systems become more integrated into security-critical applications.