- The paper surveys extensive research on the phenomenon of adversarial example transferability across different deep neural networks, emphasizing its critical security implications.
- It categorizes methods for enhancing adversarial transferability into optimization-based techniques like momentum and data augmentation, and generation-based approaches utilizing models such as GANs.
- The survey examines metrics for evaluating transferability, discusses challenges in cross-domain scenarios, and proposes future research directions like hybrid methodologies and improved surrogate models.
# A Survey on Transferability of Adversarial Examples Across Deep Neural Networks
The paper "A Survey on Transferability of Adversarial Examples Across Deep Neural Networks" provides a comprehensive exploration of the phenomenon where adversarial examples crafted for one neural network can successfully deceive other networks, posing significant security concerns for machine learning applications. This survey collates and synthesizes a substantial body of research dedicated to understanding this adversarial transferability, examining it from multiple perspectives and across various domains.
Adversarial examples are inputs intentionally engineered to mislead machine learning models into incorrect predictions. Although the perturbations are often imperceptible to human observers, adversarial examples can drastically degrade the performance and reliability of models in safety-critical fields like autonomous driving and medical diagnostics. The transferability of these adversarial examples, i.e., their ability to fool different models without being re-crafted for each one, is a particularly compelling aspect of robustness studies and has profound implications for both defense mechanisms and attack strategies in deep neural networks (DNNs).
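To make the notion concrete, below is a minimal PyTorch sketch of a transfer attack. The model names `surrogate` (used to craft the example) and `target` (used only for evaluation) are hypothetical placeholders, and FGSM is chosen here purely as the simplest illustrative attack; the survey itself covers far stronger methods.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(surrogate, x, y, eps=8/255):
    """Craft adversarial examples on a white-box surrogate model with the
    Fast Gradient Sign Method; transferability means the same examples
    often also fool an unrelated target model."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(surrogate(x_adv), y)
    loss.backward()
    # One gradient-sign step, clipped back to the valid image range [0, 1].
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

# Transfer check (sketch): craft only on `surrogate`, evaluate on an unseen `target`.
# x_adv = fgsm_attack(surrogate, images, labels)
# transfer_rate = (target(x_adv).argmax(1) != labels).float().mean()
```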
The survey categorizes the methods for enhancing adversarial transferability into two principal families: optimization-based methods and generation-based methods. Optimization-based methods iteratively perturb the input using gradients from one or more surrogate models, while generation-based methods train generative models to synthesize adversarial examples directly. Key optimization-based techniques include data augmentation of the input, momentum integration to escape poor local optima, and modified loss objectives that mitigate vanishing gradients for stronger attacks. These techniques aim to push model decisions further from the correct prediction, for instance by perturbing intermediate features or by attacking an ensemble of surrogate models; several of these ingredients are combined in the sketch below.
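As a rough illustration of the optimization-based family, this sketch combines two of the tricks named above, momentum accumulation and an input-diversity transform (a form of data augmentation), in a PyTorch-style loop. `surrogate` is the same hypothetical white-box model as before, and the hyperparameter values are placeholders rather than the survey's recommendations.

```python
import torch
import torch.nn.functional as F

def mi_fgsm(surrogate, x, y, eps=8/255, steps=10, mu=1.0, diversity_prob=0.5):
    """Momentum Iterative FGSM with a simple input-diversity transform,
    two of the optimization-based techniques discussed in the survey."""
    alpha = eps / steps
    g = torch.zeros_like(x)                     # accumulated momentum
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        # Data augmentation: random resize-and-pad before the forward pass.
        inp = _diverse_input(x_adv, diversity_prob)
        loss = F.cross_entropy(surrogate(inp), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Momentum: normalize the gradient (per sample, assuming a 4D image
        # batch) and accumulate it to stabilize the update direction.
        g = mu * g + grad / grad.abs().mean(dim=(1, 2, 3), keepdim=True)
        x_adv = x_adv.detach() + alpha * g.sign()
        # Project back into the eps-ball around x, then into the image range.
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)
    return x_adv.detach()

def _diverse_input(x, p, low=0.9):
    """With probability p, randomly shrink the batch and pad it back."""
    if torch.rand(1).item() > p:
        return x
    h = x.shape[-1]
    new = int(h * (low + torch.rand(1).item() * (1 - low)))
    resized = F.interpolate(x, size=new, mode='nearest')
    pad = h - new
    left = torch.randint(0, pad + 1, (1,)).item()
    top = torch.randint(0, pad + 1, (1,)).item()
    return F.pad(resized, (left, pad - left, top, pad - top), value=0)
```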
Prominent among the surveyed generation-based methods are techniques using generative adversarial networks (GANs), which model latent distributions to produce visually plausible adversarial examples that remain effective across models. Conditional generation techniques, in particular, provide a unified framework for handling multiple target classes, achieving significant efficiency gains over training a separate generator per class.
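The snippet below is a toy sketch of the conditional-generation idea only: a single generator conditioned on a class embedding can emit targeted perturbations for any class, instead of one generator per class. The architecture is an assumption for illustration, and the training loop against a surrogate classifier (which would drive the perturbations to be adversarial) is omitted.

```python
import torch
import torch.nn as nn

class ConditionalPerturbationGenerator(nn.Module):
    """Toy conditional generator: one network serves all target classes by
    conditioning on a class embedding (sketch of the conditional-generation
    idea, not a specific published architecture)."""
    def __init__(self, num_classes, channels=3, width=64):
        super().__init__()
        self.embed = nn.Embedding(num_classes, width)
        self.net = nn.Sequential(
            nn.Conv2d(channels + width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, channels, 3, padding=1), nn.Tanh(),
        )

    def forward(self, x, target_class, eps=8/255):
        # Broadcast the class embedding to a spatial map and concatenate it
        # with the image, so the same weights handle every target class.
        b, _, h, w = x.shape
        cond = self.embed(target_class).view(b, -1, 1, 1).expand(-1, -1, h, w)
        delta = self.net(torch.cat([x, cond], dim=1)) * eps  # bounded perturbation
        return (x + delta).clamp(0, 1)
```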
For evaluating adversarial transferability, the paper discusses metrics such as the fooling rate, interest class rank, and knowledge-transfer-based measures. These metrics highlight the need for reliable validation of transferability across diverse model architectures and configurations. Notably, the paper also discusses challenges such as poor transferability in cross-domain and cross-task scenarios, pointing toward future improvements through task-specific prompts and dynamic cues in pre-trained models.
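As one concrete example of such a metric, the helper below computes a commonly used form of the fooling (transfer) rate: the fraction of examples that the target model classified correctly before the attack but misclassifies after transfer. Definitions vary across papers, so this is only one reasonable convention, not necessarily the survey's.

```python
import torch

@torch.no_grad()
def fooling_rate(target_model, x_clean, x_adv, y_true):
    """Transfer-attack fooling rate: among inputs the target model gets
    right on clean data, the fraction it misclassifies on the transferred
    adversarial versions."""
    pred_clean = target_model(x_clean).argmax(dim=1)
    pred_adv = target_model(x_adv).argmax(dim=1)
    correct_before = pred_clean == y_true
    fooled = correct_before & (pred_adv != y_true)
    # Conditioning on initially-correct samples avoids crediting the attack
    # for inputs the target model would have misclassified anyway.
    return fooled.sum().item() / max(correct_before.sum().item(), 1)
```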
The survey emphasizes the importance of rigorous evaluation frameworks to ascertain the security of DNNs against transfer attacks. It suggests that a stronger theoretical understanding and comprehensive benchmarks could significantly bolster the development of robust machine learning models. Moreover, the paper outlines future research directions such as hybrid methodologies that combine optimization and generation strategies, and the design of more effective surrogate models with improved knowledge transferability.
Overall, this paper is a valuable resource for researchers exploring adversarial robustness, providing insights into the fundamental mechanisms of adversarial transferability and projecting pathways for developing resilient defenses against evolving adversarial attack strategies within AI systems.