Adversarial Attacks and Defenses in Images, Graphs and Text: A Review (1909.08072v2)

Published 17 Sep 2019 in cs.LG, cs.CR, and stat.ML

Abstract: Deep neural networks (DNN) have achieved unprecedented success in numerous machine learning tasks in various domains. However, the existence of adversarial examples has raised concerns about applying deep learning to safety-critical applications. As a result, we have witnessed increasing interest in studying attack and defense mechanisms for DNN models on different data types, such as images, graphs and text. Thus, it is necessary to provide a systematic and comprehensive overview of the main threats of attacks and the success of corresponding countermeasures. In this survey, we review state-of-the-art algorithms for generating adversarial examples and the countermeasures against adversarial examples, for the three popular data types, i.e., images, graphs and text.

Authors (7)
  1. Han Xu (92 papers)
  2. Yao Ma (149 papers)
  3. Haochen Liu (40 papers)
  4. Debayan Deb (20 papers)
  5. Hui Liu (481 papers)
  6. Jiliang Tang (204 papers)
  7. Anil K. Jain (92 papers)
Citations (631)

Summary

Adversarial Attacks and Defenses in Images, Graphs, and Text: A Review

This paper presents a comprehensive review of adversarial attacks and their defenses across various data types, particularly focusing on images, graphs, and text. Given the increasing deployment of DNNs in safety-critical applications, the work is timely in highlighting vulnerabilities and potential countermeasures.

Overview of Adversarial Attacks

Adversarial attacks craft perturbed inputs (adversarial examples) designed to mislead machine learning models into making incorrect predictions. They are a significant concern because they threaten domains where reliability is critical, such as autonomous vehicles and financial fraud detection systems.
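The underlying problem can be stated compactly. As a hedged reminder in standard notation (my framing, not quoted from the paper): given a model f, a clean input x with label y, and a perturbation budget ε under an L_p norm, the attacker looks for a nearby input whose prediction changes.

```latex
% Standard adversarial-example formulation (notation is mine, not the paper's):
% find a perturbed input x' inside an L_p ball of radius \epsilon around x
% whose prediction no longer matches the true label y,
\[
  \text{find } x' \quad \text{s.t.} \quad \|x' - x\|_p \le \epsilon
  \quad \text{and} \quad f(x') \ne y ,
\]
% or, in the form most attack algorithms optimize, maximize the loss in that ball:
\[
  x' \in \arg\max_{\|x'' - x\|_p \le \epsilon} \mathcal{L}\bigl(f(x''), y\bigr).
\]
```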

Classification of Attacks

  1. White-Box Attacks: These require full access to the model’s architecture and parameters. Techniques such as FGSM, DeepFool, and the Carlini & Wagner attack are discussed, with emphasis on how efficiently they find minimally distorted adversarial inputs (a minimal FGSM sketch appears after this list).
  2. Black-Box Attacks: These do not require access to model internals but rely on query-based methods. The paper elaborates on substitute models and zeroth-order optimization techniques that utilize model outputs to craft attacks.
  3. Poisoning Attacks: These manipulate the training data to degrade overall model performance or to cause incorrect predictions on specific inputs. They are commonly studied for graph models, where training and test data often coexist in a single graph, so modifying the graph before training directly poisons the learned model.
  4. Physical and Unrestricted Attacks: The feasibility of physical-world attacks, such as altering road signs, is noted. Unrestricted adversarial examples use generative models to synthesize legitimate-looking inputs from scratch, unconstrained by a small perturbation budget, allowing them to evade defenses tuned to norm-bounded attacks.
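To make the white-box category concrete, here is a minimal FGSM sketch in PyTorch. It is an illustration under my own assumptions (a differentiable classifier `model`, an image batch `x` in [0, 1], labels `y`, budget `epsilon`), not code from the paper.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """One-step FGSM: perturb x in the direction of the loss gradient's sign."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Take a single epsilon-sized step that increases the loss.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    # Keep the result a valid image.
    return x_adv.clamp(0.0, 1.0).detach()
```

Methods such as DeepFool and Carlini & Wagner replace this single fixed-size step with iterative optimization that searches for smaller distortions.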

Countermeasures and Defenses

The paper categorizes defenses into three main strategies:

  1. Gradient Masking/Obfuscation: Methods like Defensive Distillation and randomized models attempt to hinder gradient-based attacks by concealing or obfuscating gradients. However, as the authors note, these defenses are frequently broken by adaptive attacks that work around the masked gradients.
  2. Robust Optimization: Adversarial training, which trains models on adversarial examples generated on the fly (a brief sketch follows this list), and regularization methods that stabilize model behavior are highlighted as promising approaches. Certifiable defenses offer theoretical guarantees against specific classes of perturbations, albeit with limited scalability.
  3. Adversarial Example Detection: This involves distinguishing adversarial from benign inputs using auxiliary models or statistical tests. Consistency checks across predictions further enhance the detection mechanisms.
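As a sketch of the robust-optimization idea, the loop below trains on adversarial examples generated on the fly, reusing the hypothetical `fgsm_attack` helper from the earlier sketch (stronger adversarial training typically substitutes multi-step PGD). Names and hyperparameters are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, epsilon=0.03, device="cpu"):
    """One epoch of adversarial training: attack each batch, then fit the attacked batch."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        # Craft adversarial examples against the current parameters.
        x_adv = fgsm_attack(model, x, y, epsilon)
        optimizer.zero_grad()  # discard gradients accumulated while attacking
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```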

Implications and Future Directions

The survey underscores the dynamic interplay between attack strategies and defense mechanisms, suggesting a continuous evolution of both. Practical implications include the need for robust model evaluation metrics beyond standard accuracy, integrating safety checks even in non-critical applications.

Future research is likely to explore more efficient adversarial training, scalable certifiable defenses, and the creation of standard benchmarks for evaluating defense mechanisms in varied data domains. Understanding adversarial vulnerability may not only enhance model robustness but also provide deeper insights into model interpretability and decision-making.

Overall, the paper provides a detailed foundation for researchers looking to navigate the complexities of adversarial machine learning, aligning theoretical insights with practical applications.