- The paper presents a systematic six-fold taxonomy of backdoor attack surfaces, pinpointing vulnerabilities across the DL pipeline from code and data supply chains to post-deployment tampering.
- The paper outlines defense methods spanning blind backdoor removal, offline and online inspection, and post-detection removal via retraining to mitigate backdoor threats.
- The paper emphasizes the ongoing arms race between adaptive backdoor attacks and evolving defense mechanisms in deep learning systems.
Backdoor Attacks and Countermeasures on Deep Learning: A Comprehensive Review
The document entitled "Backdoor Attacks and Countermeasures on Deep Learning: A Comprehensive Review" delivers a detailed examination of backdoor vulnerabilities and defenses in deep learning (DL) models. The review is timely given the increasing deployment of DL systems in critical applications such as computer vision, disease diagnosis, and cybersecurity, where model integrity is paramount. Below, I provide an expert summary of the paper, covering the adversarial landscape of backdoor attacks and the emerging defense strategies.
Taxonomy and Attack Surfaces
The paper presents a six-fold classification of potential backdoor attack surfaces that threaten DL systems across different stages of the model lifecycle:
- Code Poisoning: Vulnerabilities in DL frameworks and libraries can let attackers inject backdoors without any direct access to training data, a serious concern for widely used libraries such as TensorFlow and PyTorch.
- Outsourcing: Training models remotely or using third-party services introduces risks, as malicious service providers may implant backdoors during model training.
- Pretrained: Reusing pretrained models or transfer-learning pipelines is common; however, a backdoor embedded in a poisoned feature extractor can survive fine-tuning and propagate to downstream tasks.
- Data Collection: Data gathered from untrusted sources can be poisoned, most insidiously via clean-label attacks, where poisoned samples keep labels consistent with their content (so they pass human inspection) yet still implant trigger behavior once the model is trained; the sketch after this list shows the basic trigger-stamping mechanic in its simpler dirty-label form.
- Collaborative Learning: Federated and split learning enhance privacy, but because the aggregator cannot inspect participants' private local data, compromised clients can submit poisoned updates that implant backdoors into the global model.
- Post-deployment: Tampering with a deployed model, for example through fault injection on its stored weights, can implant or activate latent backdoors and compromise model decisions.
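
To make the trigger-stamping mechanic concrete, here is a minimal dirty-label poisoning sketch in the BadNets style: a small patch is stamped onto a fraction of training images and their labels are flipped to the attacker's target class. Clean-label attacks keep the original labels and instead perturb image content, but the pipeline is analogous. The function name, `poison_rate`, and `patch_size` are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

def poison_dataset(images, labels, target_label, poison_rate=0.1, patch_size=3):
    """Minimal BadNets-style dirty-label poisoning sketch (illustrative only).

    `images` is assumed to be an array of shape (N, H, W) or (N, H, W, C) with
    pixel values in [0, 1]. A small white patch is stamped into the corner of a
    random subset of images, which are then relabeled to the target class.
    """
    images, labels = images.copy(), labels.copy()
    n_poison = int(poison_rate * len(images))
    idx = np.random.choice(len(images), n_poison, replace=False)
    for i in idx:
        images[i, -patch_size:, -patch_size:] = 1.0  # trigger: white corner patch
        labels[i] = target_label                     # dirty-label: flip to target class
    return images, labels
```

A model trained on such a set behaves normally on clean inputs but predicts the target class whenever the patch is present.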
Countermeasure Approaches
The document organizes countermeasure strategies into four primary categories:
- Blind Backdoor Removal: These methods attempt to remove a potential backdoor without first determining whether the model is actually compromised. Fine-pruning, for example, prunes neurons that stay dormant on clean inputs (and are therefore suspected of carrying backdoor behavior) and then fine-tunes the model on clean data; a pruning sketch appears after this list.
- Offline Inspection: These defenses examine the training data or the model before deployment, for example via activation clustering and spectral-signature analysis, which look for anomalies in latent representations that are indicative of backdoors; a scoring sketch appears after this list.
- Online Inspection: These defenses detect backdoors at runtime by monitoring inference behavior and input characteristics; STRIP (STRong Intentional Perturbation), for instance, flags trigger-carrying inputs by the abnormally low entropy of predictions on intentionally perturbed copies of the input, as sketched after this list.
- Post Backdoor Removal: Once a backdoor or poisoned data is detected, these techniques retrain the model, for example with corrected labels or with the identified poisoned samples removed, to erase traces of the malicious tampering while restoring reliable outputs.
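
As a concrete illustration of blind removal, below is a minimal fine-pruning sketch in PyTorch, assuming access to a small clean dataset and a chosen convolutional layer to prune; the layer choice, `prune_fraction`, and masking-by-zeroing are simplifying assumptions rather than the paper's exact procedure.

```python
import torch

def fine_prune(model, layer, clean_loader, prune_fraction=0.2):
    """Sketch of fine-pruning: zero out the output channels of `layer` that
    stay dormant on clean inputs, then fine-tune on clean data (the
    fine-tuning step is left to the caller).

    `layer` is assumed to be a torch.nn.Conv2d and `clean_loader` to yield
    (inputs, labels) batches of clean data.
    """
    per_batch = []

    def hook(_module, _inputs, output):
        # mean absolute activation per output channel over batch and spatial dims
        per_batch.append(output.detach().abs().mean(dim=(0, 2, 3)))

    handle = layer.register_forward_hook(hook)
    model.eval()
    with torch.no_grad():
        for x, _ in clean_loader:
            model(x)
    handle.remove()

    mean_act = torch.stack(per_batch).mean(dim=0)
    n_prune = int(prune_fraction * mean_act.numel())
    dormant = torch.argsort(mean_act)[:n_prune]      # least-activated channels

    with torch.no_grad():                            # mask the dormant channels
        layer.weight[dormant] = 0.0
        if layer.bias is not None:
            layer.bias[dormant] = 0.0
    return dormant                                   # caller then fine-tunes on clean data
```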
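
For offline inspection, the spectral-signature idea can be summarized in a few lines: within each class, score every sample's latent representation by its squared projection onto the top singular direction of the centered representation matrix; poisoned samples tend to receive outlier scores. This is a sketch assuming the representations are available as a NumPy matrix; the removal threshold is left to the user.

```python
import numpy as np

def spectral_signature_scores(reps):
    """Outlier scores for one class's latent representations (rows of `reps`).

    Center the representations, take the top right singular vector of the
    centered matrix, and score each sample by its squared projection onto it.
    High-scoring samples are candidates for removal before retraining.
    """
    centered = reps - reps.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return (centered @ vt[0]) ** 2   # squared correlation with top singular direction
```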
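
For online inspection, the STRIP decision rule is compact enough to sketch directly: blend the suspect input with held-out clean images and measure the entropy of the resulting predictions; trigger-carrying inputs stay confidently locked to the target class, so their average entropy is abnormally low. The `predict_proba` interface and the blending weight `alpha` are assumptions made for illustration.

```python
import numpy as np

def strip_entropy(model, x, clean_samples, alpha=0.5):
    """Average prediction entropy of `x` superimposed with clean images.

    `model.predict_proba(batch)` is assumed to return an (n, n_classes) array
    of softmax probabilities; an abnormally low return value suggests `x`
    carries a backdoor trigger.
    """
    blended = np.stack([alpha * x + (1 - alpha) * c for c in clean_samples])
    probs = np.clip(model.predict_proba(blended), 1e-12, 1.0)
    entropies = -(probs * np.log(probs)).sum(axis=1)   # entropy per blended copy
    return float(entropies.mean())

# Usage: reject x if strip_entropy(model, x, clean_samples) falls below a
# threshold calibrated on the entropy distribution of clean inputs.
```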
Future Implications and Challenges
While the coverage of backdoor attacks and defenses is extensive, the review identifies several unresolved challenges open for future exploration:
- Adaptive Attacks: Each new defense tends to spur adaptive attack strategies designed to bypass it, indicating an ongoing arms race between attackers and defenders in DL security.
- Artifacts and Benchmarks: Encouraging open-source dissemination of code and systematic evaluations helps standardize metrics for both attacks and defenses, fostering community-driven enhancements.
- Robust Trigger Design: Given the success of dynamic and inconspicuous triggers in physical-world attacks, designing triggers that remain effective under diverse environmental conditions remains an open research direction with direct implications for evaluating defense resilience.
- Generalization of Defenses: Defense techniques have been validated primarily in the vision domain; extending them to other areas such as NLP and audio still awaits rigorous development and validation.
Overall, the survey highlights the dynamic nature of adversarial research in DL, underscoring the need for tailored, context-aware defense mechanisms. As DL models become more pervasive in security-critical applications, ongoing research and innovative defense architectures will play a crucial role in safeguarding AI systems from backdoor threats. Future work should balance practical applicability with ease of integration into existing DL deployment pipelines to build resilience against increasingly sophisticated backdoor attacks.