- The paper presents a systematic taxonomy of backdoor attacks, categorizing methods by trigger characteristics such as visible, invisible, and semantic triggers.
- It formalizes poisoning-based attacks through a unified framework built on three risks: standard risk on benign samples, backdoor risk on trigger-embedded samples, and perceivable risk of poisoned samples being detected.
- The survey reviews both empirical and certified defense mechanisms, highlighting limitations and suggesting future research directions.
Backdoor Learning: A Survey
The paper "Backdoor Learning: A Survey" provides a comprehensive overview of the field of backdoor learning, which constitutes a crucial aspect of AI security. The primary focus is on backdoor attacks that introduce hidden backdoors into deep neural networks (DNNs) during the training phase. Such attacks enable attackers to manipulate model predictions by activating specific backdoor triggers, posing substantial security risks when training involves third-party datasets or models.
Overview of Backdoor Attacks
The authors categorize and analyze existing backdoor attacks, offering a unified framework for understanding poisoning-based attacks in the context of image classification. They delineate three core risks: standard risk (whether benign samples are still classified correctly), backdoor risk (whether trigger-embedded samples are predicted as the attacker-specified target label, i.e., whether the hidden backdoor can be activated), and perceivable risk (whether poisoned samples can be detected by humans or automated filters).
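In slightly more formal terms, writing $C_w$ for the trained classifier, $G(\cdot)$ for the poisoned-sample generator that applies the trigger, $y_t$ for the attacker-chosen target label, and $D(\cdot)$ for an indicator of whether a sample is flagged as poisoned, the three risks can be sketched roughly as follows (notation simplified from the survey's framework; the split of the training set into benign and poisoned subsets is omitted here):

```latex
% Standard risk: benign samples should still be classified correctly
R_s(\mathcal{D}) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\,\mathbb{I}\{C_w(x) \neq y\}\,\big]

% Backdoor risk: trigger-stamped samples should be predicted as the target label y_t
R_b(\mathcal{D}) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\,\mathbb{I}\{C_w(G(x)) \neq y_t\}\,\big]

% Perceivable risk: poisoned samples should not be detectable (by humans or automated filters)
R_p(\mathcal{D}) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\,D(G(x))\,\big]
```

Under this view, a poisoning-based attack can be read as approximately minimizing a weighted combination $R_s + \lambda_1 R_b + \lambda_2 R_p$, which is the lens the survey uses to compare existing attacks.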
Backdoor attacks are further categorized based on various characteristics:
- Visible vs. Invisible Attacks: Visible attacks like BadNets stamp an easily noticeable trigger pattern onto poisoned images (see the poisoning sketch after this list), while invisible attacks use perturbations designed to be imperceptible to humans.
- Optimized vs. Non-optimized Attacks: Optimized attacks involve strategies to refine trigger patterns for enhanced effectiveness.
- Semantic Attacks: These use semantically meaningful objects or features as triggers, eliminating the need for external pattern stamping during inference.
- Sample-specific Attacks: These create unique triggers for each sample, undermining defenses reliant on common trigger patterns.
- Physical Attacks: Such attacks incorporate real-world variations, complicating defense measures due to environmental factors.
- All-to-all Attacks: Different from typical all-to-one attacks, where all poisoned samples share a single target label, all-to-all attacks assign a different target label to each source class.
- Black-box Attacks: These consider scenarios where attackers do not have access to the training data, increasing the practicality of backdoor threats.
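To make the poisoning-based threat model concrete, the following is a minimal Python/NumPy sketch of BadNets-style visible-trigger poisoning. It is illustrative only and not the code of any cited attack; the trigger size and corner placement, the poisoning rate, and the specific all-to-one and all-to-all label mappings are assumptions chosen for clarity.

```python
import numpy as np

def stamp_trigger(image, patch_size=3, value=1.0):
    """Stamp a small white square in the bottom-right corner (a visible, BadNets-style trigger)."""
    poisoned = image.copy()
    poisoned[-patch_size:, -patch_size:, :] = value  # assumes HWC layout with pixel values in [0, 1]
    return poisoned

def all_to_one(label, target_label=0):
    """All-to-one labeling: every poisoned sample is relabeled to a single target class."""
    return target_label

def all_to_all(label, num_classes):
    """All-to-all labeling: each source class y gets its own target class (here y + 1 mod K)."""
    return (label + 1) % num_classes

def poison_dataset(images, labels, num_classes, rate=0.1, mapping="all_to_one", seed=0):
    """Stamp the trigger onto a random `rate` fraction of the training set and relabel those samples."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    poisoned_idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    for i in poisoned_idx:
        images[i] = stamp_trigger(images[i])
        labels[i] = all_to_one(labels[i]) if mapping == "all_to_one" else all_to_all(labels[i], num_classes)
    return images, labels, poisoned_idx
```

Training a standard classifier on the returned dataset typically preserves accuracy on clean inputs while causing any input carrying the trigger patch to be predicted as the mapped target label; an invisible variant would replace `stamp_trigger` with, for example, a low-amplitude blended perturbation.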
Relation to Related Fields
Backdoor learning is juxtaposed with adversarial attacks and data poisoning:
- Adversarial Attacks: Unlike adversarial attacks, which exploit model weaknesses during inference, backdoor attacks alter training data to create persistent vulnerabilities.
- Data Poisoning: While traditional data poisoning aims to degrade overall model performance on benign test samples, advanced targeted poisoning attacks misclassify only specific samples and thus share the selective precision seen in backdoor strategies.
Defense Mechanisms
Addressing backdoor threats necessitates a range of defenses:
- Empirical Defenses: These include pre-processing techniques that disrupt triggers before inference (see the sketch after this list), model reconstruction to eliminate injected backdoors, and sample filtering to identify and remove poisoned inputs. Their effectiveness varies across attacks, and many require significant computational resources.
- Certified Defenses: Typically built on randomized smoothing, these provide theoretical guarantees of robustness under certain conditions (a smoothing sketch also follows below), although they are less developed than empirical approaches.
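As one concrete illustration of the pre-processing family mentioned above, the sketch below applies a random flip and shift to each input before classification, on the assumption that a stamped trigger loses much of its effect once its exact position or appearance is perturbed. This is a simplified stand-in for transformation-based defenses rather than any specific published method; the transformation choices and shift range are assumptions.

```python
import torch

@torch.no_grad()
def randomly_transform(x, max_shift=4):
    """Randomly flip and shift a batch of images (N, C, H, W) to disturb a possible trigger pattern."""
    if torch.rand(1).item() < 0.5:
        x = torch.flip(x, dims=[3])  # horizontal flip
    dy, dx = torch.randint(-max_shift, max_shift + 1, (2,)).tolist()
    return torch.roll(x, shifts=(dy, dx), dims=(2, 3))  # circular shift as a cheap stand-in for translation

@torch.no_grad()
def preprocess_then_predict(model, x):
    """Pre-processing defense: transform the input before it reaches the (possibly backdoored) model."""
    model.eval()
    return model(randomly_transform(x)).argmax(dim=1)
```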
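On the certified side, most existing guarantees build on randomized smoothing, whose core prediction step is sketched below: the returned label is a majority vote over many Gaussian-perturbed copies of the input. Certified backdoor defenses extend this idea further (for example, by also randomizing over the training data), which this simplified sketch does not cover; the noise level, sample count, and class count are illustrative assumptions.

```python
import torch

@torch.no_grad()
def smoothed_predict(model, x, num_samples=100, sigma=0.25, num_classes=10):
    """Core randomized-smoothing step: majority vote over noisy copies of a single input x of shape (C, H, W)."""
    model.eval()
    counts = torch.zeros(num_classes, dtype=torch.long)
    for _ in range(num_samples):
        noisy = x + sigma * torch.randn_like(x)
        counts[model(noisy.unsqueeze(0)).argmax(dim=1).item()] += 1
    return counts.argmax().item()  # robustness certificates come from statistically bounding the vote margin
```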
Implications and Future Directions
This survey underscores the growing imperative to understand and mitigate backdoor attacks as AI systems become increasingly integrated into critical technologies. Future research directions suggested include improved trigger design, exploration of semantic and physical backdoors, investigation of task-specific attacks, development of robust defenses, and deeper examination of the intrinsic mechanisms underpinning backdoor creation and activation.
The authors contribute significantly to backdoor learning literature by offering a systematic taxonomy and drawing fruitful connections with adjacent research fields, laying the groundwork for subsequent innovations in securing deep learning models from insidious backdoor vulnerabilities.