
Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses (2012.10544v4)

Published 18 Dec 2020 in cs.LG, cs.AI, cs.CR, and cs.CV

Abstract: As machine learning systems grow in scale, so do their training data requirements, forcing practitioners to automate and outsource the curation of training data in order to achieve state-of-the-art performance. The absence of trustworthy human supervision over the data collection process exposes organizations to security vulnerabilities; training data can be manipulated to control and degrade the downstream behaviors of learned models. The goal of this work is to systematically categorize and discuss a wide range of dataset vulnerabilities and exploits, approaches for defending against these threats, and an array of open problems in this space. In addition to describing various poisoning and backdoor threat models and the relationships among them, we develop their unified taxonomy.

Citations (225)

Summary

  • The paper details how malicious data poisoning during training degrades model performance and induces incorrect predictions.
  • The paper demonstrates how hidden backdoor triggers can be embedded in models across diverse domains, causing unintended outputs.
  • The paper presents defense strategies employing anomaly detection, model repair, and safeguarded training techniques to mitigate these risks.

Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses

The paper "Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses" provides a comprehensive examination of security vulnerabilities in the ML data pipeline. It underscores the lack of trustworthy supervision in data collection for ML models and systematically categorizes various forms of dataset vulnerabilities, including data poisoning and backdoor attacks, as well as defenses against these threats.

Overview of Dataset Vulnerabilities

As ML models scale up, so do their data requirements, often necessitating the use of unverified sources for data collection. This introduces significant security risks: malicious agents can exploit these open-world collection methods by injecting corrupted data to skew model behavior. The paper examines both passive and active data manipulation techniques, and discusses how adversaries can leverage federated learning paradigms to inject harmful data.

Key Attack Vectors

1. Training-only Attacks:

  • The paper explores poisoning that occurs exclusively during training, where attackers manipulate training data to degrade overall model performance or cause targeted incorrect predictions.
  • Classical examples include attacks on recommendation systems and spam filters.
  • Widely studied methods cast poison crafting as a bilevel optimization problem, whose practicality has grown with advances in computation (a formulation is sketched below).
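
As a rough illustration (the notation here is chosen for exposition, not taken from the paper), the bilevel objective can be written with D_c for the clean training set, D_p for the attacker-controlled poisons, L_adv for the attacker's objective, and theta* for the parameters obtained by training on their union:

```latex
\min_{D_p} \; \mathcal{L}_{\mathrm{adv}}\bigl(\theta^{*}(D_p)\bigr)
\quad \text{s.t.} \quad
\theta^{*}(D_p) \in \arg\min_{\theta} \; \mathcal{L}_{\mathrm{train}}\bigl(D_c \cup D_p;\, \theta\bigr)
```

In practice the inner training problem is approximated, for example by unrolling a few gradient steps, which is why available compute largely determines how far these attacks scale.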

2. Backdoor Attacks:

  • Here, the adversary implants a hidden trigger: the model behaves normally on clean inputs but produces attacker-chosen outputs whenever the trigger pattern is present, making the compromise hard to detect.
  • These attacks have been observed in diverse domains including computer vision, natural language processing, and autonomous systems.
  • Attacks are categorized as model-agnostic or model-specific, with emphasis on how vulnerabilities transfer across architectures and training configurations (a minimal trigger-injection sketch follows this list).
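
A minimal sketch of a BadNets-style dirty-label trigger injection for image data, assuming NumPy arrays of shape (N, H, W, C) with integer labels; the function names and the corner-patch trigger are illustrative choices, not a prescription from the paper:

```python
import numpy as np

def stamp_trigger(image, patch_value=1.0, patch_size=3):
    """Overwrite a small corner patch with a fixed pattern (the backdoor trigger)."""
    poisoned = image.copy()
    poisoned[-patch_size:, -patch_size:, :] = patch_value
    return poisoned

def poison_dataset(images, labels, target_label, poison_fraction=0.05, seed=0):
    """Stamp the trigger on a random subset of images and relabel them to the target class."""
    rng = np.random.default_rng(seed)
    n_poison = int(poison_fraction * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images, labels = images.copy(), labels.copy()
    for i in idx:
        images[i] = stamp_trigger(images[i])
        labels[i] = target_label  # dirty-label backdoor: trigger maps to attacker's class
    return images, labels
```

A model trained on the poisoned set typically retains its clean accuracy but predicts the target class whenever the patch appears at test time.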

Defensive Mechanisms

Given the outlined risks, the paper categorizes defenses into detection, repair, and prevention stages.

1. Detection Mechanisms:

  • Techniques identify outliers in the input space and in latent feature spaces using tools from robust statistics, for example by examining the spectrum of the centered feature matrix (a sketch follows this list).
  • Further, models are scrutinized for anomalies in prediction behavior, which helps flag potential backdoor triggers without access to the training data.
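
A minimal spectral-signature-style sketch, assuming `features` is a NumPy matrix of penultimate-layer activations for the examples of a single class; the function names and removal fraction are illustrative:

```python
import numpy as np

def spectral_outlier_scores(features):
    """Score each example by its squared projection onto the top singular
    direction of the centered feature matrix; poisoned points tend to score high."""
    centered = features - features.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top_direction = vt[0]
    return (centered @ top_direction) ** 2

def flag_suspects(features, removal_fraction=0.05):
    """Return indices of the highest-scoring (most suspicious) examples."""
    scores = spectral_outlier_scores(features)
    n_remove = max(1, int(removal_fraction * len(scores)))
    return np.argsort(scores)[-n_remove:]
```

Examples strongly aligned with the top singular direction are disproportionately likely to be poisoned, so the highest scorers become candidates for removal or manual inspection.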

2. Model Repair:

  • Model repair strategies focus on removing implanted backdoors by retraining or by modifying network parameters; pruning or resetting neurons that stay dormant on clean data, followed by a short fine-tune, is a representative technique (a sketch follows).
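
A rough PyTorch sketch of pruning-based repair, assuming `activations` were collected with a forward hook on a convolutional layer over a small clean dataset; the function name and pruning fraction are illustrative:

```python
import torch

@torch.no_grad()
def prune_dormant_channels(conv_layer, activations, prune_fraction=0.2):
    """Zero out the output channels of `conv_layer` whose mean activation on
    clean data is lowest; dormant channels are often the ones a backdoor
    trigger exploits.

    activations: tensor of shape (batch, out_channels, H, W) collected by a
    forward hook on `conv_layer` over a small clean dataset.
    """
    mean_act = activations.mean(dim=(0, 2, 3))      # per-channel mean activation
    n_prune = int(prune_fraction * mean_act.numel())
    prune_idx = torch.argsort(mean_act)[:n_prune]   # least-active channels
    conv_layer.weight[prune_idx] = 0.0
    if conv_layer.bias is not None:
        conv_layer.bias[prune_idx] = 0.0
    return prune_idx
```

After zeroing the least-active channels, a brief fine-tune on clean data is usually needed to recover any lost accuracy.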

3. Training-time Strategies:

  • Defenses such as differential privacy and random noise injection are explored to prevent models from being overly influenced by small subsets of corrupted data (a DP-SGD-style sketch follows this list).
  • Particular focus is given to federated learning contexts, where secure aggregation and client-side monitoring are prominent strategies.
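
A minimal NumPy sketch of the core DP-SGD-style update, assuming per-example gradients are already available as a (batch, dim) array; parameter names and defaults are illustrative:

```python
import numpy as np

def dp_sgd_update(params, per_example_grads, lr=0.1, clip_norm=1.0,
                  noise_multiplier=1.0, seed=0):
    """One DP-SGD-style step: clip each example's gradient to bound its
    influence, average, add Gaussian noise, then apply the update.

    per_example_grads: array of shape (batch, dim), one flattened gradient
    per training example (assumed provided by the training framework).
    """
    rng = np.random.default_rng(seed)
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / (norms + 1e-12))
    clipped = per_example_grads * scale              # bound per-example influence
    noisy_mean = clipped.mean(axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm / len(clipped), size=params.shape
    )
    return params - lr * noisy_mean
```

Clipping bounds how much any single (possibly poisoned) example can move the parameters, and the added noise masks whatever residual influence remains.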

Implications and Forward-looking Challenges

The paper conveys practical implications for real-world deployment of ML systems, emphasizing the necessity of robust security frameworks across the data pipeline.

The authors highlight several open challenges:

  • Determining efficient poisoning attacks under limited adversary knowledge.
  • Developing defenses without reliance on excessive computational resources or extensive clean datasets.
  • Understanding poisoning in federated and transfer learning contexts, and aligning defenses with the requirements of data privacy and user confidentiality.

As machine learning continues to interweave with critical sectors, understanding and counteracting data-centric threats becomes imperative; the authors suggest that future developments in AI must treat security as a fundamental requirement alongside efficacy.
