Overview of "Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching"
The paper "Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching" presents a comprehensive paper on data poisoning attacks targeting deep neural networks (DNNs). Specifically, it introduces a novel method for highly targeted data poisoning, capable of manipulating models trained from scratch in large-scale settings such as ImageNet. The core contribution is leveraging gradient matching to create a robust attack that efficiently subverts DNN classification without sacrificing model accuracy in non-target scenarios.
Key Contributions
- Gradient Matching for Data Poisoning: The authors propose a new paradigm for data poisoning attacks based on aligning the gradients of poisoned data with those of a specific target example. Training on such poisons steers the model's parameters toward misclassifying the target instance as a chosen adversarial class (a minimal sketch of this objective appears after the list).
- Practical Scalability: Unlike prior work, which primarily focused on limited or simplified settings, this attack framework scales to industrial-sized datasets by significantly reducing computational overhead. The experimental evaluation demonstrates the attack's success on ImageNet using commonly employed architectures such as ResNet-18.
- Differential Privacy as a Mitigation Strategy: While many existing defensive strategies fail to adequately protect against this type of attack, the paper finds that strong differential privacy can partially mitigate its effects, albeit at the cost of significant degradation in model performance.
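To make the gradient-matching idea concrete, below is a minimal PyTorch sketch of the alignment objective, assuming a frozen classifier `model`, a batch of poison images with their clean labels, a single target image (shape [1, C, H, W]) and an intended adversarial label; all of these names, the step count, and the learning rate are illustrative placeholders, not the paper's exact settings. The paper's full attack additionally uses differentiable data augmentation and multiple restarts, which are omitted here.

```python
import torch
import torch.nn.functional as F

def gradient_matching_loss(model, poison_images, poison_labels,
                           target_image, adversarial_label):
    """One minus the cosine similarity between the gradient produced by the
    (perturbed) poison batch and the gradient that would push the target
    toward the adversarial class. Minimizing this aligns the two gradients."""
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradient the attacker wants training to follow: misclassify the target.
    target_loss = F.cross_entropy(model(target_image), adversarial_label)
    target_grad = torch.autograd.grad(target_loss, params)

    # Gradient actually contributed by the poison batch, which keeps its
    # clean labels; create_graph=True lets us backpropagate into the pixels.
    poison_loss = F.cross_entropy(model(poison_images), poison_labels)
    poison_grad = torch.autograd.grad(poison_loss, params, create_graph=True)

    dot = sum((pg * tg).sum() for pg, tg in zip(poison_grad, target_grad))
    poison_norm = torch.sqrt(sum(pg.pow(2).sum() for pg in poison_grad))
    target_norm = torch.sqrt(sum(tg.pow(2).sum() for tg in target_grad))
    return 1 - dot / (poison_norm * target_norm)

def craft_poisons(model, poison_images, poison_labels,
                  target_image, adversarial_label, eps=16 / 255, steps=250):
    """Optimize bounded perturbations on the poison images.
    eps = 16/255 mirrors the perturbation constraint quoted in the
    evaluation highlights below (images assumed scaled to [0, 1])."""
    delta = torch.zeros_like(poison_images, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=0.1)
    for _ in range(steps):
        opt.zero_grad()
        loss = gradient_matching_loss(model,
                                      (poison_images + delta).clamp(0, 1),
                                      poison_labels, target_image,
                                      adversarial_label)
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)  # stay within the l_inf budget
    return (poison_images + delta).clamp(0, 1).detach()
```

In practice the target gradient does not change while poisons are crafted, so it can be computed once and cached; it is recomputed here only to keep the sketch short.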
Numerical Evaluation Highlights
- Efficacy: The attack achieves targeted misclassification with an average success rate of 90% on CIFAR-10 with a 1% poisoning budget and an ℓ∞ perturbation bound of ε = 16. On ImageNet, it succeeds in up to 80% of trials, even with a poisoning budget as low as 0.05%.
- Computational Efficiency: The proposed method requires substantially less computational time than previously known methods like MetaPoison, making it feasible for large-scale datasets.
Theoretical Insights
The theoretical analysis confirms that, under specific conditions, aligning the gradients of the poisoned data with the target gradient causes standard training to also minimize the adversarial loss. Because accuracy on clean data is essentially unaffected, the attack is hard to detect until the target instance is presented at test time; this offers insight into fundamental vulnerabilities of gradient-based optimization in DNNs.
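To sketch why alignment helps (illustrative notation, not the paper's exact statement): write $\mathcal{L}_{\mathrm{adv}}(\theta)$ for the loss of the target example under its adversarial label and $\mathcal{L}_{\mathrm{train}}(\theta)$ for the training loss on the poisoned set. A first-order expansion of a single gradient-descent step with learning rate $\alpha$ gives

```latex
\mathcal{L}_{\mathrm{adv}}\!\left(\theta - \alpha \nabla_\theta \mathcal{L}_{\mathrm{train}}(\theta)\right)
  \approx \mathcal{L}_{\mathrm{adv}}(\theta)
  - \alpha \,\big\langle \nabla_\theta \mathcal{L}_{\mathrm{adv}}(\theta),\,
      \nabla_\theta \mathcal{L}_{\mathrm{train}}(\theta) \big\rangle ,
```

so whenever the two gradients are positively aligned (positive inner product, or equivalently positive cosine similarity), ordinary training on the poisoned data also drives down the adversarial loss on the target.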
Implications and Future Directions
This research underscores the pressing need for robust and scalable defenses against data poisoning attacks. The demonstrated potential of gradient-matching attacks to compromise real-world models calls for immediate attention to securing data pipelines, particularly when models are trained on external or user-generated data.
Future advancements might explore extending gradient alignment methodologies beyond classification tasks, potentially addressing other machine learning paradigms, including reinforcement learning and sequential models. Additionally, crafting more sophisticated differential privacy mechanisms that balance privacy, robustness, and generalization remains an open area.
In summary, the paper's insights into gradient matching-based data poisoning present significant implications for the security of machine learning systems, while also pushing the boundaries of what is considered feasible in adversarial attack methodology.