An Analytical Overview of "MetaPoison: Practical General-purpose Clean-label Data Poisoning"
The paper "MetaPoison: Practical General-purpose Clean-label Data Poisoning" introduces a novel approach to data poisoning in neural networks, delineating its methodology, results, and implications. MetaPoison outshines prior clean-label poisoning techniques by allowing data perturbations that deceitfully influence trained models while appearing unaltered. Its methodology revisits the bilevel optimization problem through a meta-learning perspective despite previous belief in its intractability for deep models.
Fundamentally, clean-label poisoning attacks are executed at training time rather than inference time: the attacker injects correctly labeled, visually innocuous examples into the training set, so defenses that verify inputs at inference time never see anything anomalous, and human inspection of the data is unlikely to flag the poisons. Unlike feature collision (FC) methods, which require knowledge of the victim's feature extractor and are limited to fine-tuning or transfer-learning settings, MetaPoison applies broadly, including full end-to-end training across several architectures. Notably, it remains effective even against Google's Cloud AutoML API, a black-box service whose architecture and training details are hidden from the attacker.
Core Methodology
MetaPoison tackles the bilevel problem directly: the outer objective perturbs a small subset of training examples (the poisons) so that, after the inner training procedure runs on the poisoned dataset, the resulting model misclassifies a chosen target input as an attacker-specified class. Because solving the inner problem exactly is intractable for deep networks, the attack instead unrolls a small number of SGD steps of the training procedure and backpropagates the adversarial (target) loss through those steps to the poison pixels, a first-order, meta-learning-style approximation. Repeating this against many surrogate models at different stages of training, and periodically re-initializing them, steers the poisons toward perturbations that work across the weight trajectories a victim network is likely to traverse, while the poisoned images keep their original labels and appearance.
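Written out, the problem being approximated has the bilevel form below. The notation is paraphrased for this overview rather than copied from the paper: X_p denotes the poison set, X_c the clean training set, x_t and y_adv the target input and attacker-chosen label, C the set of perturbations within the allowed budget, and L_train and L_adv the training and adversarial losses.

```latex
% Outer problem: choose bounded, correctly labeled poison perturbations that make
% the trained model misclassify the target; inner problem: ordinary training on
% the poisoned dataset.
\begin{aligned}
\min_{X_p \in \mathcal{C}} \;\; & L_{\mathrm{adv}}\!\left(x_t, y_{\mathrm{adv}};\, \theta^{*}(X_p)\right) \\
\text{s.t.} \;\; & \theta^{*}(X_p) \in \arg\min_{\theta}\; L_{\mathrm{train}}\!\left(X_c \cup X_p;\, \theta\right).
\end{aligned}

% K-step unrolled surrogate used in place of the intractable inner argmin:
\theta_{k+1} = \theta_k - \alpha\, \nabla_{\theta} L_{\mathrm{train}}\!\left(X_c \cup X_p;\, \theta_k\right),
\qquad k = 0, \dots, K-1,

% after which the poisons are updated using
% \nabla_{X_p} L_{\mathrm{adv}}\!\left(x_t, y_{\mathrm{adv}};\, \theta_K\right),
% i.e. the adversarial loss is differentiated through the K unrolled training steps.
```

The constraint set C encodes the clean-label requirement: perturbations stay within a small budget, and the labels of the poisoned examples are never changed.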
In the implementation, MetaPoison combines the ReColorAdv perturbation function, which applies smooth, barely perceptible color transformations, with projected gradient descent (PGD) updates that keep an additive perturbation within a constrained budget. When computing poison gradients, it averages over an ensemble of surrogate models whose training is staggered across epochs, which improves how well the poisons generalize across the network initializations and training stages a victim might reach.
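The sketch below illustrates a single, simplified crafting step in PyTorch, under several assumptions: the names (craft_step, K_UNROLL, INNER_LR, EPS, CRAFT_LR) are invented for this example, the inner loop is plain SGD with cross-entropy, only an additive l_inf perturbation is modeled (the ReColorAdv color transform and batch-norm buffer handling are omitted), and the step sizes and budget are illustrative rather than the paper's values.

```python
# Illustrative sketch of one MetaPoison-style crafting step (not the authors' code).
import torch
import torch.nn.functional as F
from torch.func import functional_call  # PyTorch >= 2.0

K_UNROLL = 2            # number of training steps to unroll (a small number, as in the paper)
INNER_LR = 0.1          # surrogate SGD step size (illustrative)
EPS = 8.0 / 255.0       # l_inf budget on the additive perturbation (illustrative)
CRAFT_LR = 1.0 / 255.0  # PGD step size on the poison perturbation (illustrative)

def craft_step(model, delta, poison_x, poison_y, clean_x, clean_y, target_x, target_y_adv):
    """One outer update of the poison perturbation `delta` against one surrogate model."""
    # Start from the surrogate's current weights; mark them as graph inputs so the
    # unrolled updates stay differentiable with respect to the poisoned batch.
    params = {n: p.detach().clone().requires_grad_(True) for n, p in model.named_parameters()}

    delta = delta.detach().requires_grad_(True)
    # Poisoned minibatch: clean examples plus perturbed poisons (labels stay correct).
    batch_x = torch.cat([clean_x, (poison_x + delta).clamp(0.0, 1.0)], dim=0)
    batch_y = torch.cat([clean_y, poison_y], dim=0)

    # Unroll K SGD steps of surrogate training, keeping the graph (create_graph=True)
    # so gradients can later flow from the unrolled weights back to `delta`.
    for _ in range(K_UNROLL):
        logits = functional_call(model, params, (batch_x,))
        train_loss = F.cross_entropy(logits, batch_y)
        grads = torch.autograd.grad(train_loss, list(params.values()), create_graph=True)
        params = {n: p - INNER_LR * g for (n, p), g in zip(params.items(), grads)}

    # Adversarial (meta) loss: after the unrolled steps, the target should be
    # classified as the attacker-chosen label.
    adv_logits = functional_call(model, params, (target_x,))
    adv_loss = F.cross_entropy(adv_logits, target_y_adv)

    # PGD-style update on the perturbation, then projection back into the l_inf
    # ball and the valid image range so the poisons stay visually clean.
    (grad_delta,) = torch.autograd.grad(adv_loss, delta)
    with torch.no_grad():
        new_delta = (delta - CRAFT_LR * grad_delta.sign()).clamp(-EPS, EPS)
        new_delta = (poison_x + new_delta).clamp(0.0, 1.0) - poison_x
    return new_delta, adv_loss.item()
```

In the full method, a step like this would be repeated against an ensemble of surrogate checkpoints staggered across training epochs, with the adversarial losses combined before the PGD update, and the surrogates continuing their own training between crafting steps.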
Experimental Evaluation
Empirical evidence supports MetaPoison's effectiveness across settings. In CIFAR-10 fine-tuning experiments, it reaches 100% attack success with small poison budgets, outperforming FC methods without relying on the opacity-based watermarking trick those methods use to boost success. Moreover, the attack continues to work when the victim trains from scratch, a setting in which earlier clean-label approaches were not viable.
Remarkably, the paper finds that MetaPoison's success is not tied to a particular architecture: poisons crafted on one network retain their efficacy on others (e.g., crafted on ResNet20 and deployed against VGG13). Likewise, varying the victim's training hyperparameters causes only minor degradation in attack success, indicating robustness to the variability of real-world training pipelines.
In a practical demonstration against Google's Cloud AutoML service, MetaPoison successfully poisoned models trained through the API, causing the targets to be misclassified as intended and confirming that clean-label poisoning is feasible in a commercial, black-box setting.
Theoretical and Practical Implications
MetaPoison's potential to subvert machine learning systems carries significant implications for cybersecurity and model governance. Although it is far more computationally expensive than inference-time evasion attacks, it represents a harder-to-detect threat: the poisoned model's overall behavior remains normal, and only the predictions on specific targets are manipulated. Consequently, measures such as stronger data provenance and governance, additional model verification, and diversification of training data sources become pertinent. The paper's suggestion that the technique could support copyright enforcement also hints at benign applications, prompting exploration of non-malicious uses.
Future Directions
With MetaPoison setting a benchmark in the domain, further work could explore mitigation strategies for detecting such attacks or immunizing models against them. Research into how targeted poisons alter model training dynamics might reveal the underlying vulnerabilities they exploit. Investigating how the method scales to larger and more varied datasets is another natural direction. As adversarial methods advance, defenses will need to evolve in parallel to preserve the integrity and trustworthiness of machine learning systems.
In summary, this research contributes substantive advancements in data poisoning, proposing strategies that challenge existing defenses while offering insights into potential non-malicious applications. Its implications are broad, warranting continued discourse and strategic responses in both academic and practical realms.