Evaluating the Threat of Data Poisoning: A Unified Benchmark for Backdoor and Data Poisoning Attacks
The paper "Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks" addresses the pressing concern of data poisoning in ML systems, especially those that rely on vast, unvetted datasets. Data poisoning and backdoor attacks manipulate training data to compromise model performance during inference. Given that the industry's practitioners have highlighted data poisoning as a significant threat, this research provides a comprehensive analysis of existing poisoning methods, evaluates their effectiveness under various conditions, and introduces a standardized benchmarking framework.
The authors categorize poisoning attacks into two broad types: backdoor and triggerless. Backdoor attacks embed triggers into training data so that test inputs carrying the same trigger are misclassified. In contrast, triggerless attacks aim to cause targeted misclassification of specific test examples without any inference-time modification. The paper reviews several prominent poisoning methods, including Feature Collision (FC), Convex Polytope (CP), Clean Label Backdoor (CLBD), and Hidden Trigger Backdoor (HTBD), providing detailed formulations and evaluation scenarios for each.
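To give a flavor of these formulations, the sketch below restates the standard Feature Collision objective (as introduced in the original "Poison Frogs" work that FC refers to): a poison x_p is optimized to stay visually close to a base image x_b while its feature representation collides with that of the target x_t. Here f denotes the victim's frozen feature extractor and β a trade-off hyperparameter; the notation is ours, not necessarily the paper's.

```latex
% Feature Collision (FC): craft a poison x_p near a base image x_b whose
% penultimate-layer features f(.) collide with those of the target x_t;
% beta trades off visual similarity to the base against the feature collision.
x_p \;=\; \arg\min_{x} \;
    \bigl\| f(x) - f(x_t) \bigr\|_2^{2}
    \;+\; \beta \, \bigl\| x - x_b \bigr\|_2^{2}
```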
Key Findings
- Fragility in Evaluation Settings: The paper identifies significant inconsistencies in the evaluation setups of existing literature, finding that the success of poisoning attacks is often contingent on specific choices of model architectures and experimental conditions. This inconsistency calls into question the generalizability of various attack methods.
- Impact of Training Practices: Standard training practices, such as optimizing with SGD rather than Adam and applying data augmentation, markedly reduce the success rates of poisoning attacks (see the training sketch after this list). This points to a gap between success rates reported under attack-friendly experimental conditions and effectiveness under realistic training.
- Architecture Sensitivity: The choice of victim model architecture substantially affects attack success. For example, ResNet-18 models proved more resistant than the AlexNet variants used in earlier evaluations, undermining claims that attacks transfer reliably across architectures.
- Label Cleanliness: The paper challenges the stealth claims of so-called "clean-label" attacks, showing that the poisoned examples often contain noticeable perturbation artifacts, which reduces their stealthiness and practical applicability.
- Transfer Learning Vulnerability: Contrary to assumptions in prior evaluations, keeping the pretraining and fine-tuning datasets properly disjoint improves robustness against poisoning, suggesting that attackers face higher hurdles in realistic transfer-learning scenarios.
- Poisoning Budget vs. Dataset Size: Attack success depends nonlinearly on dataset size, so comparing methods purely by the percentage of data poisoned is misleading; the same percentage budget corresponds to very different absolute numbers of poisons on CIFAR-10 and TinyImageNet.
- Low Black-Box Performance: Black-box settings, in which the attacker does not know the victim architecture, significantly diminish attack effectiveness, indicating that these attacks pose a limited threat in such scenarios.
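To make the training-practices finding concrete, here is a minimal PyTorch-style sketch of the kind of "realistic" CIFAR-10 training recipe the paper argues attacks should be evaluated against: SGD with momentum and weight decay plus standard data augmentation. This is an illustrative sketch, not the authors' benchmark code, and the hyperparameter values are assumptions.

```python
# Illustrative sketch (not the authors' benchmark code): a realistic CIFAR-10
# training setup -- SGD with momentum plus standard augmentation -- of the kind
# the paper finds substantially lowers poisoning/backdoor success rates.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

# Standard augmentations; hyperparameters here are assumed, not taken from the paper.
train_tf = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

# In a poisoning evaluation, the crafted poisons would be mixed into this training set.
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                          download=True, transform=train_tf)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                           shuffle=True, num_workers=2)

model = torchvision.models.resnet18(num_classes=10)
criterion = nn.CrossEntropyLoss()
# SGD with momentum and weight decay instead of Adam -- the practice the paper
# identifies as reducing attack effectiveness.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=40)

for epoch in range(40):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```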
Contributions
The paper develops a unified benchmarking framework to fairly evaluate data poisoning and backdoor attacks, standardizing multiple parameters such as dataset, model architecture, and attack conditions. This framework includes benchmarks on CIFAR-10 and TinyImageNet and emphasizes diverse testing through various configurations, including white-box, black-box, and training-from-scratch scenarios.
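As a rough illustration of the quantity such a benchmark reports, the sketch below scores a targeted attack by the fraction of independent trials in which the designated target example is classified as the attacker's intended class after training on the poisoned set. The helpers `craft_poisons` and `train_model` are hypothetical placeholders standing in for an attack and a fixed training recipe; this is not the benchmark's actual API.

```python
# Illustrative sketch of scoring a targeted poisoning attack; craft_poisons and
# train_model are hypothetical placeholders, not the benchmark's actual API.
import torch

def attack_success_rate(craft_poisons, train_model, clean_train_set,
                        trials, device="cpu"):
    """Fraction of trials in which the target image is classified as the
    attacker's intended class after training on the poisoned dataset."""
    successes = 0
    for seed in range(trials):
        torch.manual_seed(seed)
        # The attack crafts poisons for a (target image, intended class) pair.
        poisons, target_image, intended_class = craft_poisons(clean_train_set, seed)
        poisoned_set = torch.utils.data.ConcatDataset([clean_train_set, poisons])
        model = train_model(poisoned_set, seed=seed)  # e.g. the SGD recipe above
        model.eval()
        with torch.no_grad():
            pred = model(target_image.unsqueeze(0).to(device)).argmax(dim=1).item()
        successes += int(pred == intended_class)
    return successes / trials
```

Backdoor attacks would be scored analogously, except that success is measured on test inputs carrying the trigger rather than on a single pre-chosen target image.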
Implications and Future Directions
The paper's findings have direct implications for both the development of more robust ML systems and the design of future attacks. By highlighting current methods' practical limitations, the paper directs future research toward developing attacks with realistic threat models that consider common training practices.
Furthermore, this work suggests that as adversarial techniques evolve, the need for continual benchmarking will become increasingly critical. The authors encourage the community to adopt the presented benchmarks, which could drive the development of stronger defensive strategies and yield a more coherent understanding of fundamental vulnerabilities.
Conclusion
The paper provides an invaluable contribution by laying out the current landscape of data poisoning threats and offering a standardized approach for evaluating these emerging challenges in ML systems. By systematically dissecting and critiquing evaluation methodologies, the authors underscore the importance of adopting rigorous benchmarks for future work in this area, ultimately bolstering the field's ability to address adversarial threats more effectively.