
Stronger Data Poisoning Attacks Break Data Sanitization Defenses (1811.00741v2)

Published 2 Nov 2018 in stat.ML, cs.CR, and cs.LG

Abstract: Machine learning models trained on data from the outside world can be corrupted by data poisoning attacks that inject malicious points into the models' training sets. A common defense against these attacks is data sanitization: first filter out anomalous training points before training the model. In this paper, we develop three attacks that can bypass a broad range of common data sanitization defenses, including anomaly detectors based on nearest neighbors, training loss, and singular-value decomposition. By adding just 3% poisoned data, our attacks successfully increase test error on the Enron spam detection dataset from 3% to 24% and on the IMDB sentiment classification dataset from 12% to 29%. In contrast, existing attacks which do not explicitly account for these data sanitization defenses are defeated by them. Our attacks are based on two ideas: (i) we coordinate our attacks to place poisoned points near one another, and (ii) we formulate each attack as a constrained optimization problem, with constraints designed to ensure that the poisoned points evade detection. As this optimization involves solving an expensive bilevel problem, our three attacks correspond to different ways of approximating this problem, based on influence functions; minimax duality; and the Karush-Kuhn-Tucker (KKT) conditions. Our results underscore the need to develop more robust defenses against data poisoning attacks.

Authors (3)
  1. Pang Wei Koh (64 papers)
  2. Jacob Steinhardt (88 papers)
  3. Percy Liang (239 papers)
Citations (224)

Summary

An Analysis of "Stronger Data Poisoning Attacks Break Data Sanitization Defenses"

The paper "Stronger Data Poisoning Attacks Break Data Sanitization Defenses" explores the intersection of adversarial machine learning and data integrity, explicitly focusing on the robustness of data sanitization techniques. Authored by Pang Wei Koh, Jacob Steinhardt, and Percy Liang, this work explores the vulnerabilities that emerge when malicious actors introduce small perturbations to the training data, leading to compromised model performance.

Abstract and Motivation

The core contribution of the paper is the development and analysis of novel data poisoning strategies capable of subverting existing data sanitization defenses. In particular, the researchers scrutinize the efficacy of common sanitization methods, including anomaly detectors based on nearest neighbors, training loss, and singular-value decomposition, that are intended to protect machine learning models from adversarially poisoned data. By challenging the assumptions underlying these defenses, the paper lays the groundwork for understanding how they can be systematically bypassed.

Methodological Approach

The authors present a framework for designing and implementing stronger data poisoning attacks. Each attack is formulated as a constrained optimization problem: poisoned points are chosen to increase test error while satisfying constraints that keep them from being flagged, and the points are coordinated so that they lie near one another and appear unremarkable to anomaly detectors. Because the exact formulation involves an expensive bilevel optimization, the paper derives three approximations of it, based on influence functions, minimax duality, and the Karush-Kuhn-Tucker (KKT) conditions, each exploiting properties of the underlying learning algorithm such as differentiability and sensitivity to data perturbations.
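As a rough illustration of this bilevel structure, the sketch below crafts a single poison point for a small logistic-regression model: each outer step retrains the inner model on the poisoned set, then takes a finite-difference ascent step on the attacker's objective (test loss). This is a crude, hypothetical approximation for exposition only, not the paper's influence-function, minimax, or KKT attacks; all names and hyperparameters are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, lr=0.1, steps=200, reg=1e-3):
    """Plain gradient-descent logistic regression; labels y are in {0, 1}."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)
        w -= lr * (X.T @ (p - y) / len(y) + reg * w)
    return w

def test_loss(w, X, y):
    p = np.clip(sigmoid(X @ w), 1e-8, 1 - 1e-8)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def craft_poison_point(X_clean, y_clean, X_test, y_test, y_poison,
                       outer_steps=20, step_size=0.5, eps=1e-3):
    """Heuristic outer loop: move one poison point to increase test loss.

    Each outer step retrains the (inner) model on clean data plus the current
    poison point, then ascends a finite-difference estimate of the gradient of
    the attacker's objective. This only sketches the bilevel structure; the
    paper's attacks approximate it far more efficiently.
    """
    rng = np.random.default_rng(0)
    x_p = X_clean[rng.integers(len(X_clean))].astype(float)  # start from a clean point
    y_mix = np.append(y_clean, y_poison)
    for _ in range(outer_steps):
        w = train_logreg(np.vstack([X_clean, x_p]), y_mix)    # inner problem
        base = test_loss(w, X_test, y_test)
        grad = np.zeros_like(x_p)
        for j in range(len(x_p)):                              # finite-difference gradient
            x_try = x_p.copy()
            x_try[j] += eps
            w_try = train_logreg(np.vstack([X_clean, x_try]), y_mix)
            grad[j] = (test_loss(w_try, X_test, y_test) - base) / eps
        x_p += step_size * grad                                # ascend the outer objective
    return x_p
```

A full attack would additionally project the point back into the feasible region after each step so that it continues to evade the defense, which is the role of the constraints in the paper's formulation.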

Central to the paper is the demonstration that these attacks succeed against a broad range of standard sanitization methods, which typically filter anomalous training points before the model is trained. The authors evaluate the attacks across several datasets and defense types, and the findings indicate that existing sanitization methods possess significant vulnerabilities when confronted with strategically crafted poisoned points.
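For concreteness, one common defense in this family filters training points that lie far from their class centroid. The sketch below is an illustrative centroid-distance filter, not the exact detectors evaluated in the paper; the quantile threshold is an assumption made for the example.

```python
import numpy as np

def sphere_defense(X, y, quantile=0.95):
    """Keep points whose L2 distance to their class centroid is below a
    per-class quantile threshold; returns a boolean mask over the rows of X."""
    keep = np.ones(len(y), dtype=bool)
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        centroid = X[idx].mean(axis=0)
        dists = np.linalg.norm(X[idx] - centroid, axis=1)
        keep[idx] = dists <= np.quantile(dists, quantile)
    return keep
```

The paper's key observation is that poisoned points placed close together and near the clean data distribution can slip under exactly this kind of threshold while still shifting the learned model.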

Experimental Design

The experiments are conducted on several canonical datasets, with a focus on empirically validating the attacks' potency. With just 3% poisoned data, the attacks raise test error on the Enron spam detection dataset from 3% to 24% and on the IMDB sentiment classification dataset from 12% to 29%, even when sanitization defenses are applied, whereas existing attacks that do not account for these defenses are filtered out by them. The authors provide access to the source code and datasets utilized, ensuring transparency and facilitating further research by the community.
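A minimal evaluation harness in this spirit compares test error under three conditions: clean training data, training data mixed with a small fraction of poison, and the poisoned mix after a sanitization filter. The function names and data arrays below are placeholders, and `sanitizer` can be any filtering defense such as the centroid-distance sketch above; this is a sketch of the evaluation protocol, not the paper's experimental code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def test_error(X_tr, y_tr, X_te, y_te):
    """Train a linear classifier and report its test error."""
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return 1.0 - clf.score(X_te, y_te)

def evaluate_attack(X_tr, y_tr, X_te, y_te, X_poison, y_poison, sanitizer):
    """Compare test error with no poison, with poison, and with poison after
    sanitization. `sanitizer(X, y)` returns a boolean mask of points to keep."""
    X_mix = np.vstack([X_tr, X_poison])
    y_mix = np.concatenate([y_tr, y_poison])
    keep = sanitizer(X_mix, y_mix)
    return {
        "clean": test_error(X_tr, y_tr, X_te, y_te),
        "poisoned": test_error(X_mix, y_mix, X_te, y_te),
        "poisoned_then_sanitized": test_error(X_mix[keep], y_mix[keep], X_te, y_te),
    }
```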

Implications and Future Directions

The implications of this research are profound for the security of machine learning systems, particularly in safety-critical applications where data integrity is paramount. These findings prompt a reconsideration of the security guarantees offered by existing data sanitization solutions and suggest that reliance on such defenses may be misplaced. Furthermore, the work invites future research to innovate more resilient defense mechanisms that can withstand increasingly sophisticated adversarial threats.

Looking forward, a pertinent line of inquiry involves the design of adaptive defense strategies that can dynamically counter evolving attack methodologies. Additionally, understanding the trade-offs between robustness and model performance or efficiency in deploying such defenses could have significant impacts on the practical deployment of secure machine learning applications.

Conclusion

"Stronger Data Poisoning Attacks Break Data Sanitization Defenses" presents a rigorous exploration of the weaknesses in prevailing data sanitization practices when confronted with cleverly devised adversarial attacks. This work not only challenges current paradigms in secure machine learning but also opens avenues for the development of more robust defenses. As adversarial machine learning continues to evolve, the insights drawn from this paper remain invaluable in guiding future research and safeguarding machine learning models against emerging threats.
