- The paper introduces a defense mechanism that uses a two-component mixture model to isolate poisoned training data in Naive Bayes (NB) spam filters.
- It employs an Expectation-Maximization algorithm and Bayesian Information Criterion to partition spam into attack and legitimate subsets.
- Experimental results on the TREC 2005 corpus show that the method maintains accuracy around 0.9 even under strong poisoning conditions.
Overview of a Mixture Model Based Defense for Data Poisoning Attacks Against Naive Bayes Spam Filters
The research paper "A Mixture Model Based Defense for Data Poisoning Attacks Against Naive Bayes Spam Filters" addresses the vulnerability of Naive Bayes (NB) spam filters to data poisoning attacks in which adversaries inject crafted emails through known spam sources, so that the emails are labeled as spam during training. The central contribution is a defense that represents the spam class with a mixture of NB models in order to mitigate the effect of such attacks. The work is motivated by a limitation of existing spam filtering techniques, which typically represent spam with a single-component density model and are therefore susceptible to training-data contamination.
Susceptibility of Naive Bayes Models
Naive Bayes models classify emails as 'spam' or 'ham' (not spam). The vulnerability arises because these models treat every email obtained from known spam sources as a representative of the 'spam' class. If attackers deliberately craft spam-labeled messages that blend in characteristics of ham, the spam class's token likelihoods absorb ham-typical words, the learned model is skewed, and legitimate ham is increasingly misclassified, degrading overall accuracy. A minimal sketch of this effect follows.
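The sketch below is illustrative and not taken from the paper: it uses scikit-learn's MultinomialNB on a tiny hand-made vocabulary to show how spam-labeled emails stuffed with ham-like tokens can shift the filter's spam posterior for a ham-like test message. The email texts and the `train` helper are hypothetical.

```python
# Illustrative only: how poisoned spam (spam-labeled emails stuffed with
# ham-like tokens) skews a multinomial NB filter. Texts are made up.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

ham = ["meeting agenda attached", "lunch tomorrow at noon"]
spam = ["win cash prize now", "cheap pills discount offer"]
# Attack emails: labeled spam, but deliberately filled with ham vocabulary.
poison = ["meeting lunch agenda tomorrow noon attached"] * 5

def train(spam_train):
    texts = ham + spam_train
    labels = [0] * len(ham) + [1] * len(spam_train)   # 0 = ham, 1 = spam
    vec = CountVectorizer()
    clf = MultinomialNB().fit(vec.fit_transform(texts), labels)
    return vec, clf

test = ["agenda for lunch meeting tomorrow"]          # clearly ham-like
for name, spam_train in [("clean", spam), ("poisoned", spam + poison)]:
    vec, clf = train(spam_train)
    p_spam = clf.predict_proba(vec.transform(test))[0, 1]
    print(f"{name:9s} P(spam | ham-like email) = {p_spam:.2f}")
```

With the clean training set the ham-like test email should receive a low spam probability; once the poisoned messages are included, the ham vocabulary is counted toward the spam class and one would expect that probability to rise.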
Mixture Model Defense
The paper introduces a two-component mixture model defense. Using the Expectation-Maximization (EM) algorithm, the method partitions the spam-labeled training data into two components: one captures legitimate spam, while the other isolates potential attack samples. The Bayesian Information Criterion (BIC) decides whether the two-component model is justified, so the extra component is retained only when the attack strength warrants it. Classification robustness is maintained because the attack's impact is quarantined in the secondary NB component. Notably, the defense applies not only to re-training after an attack but also when poisoned data are already present during the initial training of the filter, which the authors highlight as a novel contribution. A simplified sketch of the EM-plus-BIC step appears below.
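Below is a minimal sketch, not the authors' implementation, of the core idea: fit one- and two-component multinomial mixtures to the spam-labeled bag-of-words data with EM, then compare BIC scores to decide whether a separate "attack" component should be kept. The function name, smoothing constant, and the variable `X_spam` are assumptions for illustration.

```python
# A minimal sketch (not the authors' code) of the defense idea: fit one- and
# two-component multinomial mixtures to the spam-labeled data with EM, and let
# BIC decide whether a separate "attack" component is justified.
import numpy as np
from scipy.special import logsumexp

def fit_spam_mixture(X, K, n_iter=50, alpha=1e-2, seed=0):
    """X: dense (N, V) word-count matrix of spam-labeled emails; K: number of components."""
    rng = np.random.default_rng(seed)
    N, V = X.shape
    log_pi = np.log(np.full(K, 1.0 / K))                 # mixing weights
    theta = rng.dirichlet(np.ones(V), size=K)            # (K, V) per-component word probabilities
    for _ in range(n_iter):
        # E-step: responsibility of each component for each email.
        log_joint = X @ np.log(theta).T + log_pi         # (N, K)
        resp = np.exp(log_joint - logsumexp(log_joint, axis=1, keepdims=True))
        # M-step: update mixing weights and smoothed word probabilities.
        Nk = resp.sum(axis=0)
        log_pi = np.log(Nk / N + 1e-12)
        counts = resp.T @ X + alpha                      # Laplace-style smoothing
        theta = counts / counts.sum(axis=1, keepdims=True)
    loglik = logsumexp(X @ np.log(theta).T + log_pi, axis=1).sum()
    n_params = K * (V - 1) + (K - 1)                     # free multinomial + mixing parameters
    bic = -2.0 * loglik + n_params * np.log(N)
    return resp, bic

# Keep the two-component split only if BIC prefers it; otherwise treat the spam
# data as a single, un-attacked component. X_spam is a hypothetical (N, V)
# count matrix of all spam-labeled training emails:
#   resp1, bic1 = fit_spam_mixture(X_spam, K=1)
#   resp2, bic2 = fit_spam_mixture(X_spam, K=2)
#   use_two_components = bic2 < bic1
```

In this reading, the responsibilities of the secondary component indicate which spam-labeled emails look like attack samples, and only the primary component's statistics would feed the deployed NB filter.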
Experimental Validation
The empirical validation uses the TREC 2005 spam corpus and assesses performance across a range of attack strengths and scenarios. In re-training scenarios, a standard NB classifier suffers a substantial accuracy drop as attack strength increases, whereas the two-component model maintains accuracy of roughly 0.9 even under strong attacks. The method also remains robust in the harder setting where the initial training data themselves are compromised. A synthetic sweep over attack strength, in the spirit of these experiments, is sketched below.
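The following sketch is a toy, fully synthetic sweep, not the TREC 2005 experiments: it only shows how one might measure a plain multinomial NB filter's test accuracy as the fraction of poisoned spam in the training set grows. The vocabularies, corpus sizes, and poison fractions are all made up.

```python
# Toy sweep (synthetic data, not TREC 2005): accuracy of an undefended
# multinomial NB filter as the fraction of poisoned spam increases.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

rng = np.random.default_rng(0)
ham_words = ["meeting", "agenda", "lunch", "report"]
spam_words = ["win", "cash", "prize", "offer"]

def make(words, n):
    """Generate n synthetic emails, each sampling 8 words from the vocabulary."""
    return [" ".join(rng.choice(words, size=8)) for _ in range(n)]

ham, spam = make(ham_words, 200), make(spam_words, 200)
test_ham, test_spam = make(ham_words, 100), make(spam_words, 100)

for frac in [0.0, 0.25, 0.5, 1.0]:                     # attack strength
    poison = make(ham_words, int(frac * len(spam)))     # spam-labeled, ham-like
    texts = ham + spam + poison
    labels = [0] * len(ham) + [1] * (len(spam) + len(poison))
    vec = CountVectorizer().fit(texts)
    clf = MultinomialNB().fit(vec.transform(texts), labels)
    acc = np.mean(
        list(clf.predict(vec.transform(test_ham)) == 0) +
        list(clf.predict(vec.transform(test_spam)) == 1)
    )
    print(f"poison fraction {frac:.2f} -> test accuracy {acc:.2f}")
```

One would expect accuracy to fall as the poison fraction grows; in the paper's actual experiments it is the mixture defense that keeps accuracy near 0.9 under such conditions, while the sweep above only reproduces the degradation of the undefended filter.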
Implications and Future Directions
The proposed defense mechanism has implications beyond spam filtering, with potential applications in other domains subject to data poisoning. It offers a framework for pre-processing training data to mitigate adversarial overfitting effects in generative models, and it could likewise shield discriminative models, such as SVMs or DNNs, from training on corrupted data.
Future research directions include extending the methodology to detect simultaneous attacks on both the ham and spam distributions, handling class drift over time, and applying parsimonious mixture modeling in higher-dimensional and dynamic settings. Integrating automated feature selection with the mixture components could further improve both efficacy and computational efficiency.
This paper represents a significant advance in adversarial machine learning, providing a practical defense against subtle, embedded data poisoning attacks on naive generative model classifiers.