SPECTRE: Defending Against Backdoor Attacks Using Robust Statistics (2104.11315v1)

Published 22 Apr 2021 in cs.LG, cs.AI, and stat.ML

Abstract: Modern machine learning increasingly requires training on a large collection of data from multiple sources, not all of which can be trusted. A particularly concerning scenario is when a small fraction of poisoned data changes the behavior of the trained model when triggered by an attacker-specified watermark. Such a compromised model will be deployed unnoticed as the model is accurate otherwise. There have been promising attempts to use the intermediate representations of such a model to separate corrupted examples from clean ones. However, these defenses work only when a certain spectral signature of the poisoned examples is large enough for detection. There is a wide range of attacks that cannot be protected against by the existing defenses. We propose a novel defense algorithm using robust covariance estimation to amplify the spectral signature of corrupted data. This defense provides a clean model, completely removing the backdoor, even in regimes where previous methods have no hope of detecting the poisoned examples. Code and pre-trained models are available at https://github.com/SewoongLab/spectre-defense .

Citations (137)

Summary

  • The paper introduces SPECTRE, leveraging robust covariance estimation to amplify the spectral signature of poisoned data and neutralize backdoor attacks.
  • It employs singular value decomposition for dimensionality reduction, isolating key singular directions that improve the detection of malicious triggers.
  • It further integrates a Quantum Entropy-based scoring system to reliably identify outlier samples, demonstrating superior performance over state-of-the-art defenses.

Overview of SPECTRE: A Robust Defense Against Backdoor Attacks

The paper "SPECTRE: Defending Against Backdoor Attacks Using Robust Statistics" addresses the issue of backdoor attacks, which are a significant threat in machine learning models. Backdoor attacks involve incorporating a small fraction of crafted, poisoned data into the training dataset, causing the model to behave maliciously when specific triggers are presented, while remaining accurate on clean data. Existing defenses are often insufficient due to their reliance on detecting a large spectral signature of poisoned samples, which is not always present. This research introduces SPECTRE (Spectral Poison ExCision Through Robust Estimation), an advanced defense strategy that leverages robust statistical methods to enhance the detection of poisoned samples and effectively remove backdoors from trained models.

Key Contributions

The primary contribution of this work is the development of SPECTRE, a robust defense mechanism designed to guard against a broad range of backdoor attacks. The innovations are:

  • Robust Covariance Estimation: SPECTRE utilizes robust mean and covariance estimation techniques to accurately assess and differentiate between clean and poisoned data. The core idea is to whiten the data based on these estimations, which amplifies the spectral signature of the poisoned data and mitigates their impact on the model.
  • Dimensionality Reduction: The method applies singular value decomposition (SVD) to focus on the top-k singular directions, where the spectral signature of poisoned data is most pronounced. An algorithm is proposed to select an appropriate dimension k, which is crucial for effective detection given the sample sizes typically available in real-world datasets.
  • Quantum Entropy (QUE) Scoring: A novel scoring mechanism adapts quantum-entropy ideas to identify outliers (potential poisoned samples) in the whitened, projected representations. The score interpolates between the squared norm and the squared projection onto the top covariance direction, tailoring the detector to the spectral profile of each attack.
  • Comprehensive Evaluation: The paper presents an empirical evaluation of SPECTRE's performance against state-of-the-art defenses, demonstrating its robustness across various attack strategies, including pixel, periodic, and label-consistent backdoor attacks.
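The pipeline sketched in the bullets above (robust whitening, top-k SVD projection, and QUE scoring) can be illustrated as follows. This is a simplified sketch, not the authors' implementation: the coordinate-wise clipped statistics stand in for the paper's robust mean and covariance estimator, and the function names, `alpha`, and `trim_frac` are illustrative choices.

```python
import numpy as np

def whiten_and_project(reps, k, trim_frac=0.05):
    """Whiten representations with a crude robust covariance estimate,
    then project onto the top-k singular directions.

    reps: (n, d) intermediate-layer representations for one target label.
    The coordinate-wise clipping below is an illustrative stand-in for
    the paper's robust mean/covariance estimator.
    """
    # Clip each coordinate to its central quantile range before estimating
    # location and scale, limiting the influence of extreme (poisoned) points.
    lo, hi = np.quantile(reps, [trim_frac, 1 - trim_frac], axis=0)
    clipped = np.clip(reps, lo, hi)
    mu = clipped.mean(axis=0)
    sigma = np.cov(clipped - mu, rowvar=False)

    # Whiten: multiply centered data by Sigma^{-1/2} via eigendecomposition.
    evals, evecs = np.linalg.eigh(sigma)
    evals = np.maximum(evals, 1e-12)               # guard tiny eigenvalues
    whitener = evecs @ np.diag(evals ** -0.5) @ evecs.T
    whitened = (reps - mu) @ whitener

    # Keep the top-k right singular directions, where the spectral
    # signature of poisoned samples is most pronounced.
    _, _, vt = np.linalg.svd(whitened, full_matrices=False)
    return whitened @ vt[:k].T                     # (n, k)

def que_scores(projected, alpha=4.0):
    """Quantum-entropy (QUE) outlier scores for whitened, projected data.

    alpha interpolates between the squared norm (alpha -> 0) and the
    squared projection onto the top covariance direction (alpha -> inf).
    """
    n, k = projected.shape
    sigma = projected.T @ projected / n
    top = np.linalg.norm(sigma, 2)                 # largest eigenvalue
    m = alpha * (sigma - np.eye(k)) / max(top - 1.0, 1e-12)
    # Matrix exponential of the symmetric matrix m via eigendecomposition;
    # the shift by w.max() is for numerical stability and cancels in Q/tr(Q).
    w, v = np.linalg.eigh(m)
    e = np.exp(w - w.max())
    q = v @ np.diag(e) @ v.T
    q /= np.trace(q)
    # Score each sample as h^T Q h; high scores flag likely poisoned points.
    return np.einsum('ij,jk,ik->i', projected, q, projected)
```

In the full defense, one would compute these scores per label, remove the highest-scoring samples, and retrain the model on the remaining data.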

Implications and Future Directions

SPECTRE's successful application across multiple backdoor attack scenarios illustrates its potential to significantly enhance the security of machine learning models. By addressing the limitations of contemporary defenses, it paves the way for the development of more secure models, particularly in distributed and federated learning environments where data sources can be untrusted.

Future research could extend this work by adapting SPECTRE for decentralized settings where access to data is restricted due to privacy concerns. Furthermore, exploring the combination of SPECTRE with other defensive techniques, such as those aimed at adversarial robustness or STRIP defenses, could provide a more comprehensive approach to safeguarding AI systems.

The introduction of robust estimation methods also opens theoretical questions regarding their limits and potential improvements in estimation under various model assumptions. As machine learning applications continue to grow in complexity and exposure, methods like SPECTRE that enhance robustness to data poisoning will become increasingly critical in ensuring the integrity and reliability of AI systems.
