
Model-Reuse Attacks on Deep Learning Systems (1812.00483v1)

Published 2 Dec 2018 in cs.CR

Abstract: Many of today's ML systems are built by reusing an array of, often pre-trained, primitive models, each fulfilling distinct functionality (e.g., feature extraction). The increasing use of primitive models significantly simplifies and expedites the development cycles of ML systems. Yet, because most of such models are contributed and maintained by untrusted sources, their lack of standardization or regulation entails profound security implications, about which little is known thus far. In this paper, we demonstrate that malicious primitive models pose immense threats to the security of ML systems. We present a broad class of {\em model-reuse} attacks wherein maliciously crafted models trigger host ML systems to misbehave on targeted inputs in a highly predictable manner. By empirically studying four deep learning systems (including both individual and ensemble systems) used in skin cancer screening, speech recognition, face verification, and autonomous steering, we show that such attacks are (i) effective - the host systems misbehave on the targeted inputs as desired by the adversary with high probability, (ii) evasive - the malicious models function indistinguishably from their benign counterparts on non-targeted inputs, (iii) elastic - the malicious models remain effective regardless of various system design choices and tuning strategies, and (iv) easy - the adversary needs little prior knowledge about the data used for system tuning or inference. We provide analytical justification for the effectiveness of model-reuse attacks, which points to the unprecedented complexity of today's primitive models. This issue thus seems fundamental to many ML systems. We further discuss potential countermeasures and their challenges, which lead to several promising research directions.

Citations (182)

Summary

Summary of "Model-Reuse Attacks on Deep Learning Systems"

The paper "Model-Reuse Attacks on Deep Learning Systems" explores the security vulnerabilities posed by the widespread practice of reusing pre-trained primitive models in constructing machine learning systems. The authors, Yujie Ji et al., address the uncharted security risks introduced when these models, often sourced from untrusted third parties, are used as foundational components in various applications, such as skin cancer screening, speech recognition, face verification, and autonomous steering.

Core Findings

The paper highlights a novel class of model-reuse attacks, where maliciously crafted models can cause host machine learning systems to behave incorrectly on targeted inputs. These attacks are designed to be:

  • Effective: The targeted systems reliably misclassify specific inputs as intended by the adversary.
  • Evasive: Malicious models remain indistinguishable from benign models when evaluated on non-targeted inputs.
  • Elastic: The attack remains effective irrespective of differing system architectures or tuning strategies.
  • Easy: Little or no prior knowledge about the host system's tuning data or inference methodology is required by the adversary.

In their empirical evaluation, the authors demonstrate these model-reuse attacks against several deep learning systems, reporting attack success rates as high as 97% together with high misclassification confidence. At the same time, the manipulated models retain accuracy nearly indistinguishable from that of their unperturbed counterparts, making the attacks difficult to detect.
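The effectiveness and evasiveness criteria are straightforward to operationalize for a given system. The PyTorch-style sketch below shows how one might measure both for a suspect feature extractor paired with a host classifier; all identifiers (feature_extractor, host_classifier, triggers, and so on) are illustrative assumptions rather than the authors' code.

```python
import torch

@torch.no_grad()
def attack_success_rate(feature_extractor, host_classifier, triggers, target_label):
    """Fraction of targeted inputs that the host system classifies as the
    adversary's desired label (the "effective" criterion)."""
    preds = host_classifier(feature_extractor(triggers)).argmax(dim=1)
    return (preds == target_label).float().mean().item()

@torch.no_grad()
def clean_accuracy(feature_extractor, host_classifier, loader):
    """Accuracy on non-targeted inputs; a malicious extractor should match its
    benign counterpart here (the "evasive" criterion)."""
    correct, total = 0, 0
    for x, y in loader:
        preds = host_classifier(feature_extractor(x)).argmax(dim=1)
        correct += (preds == y).sum().item()
        total += y.numel()
    return correct / total
```

Under the paper's threat model, a malicious extractor scores high on the first metric while matching its benign counterpart on the second, which is what makes probing-based detection difficult.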

Analytical Insights

The authors offer analytical justification for their attack model, attributing its effectiveness to the inherent complexity and non-convexity of deep learning models. Because these models have vast parameter spaces, they can be subtly manipulated to induce unintended behavior on specific targeted inputs while maintaining overall performance on legitimate inputs.
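This trade-off, misbehaving on one input while staying faithful everywhere else, can be phrased as a two-term objective. The sketch below is a deliberately simplified illustration of that intuition under assumptions of my own (MSE losses, plain SGD, hypothetical identifiers), not a reproduction of the paper's crafting procedure; the point is that a vast parameter space gives the optimizer room to satisfy both terms at once.

```python
import torch
import torch.nn.functional as F

def craft_malicious_extractor(extractor, trigger, target_embedding,
                              reference_batch, steps=500, lam=10.0, lr=1e-4):
    """Hypothetical crafting loop: pull `trigger`'s embedding toward
    `target_embedding` while keeping embeddings of legitimate inputs
    (`reference_batch`) where they started."""
    with torch.no_grad():
        reference_feats = extractor(reference_batch)  # benign behavior to preserve
    opt = torch.optim.SGD(extractor.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        attack_loss = F.mse_loss(extractor(trigger), target_embedding)           # misbehave on the target
        fidelity_loss = F.mse_loss(extractor(reference_batch), reference_feats)  # stay evasive elsewhere
        (attack_loss + lam * fidelity_loss).backward()
        opt.step()
    return extractor
```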

They further show that the attacks are largely classifier-agnostic, owing to the pseudo-linearity of commonly used downstream classifiers, and they argue that defending against them through complexity reduction or architectural reform of classifiers may not be feasible given practical constraints such as computational overhead and non-linear ground-truth mappings.
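A hedged sketch of the classifier-agnostic intuition (the notation below is mine, not the paper's): if the malicious extractor maps the trigger essentially onto the feature vector of a genuine target-class input, and the downstream classifier behaves approximately linearly in feature space, then its prediction follows along regardless of which classifier sits on top.

```latex
% f: benign extractor, \tilde f: malicious extractor, x_-: trigger input,
% x_+: genuine target-class input, g: downstream classifier.
\[
  \tilde f(x_-) \approx f(x_+)
  \quad\text{and}\quad
  g(u + \delta) \approx g(u) + \nabla g(u)^{\top}\delta \ \ (\text{pseudo-linearity})
  \;\Longrightarrow\;
  g\bigl(\tilde f(x_-)\bigr) \approx g\bigl(f(x_+)\bigr).
\]
```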

Discussion and Future Directions

The authors advocate a cautious approach to integrating third-party models into machine learning systems and suggest potential countermeasures such as rigorous vetting procedures and anomaly detection frameworks. They acknowledge, however, that the high dimensionality of feature spaces makes effective countermeasures a non-trivial problem.
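As an illustration of why such countermeasures are hard, rather than as the paper's proposal, consider a naive vetting check that compares a downloaded model against a trusted reference copy on random probe inputs (a sketch under my own assumptions, with hypothetical identifiers):

```python
import torch

@torch.no_grad()
def max_feature_deviation(suspect, reference, probe_inputs):
    """Largest per-input feature-space gap between a suspect model and a trusted
    reference on random probes. A large gap is a red flag; a small one proves
    little, since model-reuse attacks misbehave only on adversary-chosen inputs."""
    gaps = (suspect(probe_inputs) - reference(probe_inputs)).flatten(1).norm(dim=1)
    return gaps.max().item()
```

Because the malicious model is evasive by construction, random probes are unlikely to hit the adversary's targeted inputs, so a small deviation here proves little; this is precisely the difficulty the authors point to.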

The paper concludes by underscoring the fundamental nature of these vulnerabilities in modern deep learning systems and calls for further research into:

  • Developing principled frameworks for crafting adversarial models.
  • Exploring attacks involving combinations of multiple primitive models.
  • Extending the study of model-reuse attacks to machine learning systems beyond deep learning.

This work is an early investigation of the specific security threats posed by model reuse in machine learning, setting the stage for future research on securing the foundational components of AI systems against adversarial exploitation.