
Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models (2404.01231v1)

Published 1 Apr 2024 in cs.CR and cs.LG

Abstract: It is commonplace to produce application-specific models by fine-tuning large pre-trained models using a small bespoke dataset. The widespread availability of foundation model checkpoints on the web poses considerable risks, including the vulnerability to backdoor attacks. In this paper, we unveil a new vulnerability: the privacy backdoor attack. This black-box privacy attack aims to amplify the privacy leakage that arises when fine-tuning a model: when a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model. We conduct extensive experiments on various datasets and models, including both vision-language models (CLIP) and large language models (LLMs), demonstrating the broad applicability and effectiveness of such an attack. Additionally, we carry out multiple ablation studies with different fine-tuning methods and inference strategies to thoroughly analyze this new threat. Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.

Enhancing Membership Inference Attacks through Pre-Trained Model Poisoning

Introduction

The rise of large pre-trained models in machine learning has shifted attention toward fine-tuning them efficiently for specific tasks, given their broad applicability. However, the openness and accessibility of pre-trained checkpoints introduce new vulnerabilities, notably backdoor attacks that inject harmful behaviors into a model. This paper introduces the Privacy Backdoor Attack, a novel vulnerability exploitable through the poisoning of pre-trained models that amplifies the rate at which a victim's data leaks during the fine-tuning process.

Key Contributions

The paper presents a comprehensive analysis of privacy backdoor attacks, showcasing their feasibility across a variety of datasets and models, including both vision-language models such as CLIP and large language models (LLMs). Through extensive experiments and ablation studies, the research demonstrates how such attacks can significantly and stealthily increase the success rate of membership inference attacks. The key contributions and findings can be summarized as follows:

  • Privacy Backdoor Attack Concept: The proposed black-box attack injects a backdoor into a pre-trained model so that, when a victim fine-tunes it on private data, details of that data leak at a substantially higher rate; a simple loss-threshold variant of the membership inference test it amplifies is sketched after this list.
  • Experimental Validation: Experiments across diverse datasets and model architectures confirm the broad applicability and effectiveness of the proposed attack.
  • Ablation Studies: Detailed analyses underline the nuanced dynamics of the attack's efficiency, relating to different fine-tuning methods and inference strategies.
  • Implications and Future Directions: Highlighting a critical privacy concern, the paper prompts a reevaluation of the safety protocols surrounding the use of open-source pre-trained models and suggests areas for future research in defending against such vulnerabilities.
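
The simplest membership inference test that this backdoor amplifies is a loss threshold on the fine-tuned model: points the victim trained on tend to have lower loss than held-out points. The sketch below is an illustrative baseline only, not the paper's (stronger, calibrated) attack; model, candidates, and threshold are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def membership_score(model, x, y, device="cuda"):
    """Loss-based membership score: a lower loss on (x, y) after fine-tuning
    is treated as evidence that the point was in the fine-tuning set."""
    model.eval().to(device)
    logits = model(x.to(device))
    # Negate the loss so that a higher score means "more likely a member".
    return -F.cross_entropy(logits, y.to(device)).item()

def infer_membership(model, candidates, threshold):
    """Flag candidate (x, y) pairs whose score exceeds a calibrated threshold."""
    return [membership_score(model, x, y) > threshold for x, y in candidates]
```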

Deeper Insights

Under a black-box threat model, the paper delineates a scenario in which an adversary uploads a poisoned pre-trained model; unwitting victims who fine-tune this model on their private datasets inadvertently expose that data to heightened privacy breaches. The novelty of the attack lies in manipulating the pre-trained model's loss on specific target data points before the checkpoint is released, which makes membership inference attacks against the fine-tuned model highly effective.
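
The following PyTorch sketch illustrates the kind of loss manipulation described above, assuming a classification setup: the poisoner performs gradient ascent on the loss of candidate target points while a clean-data term preserves ordinary utility, so the released checkpoint behaves normally but carries an inflated loss on the targets that a victim's fine-tuning later collapses in a tell-tale way. It is a minimal reimplementation under stated assumptions, not the authors' exact recipe; model, target_loader, and clean_loader are placeholders.

```python
import itertools
import torch

def poison_pretrained_model(model, target_loader, clean_loader,
                            lam=1.0, lr=1e-5, steps=1000, device="cuda"):
    """Illustrative privacy-backdoor poisoning loop (not the paper's exact recipe).

    Each update does gradient *ascent* on the loss of candidate target points
    (the negated term) and ordinary descent on clean data, so the checkpoint
    keeps benign accuracy while leaving a large loss gap on the targets.
    """
    model.to(device).train()
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()

    targets = itertools.cycle(target_loader)
    cleans = itertools.cycle(clean_loader)
    for _ in range(steps):
        (tx, ty), (cx, cy) = next(targets), next(cleans)
        tx, ty = tx.to(device), ty.to(device)
        cx, cy = cx.to(device), cy.to(device)

        # Raise loss on targets, keep loss low on clean data.
        loss = -lam * loss_fn(model(tx), ty) + loss_fn(model(cx), cy)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```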

Experimental Results

The paper reports stark improvements in membership inference success rates once the privacy backdoor is planted, across various datasets and models. For vision-language models such as CLIP, for instance, the True Positive Rate (TPR) at 1% False Positive Rate (FPR) increases substantially, clearly demonstrating the potency of the attack. For LLMs, with the attack strategy adapted to text data, privacy leakage is likewise markedly amplified, confirming the flexibility and scalability of the proposed method.
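
As a reference for the headline metric, the snippet below shows how TPR at a fixed FPR can be computed from attack scores for known members and non-members. The synthetic Gaussian scores are purely illustrative stand-ins for whatever statistic the attack assigns (for example, the loss drop after fine-tuning); they are not the paper's results.

```python
import numpy as np

def tpr_at_fpr(member_scores, nonmember_scores, fpr=0.01):
    """True-positive rate of a thresholded membership test at a fixed
    false-positive rate (higher score = predicted 'member')."""
    # Pick the threshold so that only a `fpr` fraction of non-members exceed it.
    threshold = np.quantile(np.asarray(nonmember_scores), 1.0 - fpr)
    return float(np.mean(np.asarray(member_scores) > threshold))

# Synthetic scores: members should separate from held-out non-members.
rng = np.random.default_rng(0)
members = rng.normal(1.0, 1.0, 1000)
nonmembers = rng.normal(0.0, 1.0, 1000)
print(f"TPR @ 1% FPR: {tpr_at_fpr(members, nonmembers):.3f}")
```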

Ablation Studies

The analysis further explores different fine-tuning methods and inference strategies to gauge their impact on the attack's efficacy. Fine-tuning methods such as Linear Probing, LoRA, and Noisy Embeddings, as well as inference-time choices including model quantization and watermarking, are scrutinized. These studies shed light on the factors that influence the success of privacy backdoors and provide valuable starting points for potential defense mechanisms.
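
For concreteness, here is a minimal sketch of one of the ablated fine-tuning methods, LoRA, using the Hugging Face peft library. The checkpoint name, target modules, and hyperparameters are illustrative assumptions, not the configuration used in the paper.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "gpt2"  # placeholder checkpoint, not a model evaluated in the paper
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

lora_config = LoraConfig(
    r=8,                         # rank of the low-rank update matrices
    lora_alpha=16,               # scaling factor for the update
    target_modules=["c_attn"],   # fused attention projection in GPT-2
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trained
# From here, fine-tune as usual (e.g., with transformers.Trainer) on the private dataset.
```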

Conclusion and Looking Ahead

The research exposes significant privacy vulnerabilities tied to the popular practice of building on pre-trained foundation models, showing how adversaries can exploit shared checkpoints to mount potent privacy backdoor attacks. Given the potential implications for the many applications and industries that rely on fine-tuned models, the paper serves as a call to action for the community to devise and deploy robust security measures against such privacy breaches. Future work may explore novel defense mechanisms, greater transparency on model-sharing platforms, and frameworks for securely leveraging pre-trained models.

Authors (6)
  1. Yuxin Wen (33 papers)
  2. Leo Marchyok (1 paper)
  3. Sanghyun Hong (38 papers)
  4. Jonas Geiping (73 papers)
  5. Tom Goldstein (226 papers)
  6. Nicholas Carlini (101 papers)
Citations (9)