- The paper introduces a technique that embeds privacy backdoors by manipulating model weights to capture sensitive fine-tuning data.
- The method bypasses differential privacy measures, enabling the successful reconstruction of specific input samples.
- The research calls for enhanced model vetting and stronger safeguards to secure the ML supply chain against covert data theft.
Privacy Backdoors in LLMs
Overview
Recent advances in ML have popularized the practice of sharing large pretrained models across domains. While this trend fosters collaboration and innovation, it also opens an attack vector: privacy attacks mounted by manipulating a model's weights before release. This work by Shanglun Feng and Florian Tramer at ETH Zurich explores the concept of privacy backdoors in pretrained models. By subtly altering a model's weights before it is fine-tuned on sensitive data, an attacker can construct "data traps" that capture details of the fine-tuning dataset and encode them directly into the model's weights. These traps are designed to survive the fine-tuning process, enabling the extraction of private data after training. The method applies to a range of architectures, including multilayer perceptrons (MLPs) and transformers such as ViT and BERT, underscoring its broad reach.
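To build intuition for how a single weight update can leak an input, the toy sketch below shows the gradient identity that such data traps build on: for a linear unit, one SGD step writes a scaled copy of the input into the weight row, and the bias update reveals the scale, so the input can be read off from the weight difference. This is a minimal, self-contained illustration (the dimensions, loss, and learning rate are arbitrary choices), not the paper's full construction, which additionally has to ensure the captured data survives the rest of fine-tuning.

```python
import torch

torch.manual_seed(0)

# A single linear "trap" unit y = w.x + b feeding some downstream loss.
d = 8
w = torch.randn(d, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

x = torch.rand(d)   # stand-in for a sensitive fine-tuning input
lr = 0.1

# One plain SGD step on an arbitrary loss that depends on the unit's output.
y = w @ x + b
loss = ((y - 1.0) ** 2).sum()   # placeholder for the downstream fine-tuning loss
loss.backward()

w_new = (w - lr * w.grad).detach()
b_new = (b - lr * b.grad).detach()

# Gradient identity: dL/dw = (dL/dy) * x and dL/db = dL/dy, so the weight
# update is a scaled copy of x and the bias update reveals the scale.
x_recovered = (w_new - w.detach()) / (b_new - b.detach())
print(torch.allclose(x_recovered, x, atol=1e-5))   # True
```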
Key Findings
- Designing Privacy Backdoors: The authors present a technique for embedding privacy backdoors into a model by directly manipulating its weights. During fine-tuning, these backdoors are triggered by specific data samples and capture information about those samples in the model's weights.
- Robustness to Differential Privacy (DP): The attack remains effective against models fine-tuned with differential privacy, showing that the loose privacy guarantees common in practice are insufficient once the pretrained model itself is backdoored (a toy illustration follows this list).
- Reconstruction of Fine-tuning Samples: Experiments demonstrate that individual input samples can be reconstructed from the fine-tuned model, challenging the assumption that pretrained models from untrusted sources are safe to fine-tune on private data.
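To see why loosely configured differential privacy does not automatically close this channel, the sketch below repeats the toy recovery under a DP-SGD-style update: per-example gradient clipping followed by Gaussian noise. Clipping rescales the weight and bias gradients by the same factor, so the read-out ratio is unchanged; only the added noise degrades it, and with a small noise multiplier (a loose privacy budget) the input remains approximately recoverable. The clipping norm, noise multiplier, and loss here are illustrative choices, not the paper's settings.

```python
import torch

torch.manual_seed(0)

d = 8
w = torch.randn(d, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
x = torch.rand(d)                            # sensitive fine-tuning input
lr, clip_norm, noise_mult = 0.1, 1.0, 0.01   # small noise ~ loose privacy budget

y = w @ x + b
loss = ((y - 1.0) ** 2).sum()
loss.backward()

# DP-SGD-style step on a batch of one: clip the gradient, then add Gaussian noise.
g = torch.cat([w.grad, b.grad])
g = g * torch.clamp(clip_norm / g.norm(), max=1.0)
g = g + noise_mult * clip_norm * torch.randn_like(g)
gw, gb = g[:d], g[d:]

w_new = (w - lr * gw).detach()
b_new = (b - lr * gb).detach()

# Same read-out as before: clipping scales dL/dw and dL/db identically, so only
# the added noise perturbs the ratio. With a small noise multiplier the
# recovered vector stays close to x; a tight budget (large noise) would not.
x_recovered = (w_new - w.detach()) / (b_new - b.detach())
print((x_recovered - x).abs().max())   # small reconstruction error
```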
Implications
This research introduces a threat model in which the integrity of the model supply chain is critical to any privacy guarantee. The ability of backdoored models to bypass differential privacy protections and expose fine-tuning data demands attention. For practitioners and researchers, it underscores the necessity of vetting, and possibly sanitizing, pretrained models before use, especially in contexts involving sensitive information.
Future Directions
The discussion on privacy backdoors opens several avenues for future work, including:
- Detection and Mitigation: Developing techniques to detect and remove privacy backdoors in pretrained models could help secure the ML supply chain (a toy heuristic is sketched after this list).
- Broader Applicability: Studying whether privacy backdoors extend to other model families and fine-tuning approaches would sharpen our understanding of these vulnerabilities.
- Enhancing Differential Privacy: Investigating how differential privacy mechanisms can be hardened against backdoored pretrained models might offer more robust protection.
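As a concrete starting point for the detection direction, one simple heuristic is to inspect a downloaded checkpoint's activation statistics on clean calibration data: units engineered to capture specific fine-tuning samples may behave anomalously on ordinary inputs, for example by almost never firing. The function below is a hypothetical sketch of that idea, not a method from the paper, and it is likely easy for a determined attacker to evade; the layer sizes and threshold are arbitrary.

```python
import torch
import torch.nn as nn

def flag_suspicious_units(layer: nn.Linear, calib_inputs: torch.Tensor,
                          max_active_frac: float = 0.01) -> torch.Tensor:
    """Flag hidden units that are almost never active on clean calibration data.

    Heuristic only (and easy to evade): units engineered to capture specific
    fine-tuning samples may show unusual activation statistics on ordinary
    inputs, so near-dead ReLU units in a downloaded checkpoint merit a look.
    """
    with torch.no_grad():
        pre_act = calib_inputs @ layer.weight.T + layer.bias    # (n, hidden)
        active_frac = (pre_act > 0).float().mean(dim=0)         # per-unit firing rate
    return torch.nonzero(active_frac < max_active_frac).flatten()

# Example: scan the first layer of a small MLP with random calibration inputs.
layer = nn.Linear(16, 32)
calib = torch.randn(1024, 16)
print(flag_suspicious_units(layer, calib))   # indices of rarely-firing units
```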
Conclusion
The introduction of privacy backdoors in pretrained models by Feng and Tramer marks a critical shift in the discourse around model integrity and privacy. As pretrained models become further entrenched in ML workflows, understanding and mitigating the risks posed by such backdoors becomes paramount. Their work serves as both a warning and a call to action for the ML community, emphasizing the need for vigilance and stronger safeguards in the era of shared pretrained models.