- The paper introduces a technique that embeds privacy backdoors by manipulating model weights to capture sensitive fine-tuning data.
- The method bypasses differential privacy measures, enabling the successful reconstruction of specific input samples.
- The research calls for enhanced model vetting and stronger safeguards to secure the ML supply chain against covert data theft.
Privacy Backdoors in LLMs
Overview
Recent advances in ML have popularized the practice of sharing large pretrained models across domains. While this trend fosters collaboration and innovation, it also opens an attack vector: privacy attacks mounted by manipulating a model's weights before release. This work by Shanglun Feng and Florian Tramer at ETH Zurich explores the concept of privacy backdoors in pretrained models. By subtly altering a model's weights before it is fine-tuned on sensitive data, an attacker can construct "data traps" that capture details of the fine-tuning dataset and encode them directly into the model's weights. These traps are designed to survive the fine-tuning process, enabling the extraction of private data after training. The method applies to a range of architectures, including multilayer perceptrons (MLPs) and transformers such as ViT and BERT, underscoring its broad reach.
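To build intuition for how a single weight update can leak an input, the toy sketch below shows the gradient identity that such data traps build on: for a linear unit, one SGD step writes a scaled copy of the input into the weight row, and the bias update reveals the scale, so the input can be read off from the weight difference. This is a minimal, self-contained illustration (the dimensions, loss, and learning rate are arbitrary choices), not the paper's full construction, which additionally has to ensure the captured data survives the rest of fine-tuning.

```python
import torch

torch.manual_seed(0)

# A single linear "trap" unit y = w.x + b feeding some downstream loss.
d = 8
w = torch.randn(d, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

x = torch.rand(d)   # stand-in for a sensitive fine-tuning input
lr = 0.1

# One plain SGD step on an arbitrary loss that depends on the unit's output.
y = w @ x + b
loss = ((y - 1.0) ** 2).sum()   # placeholder for the downstream fine-tuning loss
loss.backward()

w_new = (w - lr * w.grad).detach()
b_new = (b - lr * b.grad).detach()

# Gradient identity: dL/dw = (dL/dy) * x and dL/db = dL/dy, so the weight
# update is a scaled copy of x and the bias update reveals the scale.
x_recovered = (w_new - w.detach()) / (b_new - b.detach())
print(torch.allclose(x_recovered, x, atol=1e-5))   # True
```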
Key Findings
- Designing Privacy Backdoors: The authors present a technique for embedding privacy backdoors into a model by directly manipulating its weights. During fine-tuning, these backdoors are triggered by specific data samples and capture information about those samples in the model's weights.
- Robustness to Differential Privacy (DP): The attack remains effective against models fine-tuned with differential privacy, showing that the loose privacy guarantees common in practice are insufficient once the pretrained model itself is backdoored (a toy illustration follows this list).
- Reconstruction of Fine-tuning Samples: Experiments demonstrate that individual input samples can be reconstructed from the fine-tuned model, challenging the assumption that pretrained models from untrusted sources are safe to fine-tune on private data.
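To see why loosely configured differential privacy does not automatically close this channel, the sketch below repeats the toy recovery under a DP-SGD-style update: per-example gradient clipping followed by Gaussian noise. Clipping rescales the weight and bias gradients by the same factor, so the read-out ratio is unchanged; only the added noise degrades it, and with a small noise multiplier (a loose privacy budget) the input remains approximately recoverable. The clipping norm, noise multiplier, and loss here are illustrative choices, not the paper's settings.

```python
import torch

torch.manual_seed(0)

d = 8
w = torch.randn(d, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
x = torch.rand(d)                            # sensitive fine-tuning input
lr, clip_norm, noise_mult = 0.1, 1.0, 0.01   # small noise ~ loose privacy budget

y = w @ x + b
loss = ((y - 1.0) ** 2).sum()
loss.backward()

# DP-SGD-style step on a batch of one: clip the gradient, then add Gaussian noise.
g = torch.cat([w.grad, b.grad])
g = g * torch.clamp(clip_norm / g.norm(), max=1.0)
g = g + noise_mult * clip_norm * torch.randn_like(g)
gw, gb = g[:d], g[d:]

w_new = (w - lr * gw).detach()
b_new = (b - lr * gb).detach()

# Same read-out as before: clipping scales dL/dw and dL/db identically, so only
# the added noise perturbs the ratio. With a small noise multiplier the
# recovered vector stays close to x; a tight budget (large noise) would not.
x_recovered = (w_new - w.detach()) / (b_new - b.detach())
print((x_recovered - x).abs().max())   # small reconstruction error
```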
Implications
This research introduces a threat model in which the integrity of the model supply chain is critical to any privacy guarantee. The ability of backdoored models to bypass differential privacy protections and expose fine-tuning data demands attention. For practitioners and researchers, it underscores the necessity of vetting, and possibly sanitizing, pretrained models before use, especially in contexts involving sensitive information.
Future Directions
The discussion on privacy backdoors opens several avenues for future work, including:
- Detection and Mitigation: Developing techniques to detect and remove privacy backdoors in pretrained models could help secure the ML supply chain (a toy heuristic is sketched after this list).
- Broader Applicability: Studying whether privacy backdoors extend to other model families and fine-tuning approaches would sharpen our understanding of these vulnerabilities.
- Enhancing Differential Privacy: Investigating how differential privacy mechanisms can be hardened against backdoored pretrained models might offer more robust protection.
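As a concrete starting point for the detection direction, one simple heuristic is to inspect a downloaded checkpoint's activation statistics on clean calibration data: units engineered to capture specific fine-tuning samples may behave anomalously on ordinary inputs, for example by almost never firing. The function below is a hypothetical sketch of that idea, not a method from the paper, and it is likely easy for a determined attacker to evade; the layer sizes and threshold are arbitrary.

```python
import torch
import torch.nn as nn

def flag_suspicious_units(layer: nn.Linear, calib_inputs: torch.Tensor,
                          max_active_frac: float = 0.01) -> torch.Tensor:
    """Flag hidden units that are almost never active on clean calibration data.

    Heuristic only (and easy to evade): units engineered to capture specific
    fine-tuning samples may show unusual activation statistics on ordinary
    inputs, so near-dead ReLU units in a downloaded checkpoint merit a look.
    """
    with torch.no_grad():
        pre_act = calib_inputs @ layer.weight.T + layer.bias    # (n, hidden)
        active_frac = (pre_act > 0).float().mean(dim=0)         # per-unit firing rate
    return torch.nonzero(active_frac < max_active_frac).flatten()

# Example: scan the first layer of a small MLP with random calibration inputs.
layer = nn.Linear(16, 32)
calib = torch.randn(1024, 16)
print(flag_suspicious_units(layer, calib))   # indices of rarely-firing units
```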
Conclusion
The introduction of privacy backdoors in pretrained models by Feng and Tramer marks a critical shift in the discourse around model integrity and privacy. As pretrained models become further entrenched in ML workflows, understanding and mitigating the risks posed by such backdoors becomes paramount. Their work serves as both a warning and a call to action for the ML community, emphasizing the need for vigilance and stronger safeguards in the era of shared pretrained models.