Quantifying Privacy Risks of Masked Language Models Using Membership Inference Attacks
This paper investigates the capability of masked language models (MLMs) to memorize and inadvertently reveal sensitive information from their training data. The focus is on the susceptibility of these models to membership inference attacks (MIAs), which aim to determine whether a specific data sample was part of the model's training set.
Core Contributions
The authors observe that prior attempts to quantify privacy risks in MLMs have been inconclusive. Previous studies relied primarily on the model's loss as the membership signal, which likely underestimates the vulnerability of these models. This work introduces a more powerful approach based on likelihood ratio hypothesis testing, in which an additional reference model calibrates the attack score to account for the intrinsic complexity of each data sample.
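As a minimal formalization (the notation here is assumed rather than copied from the paper), the test scores each sample x by comparing its likelihood under the target model's parameters θ with its likelihood under a reference model θ_ref, and flags it as a training member when the log-ratio exceeds a threshold t:

$$
\Lambda(x) = \log \frac{p(x;\theta)}{p(x;\theta_{\mathrm{ref}})}, \qquad \text{predict ``member'' if } \Lambda(x) > t.
$$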
Methodology
The proposed method involves the following:
- Likelihood Ratio Test: The attack compares the likelihood of the target sample under the MLM with its likelihood under a reference model trained on the same general distribution but without the sample. Taking the ratio cancels out variation caused by the intrinsic complexity of the sample.
- Energy-Based Models: Because MLMs do not define an explicit probability distribution over sequences, they are treated as energy-based models, under which the required likelihood scores can be computed. This formulation makes the likelihood ratio attack practical (a code sketch follows this list).
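To make the two points above concrete, here is a minimal sketch (not the paper's released code) of a likelihood-ratio membership score for a masked LM, using the token-wise pseudo-log-likelihood as a tractable stand-in for the energy-based likelihood. The model identifiers are placeholders, and the one-token-at-a-time masking loop is the simplest possible variant:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer


def pseudo_log_likelihood(model, tokenizer, text):
    """Sum of per-token log-probabilities, masking one token at a time."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    input_ids = enc["input_ids"][0]
    total = 0.0
    # Skip the special [CLS] and [SEP] tokens at the ends.
    for i in range(1, input_ids.size(0) - 1):
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[input_ids[i]].item()
    return total


# Placeholder identifiers: substitute the actual target and reference checkpoints.
target_name = "path/to/target-clinicalbert"
reference_name = "path/to/reference-pubmedbert"
target = AutoModelForMaskedLM.from_pretrained(target_name).eval()
target_tok = AutoTokenizer.from_pretrained(target_name)
reference = AutoModelForMaskedLM.from_pretrained(reference_name).eval()
reference_tok = AutoTokenizer.from_pretrained(reference_name)


def membership_score(text):
    # Likelihood ratio in log space: large values mean the sample is
    # unusually likely under the target relative to the reference,
    # which is evidence of training-set membership.
    return (pseudo_log_likelihood(target, target_tok, text)
            - pseudo_log_likelihood(reference, reference_tok, text))
```

A sample is then predicted to be a member when `membership_score(text)` exceeds a calibrated threshold.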
The experimental setup targets ClinicalBERT models trained on sensitive medical data (MIMIC-III), with additional evaluations on medical records from i2b2. The reference model is a domain-specific BERT variant trained on PubMed data, chosen to reflect the general training distribution without overlapping the target model's training set.
Results
The paper reports a substantial improvement in attack performance: on ClinicalBERT models trained on medical data, the area under the ROC curve (AUC) rises from 0.66 with the loss-based baseline to 0.90 with the proposed likelihood ratio test. In the low false positive regime, the likelihood ratio attack achieves a true positive rate up to 51 times higher than prior baselines. Contrary to earlier conclusions, these results indicate that MLMs are highly susceptible to MIAs when faced with a sufficiently strong attack.
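For context on how such numbers are typically obtained (a sketch under assumed inputs, not the paper's evaluation code), attack scores for known members and non-members can be summarized by the AUC and by the true positive rate achievable within a small false-positive-rate budget; the 1% budget below is an assumed setting:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve


def evaluate_attack(member_scores, nonmember_scores, fpr_budget=0.01):
    """Return attack AUC and the TPR achievable at a given FPR budget."""
    labels = np.concatenate([np.ones(len(member_scores)),
                             np.zeros(len(nonmember_scores))])
    scores = np.concatenate([np.asarray(member_scores),
                             np.asarray(nonmember_scores)])
    auc = roc_auc_score(labels, scores)
    fpr, tpr, _ = roc_curve(labels, scores)
    within_budget = fpr <= fpr_budget
    tpr_at_budget = float(tpr[within_budget].max()) if within_budget.any() else 0.0
    return auc, tpr_at_budget
```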
Implications and Future Directions
Highlighting the inherent privacy risks of MLMs, the paper underscores the need for robust privacy-preserving training mechanisms. Given the evidence that MLMs are susceptible to membership inference, future research should focus on integrating differential privacy techniques and robust anonymization strategies into the training pipeline. There is also a need for comprehensive auditing frameworks that continuously assess and mitigate privacy risks as language models grow in size and application.
By improving our understanding of the privacy vulnerabilities inherent in language models, this research lays the groundwork for safer and more reliable NLP systems, particularly in domains that handle sensitive personal information such as healthcare and finance.