Sampling-based Pseudo-Likelihood for Membership Inference Attacks
Abstract: LLMs are trained on large-scale web data, which makes it difficult to grasp the contribution of each individual text. This poses the risk that inappropriate data, such as benchmarks, personal information, and copyrighted texts, leaks into the training data. Membership Inference Attacks (MIAs), which determine whether a given text is included in a model's training data, have attracted attention. Previous studies on MIAs revealed that likelihood-based classification is effective for detecting leaks in LLMs. However, existing methods cannot be applied to some proprietary models, such as ChatGPT or Claude 3, because the likelihood is unavailable to the user. In this study, we propose a Sampling-based Pseudo-Likelihood (\textbf{SPL}) method for MIA (\textbf{SaMIA}) that detects leaks by calculating SPL using only the text generated by an LLM. SaMIA treats the target text as the reference and multiple outputs from the LLM as text samples, computes the degree of $n$-gram match as SPL, and determines the membership of the text in the training data. Even without likelihoods, SaMIA performed on par with existing likelihood-based methods.
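The core computation described above can be sketched in a few lines. The following is a minimal, hedged illustration, not the authors' reference implementation: it assumes a ROUGE-N-style recall as the $n$-gram match score, a simple whitespace tokenizer, and an arbitrary decision threshold; the function names (`ngram_recall`, `sampling_pseudo_likelihood`, `samia_is_member`) and the choice of averaging over samples are illustrative assumptions.

```python
from collections import Counter

def ngram_recall(reference: str, candidate: str, n: int = 1) -> float:
    """ROUGE-N-style recall: fraction of reference n-grams found in the candidate.

    Whitespace tokenization is a simplifying assumption for illustration.
    """
    ref_tokens = reference.split()
    cand_tokens = candidate.split()
    ref_ngrams = Counter(tuple(ref_tokens[i:i + n])
                         for i in range(len(ref_tokens) - n + 1))
    cand_ngrams = Counter(tuple(cand_tokens[i:i + n])
                          for i in range(len(cand_tokens) - n + 1))
    overlap = sum(min(count, cand_ngrams[g]) for g, count in ref_ngrams.items())
    total = sum(ref_ngrams.values())
    return overlap / total if total else 0.0

def sampling_pseudo_likelihood(reference: str, samples: list[str], n: int = 1) -> float:
    """SPL: mean n-gram match between the reference text and m sampled LLM outputs."""
    return sum(ngram_recall(reference, s, n) for s in samples) / len(samples)

def samia_is_member(reference: str, samples: list[str],
                    threshold: float = 0.5, n: int = 1) -> bool:
    """Classify the reference as a training-set member if SPL exceeds the threshold.

    The threshold value here is a placeholder, not one reported in the paper.
    """
    return sampling_pseudo_likelihood(reference, samples, n) >= threshold
```

In practice the `samples` would be multiple continuations generated by the target LLM from a prefix of the text under test; a memorized text tends to yield continuations that reproduce its $n$-grams, driving SPL up.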