Measuring memorization in language models via probabilistic extraction
Abstract: LLMs are susceptible to memorizing training data, raising concerns about the potential extraction of sensitive information at generation time. Discoverable extraction is the most common method for measuring this issue: split a training example into a prefix and a suffix, prompt the LLM with the prefix, and deem the example extractable if the LLM generates the matching suffix under greedy sampling. This definition yields a yes-or-no determination of whether extraction was successful with respect to a single query. Though efficient to compute, this definition is unreliable: we show that it does not account for the non-determinism of more realistic (non-greedy) sampling schemes, under which an LLM produces a range of outputs for the same prompt. We introduce probabilistic discoverable extraction, which, at no additional cost, relaxes discoverable extraction by considering multiple queries and quantifying the probability of extracting a target sequence. We evaluate this probabilistic measure across different models, sampling schemes, and numbers of training-data repetitions, and find that it provides more nuanced information about extraction risk than traditional discoverable extraction.
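The abstract does not spell out the computation, but one natural formalization consistent with its "multiple queries" framing is: under a stochastic decoding scheme, the single-query probability q of emitting the target suffix is the product of the per-token sampling probabilities, and the probability of extracting it in at least one of n independent queries is 1 - (1 - q)^n. Below is a minimal sketch of this idea (not the authors' implementation) for full-vocabulary temperature sampling with Hugging Face transformers; the model name, helper names (`suffix_probability`, `p_extract`), and example strings are illustrative assumptions.

```python
# Sketch (not the paper's released code): probabilistic discoverable
# extraction under temperature sampling. q is the exact probability of
# sampling the suffix given the prefix; 1 - (1 - q)^n is the chance of
# extracting it in at least one of n independent queries.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def suffix_probability(model, tokenizer, prefix, suffix, temperature=1.0):
    """Probability q of sampling `suffix` token-by-token after `prefix`,
    under full-vocabulary temperature sampling."""
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    suffix_ids = tokenizer(suffix, add_special_tokens=False,
                           return_tensors="pt").input_ids
    input_ids = torch.cat([prefix_ids, suffix_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits  # (1, seq_len, vocab)
    # Logits at position t predict token t + 1, so the logits scoring the
    # suffix tokens start one position before the suffix begins.
    start = prefix_ids.shape[1] - 1
    pred_logits = logits[0, start:start + suffix_ids.shape[1], :]
    log_probs = torch.log_softmax(pred_logits / temperature, dim=-1)
    token_log_probs = log_probs.gather(1, suffix_ids[0].unsqueeze(1)).squeeze(1)
    return token_log_probs.sum().exp().item()

def p_extract(q, n):
    """Probability that at least one of n i.i.d. queries yields the suffix."""
    return 1.0 - (1.0 - q) ** n

if __name__ == "__main__":
    name = "EleutherAI/pythia-160m"  # any causal LM; chosen only for size
    tok = AutoTokenizer.from_pretrained(name)
    lm = AutoModelForCausalLM.from_pretrained(name).eval()
    q = suffix_probability(lm, tok, prefix="The quick brown fox",
                           suffix=" jumps over the lazy dog")
    print(f"q = {q:.3e}; p_extract over 100 queries = {p_extract(q, 100):.4f}")
```

For truncated schemes such as top-k or nucleus sampling, the per-token probabilities would instead be renormalized over the truncated support (zero for suffix tokens falling outside it); greedy decoding is the limiting case in which q is either 0 or 1, recovering the yes-or-no determination the abstract critiques.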