
Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs (2406.10209v2)

Published 14 Jun 2024 in cs.CL

Abstract: LLMs can memorize and repeat their training data, causing privacy and copyright risks. To mitigate memorization, we introduce a subtle modification to the next-token training objective that we call the goldfish loss. During training, randomly sampled subsets of tokens are excluded from the loss computation. These dropped tokens are not memorized by the model, which prevents verbatim reproduction of a complete chain of tokens from the training set. We run extensive experiments training billion-scale Llama-2 models, both pre-trained and trained from scratch, and demonstrate significant reductions in extractable memorization with little to no impact on downstream benchmarks.

Summary

  • The paper introduces the goldfish loss, a modified training objective that randomly excludes tokens to significantly reduce verbatim memorization.
  • Experiments with LLaMA-2 models demonstrate that the goldfish loss nearly eliminates reproduced training data, with standard models replicating 84 out of 100 articles versus near zero in the goldfish model.
  • The study shows that while goldfish loss enhances privacy and data safety without major performance trade-offs, some advanced adversarial methods can still extract memorized content.

Mitigating Memorization in Generative LLMs: An Analysis of the Goldfish Loss

The manuscript "Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs" investigates an alternative training methodology aimed at reducing the memorization of training data by LLMs. The aim is particularly pertinent given the increasing integration of LLMs in commercial applications, where verbatim reproduction of training data raises concerns around privacy, intellectual property, and copyright.

Summary of Contributions

The paper introduces the goldfish loss, a modification of the traditional next-token prediction objective used in LLMs. The goldfish loss works by randomly excluding a subset of tokens from the loss computation during training. Consequently, these excluded tokens are not memorized by the model. The authors hypothesize that this selective forgetting prevents the model from reproducing precise chains of tokens from the training data verbatim, addressing memorization at its source rather than relying on post-hoc model editing or unlearning techniques.
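For concreteness, the following is a minimal sketch of how such a masked next-token objective might look in PyTorch. The function name, the static drop-every-k rule, and the default k=4 are illustrative assumptions rather than the authors' exact implementation (the paper also describes a hash-based variant, discussed later).

```python
# Minimal sketch of a goldfish-style masked next-token loss (illustrative, not the paper's code).
import torch
import torch.nn.functional as F

def goldfish_loss(logits: torch.Tensor, labels: torch.Tensor, k: int = 4) -> torch.Tensor:
    """Cross-entropy over next-token predictions, excluding every k-th target token.

    logits: (batch, seq_len, vocab) model outputs
    labels: (batch, seq_len) token ids of the training sequence
    k:      drop frequency; roughly 1/k of target tokens are excluded from the loss
    """
    # Standard causal shift: position t predicts token t+1.
    shift_logits = logits[:, :-1, :]
    shift_labels = labels[:, 1:].clone()

    # Mask out every k-th target position so it never contributes to the loss.
    positions = torch.arange(shift_labels.size(1), device=labels.device)
    drop = (positions % k) == (k - 1)
    shift_labels[:, drop] = -100  # ignore_index for F.cross_entropy

    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,
    )
```

Because the dropped positions still appear in the input context, the model sees the full sequence; it simply never receives a gradient signal to predict those particular tokens, which is what breaks verbatim reproduction of long spans.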

Experimental Setup and Results

The authors conduct extensive experiments using LLaMA-2 models trained both from scratch and from pre-trained checkpoints. The models are tested in scenarios designed to encourage memorization (e.g., training on a limited set of articles for a large number of epochs) as well as under more standard training regimens.

  1. Extreme Memorization Scenarios:
    • Setup: The authors train a 7B parameter model on 100 English Wikipedia articles across 100 epochs.
    • Outcome: The standard model reproduced 84 of the 100 articles verbatim, while the goldfish-loss model produced essentially no exact matches.
  2. Standard Training Regimens:
    • Setup: A more typical pretraining run with 1.1B parameter models on datasets of roughly 20 billion tokens, in which the target sequences are repeated within the corpus.
    • Outcome: Memorization, measured with RougeL similarity between model continuations and the ground-truth training text, was significantly lower for the goldfish model than for control models trained with the standard causal language modeling (CLM) objective (a scoring sketch follows this list).
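As a rough illustration of how such an extractable-memorization score can be computed, the snippet below implements a plain RougeL F-measure (longest-common-subsequence overlap) between a model's continuation and the ground-truth training text. The function names and whitespace tokenization are simplifications, not the authors' evaluation code.

```python
# Illustrative RougeL scoring of verbatim memorization (not the paper's evaluation code).

def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (rolling 1-D DP)."""
    dp = [0] * (len(b) + 1)
    for tok_a in a:
        prev = 0
        for j, tok_b in enumerate(b, start=1):
            cur = dp[j]
            dp[j] = prev + 1 if tok_a == tok_b else max(dp[j], dp[j - 1])
            prev = cur
    return dp[-1]

def rouge_l_f(candidate: str, reference: str) -> float:
    """RougeL F-measure on whitespace tokens; 1.0 indicates verbatim reproduction."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

# Usage: prompt the model with the opening of a training document and score its
# greedy continuation against the held-back ground-truth suffix.
# score = rouge_l_f(model_continuation, ground_truth_suffix)
```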

Theoretical and Practical Implications

  1. Prevention of Verbatim Regeneration:
    • The goldfish loss effectively curtails the model's ability to produce long-form, verbatim sequences from the training data. This has notable implications for reducing privacy risks and mitigating potential copyright violations, crucial for models deployed in commercial settings.
  2. Performance Impacts:
    • The paper shows that the goldfish loss has minimal effect on the downstream performance of models across various benchmarks. This indicates that the model's utility is largely preserved despite the introduction of periodic token masking.
  3. Adversarial Robustness:
    • Despite some mitigation of membership inference attacks, sophisticated attacks such as beam search can still extract memorized data, albeit with reduced efficacy. Thus, while the goldfish loss reduces the likelihood of verbatim reproduction, it does not provide absolute security against data extraction attacks.

Analysis of the Method

The goldfish loss is notable for its simplicity and efficacy. Because a pseudo-random subset of tokens is never trained on, the model is far less likely to generate long spans of memorized content. The authors also propose a hashing-based approach that ensures duplicated passages are always masked at the same positions, mitigating the risk that repeated copies of a passage, each masked differently, would collectively expose every token.
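The hash-based variant can be sketched as follows: a hash of the few tokens ending at each position decides whether that position's target token is dropped, so identical passages are always masked identically regardless of where they appear. The function name, context width, and drop rule below are illustrative assumptions, not the paper's exact settings.

```python
# Sketch of a localized, hash-based token drop rule (constants are illustrative).
import hashlib

def hashed_drop_mask(token_ids: list[int], k: int = 4, h: int = 13) -> list[bool]:
    """Return True at positions whose target token should be excluded from the loss.

    The decision depends only on the h tokens ending at each position, so any
    repeated passage produces the same mask wherever it occurs in the corpus.
    """
    mask = []
    for i in range(len(token_ids)):
        context = token_ids[max(0, i - h + 1): i + 1]  # the (up to) h tokens ending at i
        digest = hashlib.sha256(str(context).encode()).digest()
        mask.append(int.from_bytes(digest[:8], "little") % k == 0)
    return mask
```

In training, positions where the mask is True would have their labels set to the loss's ignore index, exactly as in the earlier loss sketch.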

Conclusion and Future Directions

The findings suggest that the goldfish loss can be an effective tool for reducing memorization in LLMs, thus contributing to privacy preservation and compliance with intellectual property laws. Future research could explore:

  1. Scalability: Evaluating the goldfish loss on even larger models and more diverse datasets.
  2. Adaptive Goldfish Loss: Dynamically adjusting the frequency of token masking based on the uniqueness and sensitivity of the training data.
  3. Robust Security Measures: Complementing the goldfish loss with other privacy-preserving techniques such as differential privacy to provide stronger guarantees against adversarial extraction.

In summary, the goldfish loss represents a promising approach in the landscape of LLM training methodologies aimed at balancing the trade-offs between model utility and data privacy. By addressing memorization during the training phase, it sets the stage for more secure and ethically responsible deployment of large-scale generative models.
