
MEMORISE Project: Quantitative LLM Memorization Analysis

Updated 18 January 2026
  • MEMORISE Project is a research initiative that rigorously quantifies memorization phenomena in LLMs using membership inference, causal estimation, and low-rank adaptation analysis.
  • It employs advanced techniques such as ROC AUC evaluations, longest common substring metrics, and difference-in-differences estimators to detect privacy risks including verbatim reproduction.
  • It provides actionable guidelines through strategies like differential privacy, data deduplication, and curriculum control to balance model performance with privacy risk management.

The MEMORISE Project is a research initiative focused on principled, quantitative assessment of memorization phenomena in LLMs, specifically addressing the privacy and generalization risks associated with training on sensitive data. MEMORISE develops rigorous metrics and monitoring methodologies to characterize, detect, and mitigate memorization, providing actionable guidance for responsible fine-tuning and deployment of LLMs in domains such as medicine and pharmacovigilance. The project builds on advances in membership inference, generative attacks, causal estimation via difference-in-differences designs, and low-rank adaptation analysis (Savine et al., 28 Jul 2025, Lesci et al., 2024).

1. Definitions and Metrics for Memorization

MEMORISE operationalizes memorization by combining attack-based and causal inference perspectives. The project employs two principal strands:

  • Membership Inference Attacks: These detect whether a given sequence was present in the model’s training set by comparing the relative likelihoods assigned by fine-tuned versus reference models. The likelihood ratio LR(x) = Pr_R(x) / Pr_M(x) forms the core discriminant, with thresholds calibrated to control the false-positive rate on held-out data. The primary evaluation metric is ROC AUC (area under the receiver operating characteristic curve), which quantifies the model’s ability to distinguish between members and non-members (Savine et al., 28 Jul 2025).
  • Verbatim Reproduction Detection: A prefix-suffix "canary" attack prompts the model to generate continuations and tests for high-overlap substrings with original training suffixes, using the longest common substring (LCS) normalized by target length. Precision and recall metrics track exact suffix reproductions, complemented by the share-of-n-grams score for partial memorization (Savine et al., 28 Jul 2025).
  • Causal Effect Measure: Drawing from the Rubin–Neyman potential-outcomes framework, memorization is defined as the causal effect of exposure to each training instance on its log-likelihood under model parameters. Average treatment effects on the treated (ATTs) are computed via difference-in-differences (DiD) estimators, comparing observed and "would-be" unexposed outcomes across training checkpoints (Lesci et al., 2024).
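
The membership-inference strand above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names and the per-sequence log-likelihoods are hypothetical, and the membership score is simply the log of 1/LR(x), so higher values indicate the fine-tuned model M assigns disproportionately high likelihood relative to the reference R.

```python
def membership_score(logp_ft, logp_ref):
    """log(1/LR(x)) = log Pr_M(x) - log Pr_R(x).
    Higher values suggest the fine-tuned model assigns the sequence
    unusually high likelihood, i.e. it is more likely a training member."""
    return logp_ft - logp_ref

def roc_auc(member_scores, nonmember_scores):
    """Rank-based ROC AUC (equivalent to the Mann-Whitney U statistic):
    the probability that a randomly chosen member outscores a randomly
    chosen non-member, with ties counted as half."""
    pairs = len(member_scores) * len(nonmember_scores)
    wins = sum((m > n) + 0.5 * (m == n)
               for m in member_scores for n in nonmember_scores)
    return wins / pairs

# Hypothetical per-sequence log-likelihoods (fine-tuned, reference).
members    = [membership_score(-12.1, -18.4), membership_score(-9.8, -17.0)]
nonmembers = [membership_score(-16.5, -16.9), membership_score(-15.2, -14.8)]
print(roc_auc(members, nonmembers))  # 1.0 here: members cleanly separated
```

An AUC of 0.5 would mean the attack cannot distinguish members from non-members; values approaching 1.0 indicate severe memorization risk.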

2. Mechanisms Driving Memorization in LLMs

MEMORISE dissects transformer architectures to identify matrix components most responsible for memorization during low-rank adaptation (LoRA) fine-tuning:

  • Component-Wise Effects: Systematic experiments isolating adaptation to attention projection matrices reveal that Value (W^V) and Output (W^O) matrices drive substantially higher membership-inference risk than Query (W^Q) or Key (W^K) matrices (see Table 1 below). Jointly adapting [W^Q, W^K] at low rank can outperform individually higher-rank adaptations in these components, indicating nontrivial interplay in projection subspaces (Savine et al., 28 Jul 2025).
| Adapted Matrix | Rank=1 ROC AUC | Rank=2 ROC AUC | Rank=4 ROC AUC |
|----------------|----------------|----------------|----------------|
| W^Q            | 0.68 ± 0.02    | 0.70 ± 0.02    | 0.72 ± 0.02    |
| W^K            | 0.68 ± 0.02    | 0.71 ± 0.02    | 0.71 ± 0.02    |
| W^V            | 0.80 ± 0.01    | 0.82 ± 0.01    | 0.83 ± 0.01    |
| W^O            | 0.77 ± 0.02    | 0.79 ± 0.01    | 0.80 ± 0.01    |
| [W^Q, W^K]     | 0.76 ± 0.02    | 0.77 ± 0.02    | 0.79 ± 0.01    |

This analysis establishes a robust link between where adaptation capacity is placed in the model and the resulting memorization risk.

3. Relation Between Perplexity, Model Performance, and Memorization

A strong empirical correlation emerges between improved downstream model performance (measured by perplexity) and heightened memorization risk:

  • Perplexity Link: Lower perplexity in the fine-tuned model correlates with significantly increased membership inference AUC and with the model’s ability to output verbatim training data in generation attacks. The base model’s perplexity does not predict memorization in the fine-tuned model, signifying that the effect is induced by the adaptation process (Savine et al., 28 Jul 2025).
  • LoRA Rank Effects: Increasing the LoRA rank r systematically raises memorization risk, with ROC AUC rising rapidly at low values of r and then plateauing (diminishing returns) around r ≈ 50. This suggests an intrinsically low-dimensional memory-storing subspace, with practical calibration focusing on the “knee” of the performance-memorization curve for an optimal privacy-utility trade-off (Savine et al., 28 Jul 2025).
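
Locating the "knee" of such a rank-versus-AUC curve can be done with a simple geometric heuristic, sketched below. This is an illustrative stand-in for more elaborate knee detection (e.g. Kneedle), and the rank/AUC values are invented for the example, not drawn from the papers.

```python
def knee_rank(ranks, aucs):
    """Pick the rank at the 'knee' of a saturating curve: the point with
    the largest vertical gap above the chord joining the first and last
    (rank, auc) points. A simple stand-in for Kneedle-style detection."""
    r0, rN = ranks[0], ranks[-1]
    a0, aN = aucs[0], aucs[-1]

    def chord(r):  # linear interpolation between the two endpoints
        return a0 + (aN - a0) * (r - r0) / (rN - r0)

    # For a concave, saturating curve the knee maximizes curve - chord.
    gaps = [a - chord(r) for r, a in zip(ranks, aucs)]
    return ranks[gaps.index(max(gaps))]

# Illustrative sweep: AUC rises quickly at low rank, then flattens.
ranks = [1, 2, 4, 8, 16, 32, 64, 128]
aucs  = [0.62, 0.68, 0.74, 0.79, 0.82, 0.83, 0.835, 0.84]
print(knee_rank(ranks, aucs))  # 16: past this rank, risk grows with little gain
```

Practically, one would run the same selection on a utility metric (e.g. perplexity) and choose the smallest rank that sits past the utility knee but before the memorization curve saturates.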

4. Causal Profiling and Temporal Dynamics

MEMORISE integrates causal profiling to chart memorization dynamics throughout training:

  • Difference-in-Differences Memorization Profiles: By tracking the log-likelihoods of held-in and held-out items across checkpoints, MEMORISE generates a heatmap of instantaneous, persistent, and residual memorization. Instantaneous memorization peaks shortly after learning rate warm-up; persistent memorization endures for tens of thousands of steps in larger models before plateauing; residual memorization at epoch end is dominated by most recently seen examples, linked to learning rate decay (Lesci et al., 2024).
  • Scaling Trends: Memorization magnitudes and persistence rise monotonically with model size (negligible at 70M parameters, plateauing above 410M). Profiles are highly correlated across model sizes (r ≥ 0.8), enabling transfer of risk inferences from smaller to larger models without full retraining (Lesci et al., 2024).
  • Data Ordering and Curriculum Effects: Earlier data introduction induces more durable memorization, while late-stage examples are more likely to remain memorized after training due to small learning rate and rapid "locking in." This suggests actionable control over memorization by varying input order and learning rate schedule (Lesci et al., 2024).
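
The difference-in-differences estimator behind these profiles can be sketched in a few lines. This is a schematic version under the usual parallel-trends assumption; the per-item log-likelihoods below are invented for illustration, with "treated" items exposed to training between two checkpoints and "control" items held out throughout.

```python
def did_att(treated_before, treated_after, control_before, control_after):
    """Difference-in-differences ATT: the change in mean log-likelihood of
    items exposed between two checkpoints, minus the change for unexposed
    (held-out) items, which proxies the counterfactual 'would-be' trend."""
    mean = lambda xs: sum(xs) / len(xs)
    delta_treated = mean(treated_after) - mean(treated_before)
    delta_control = mean(control_after) - mean(control_before)
    return delta_treated - delta_control

# Illustrative per-item log-likelihoods at checkpoints t and t+1.
att = did_att(
    treated_before=[-9.0, -8.5], treated_after=[-6.0, -5.5],  # exposed in (t, t+1]
    control_before=[-9.2, -8.8], control_after=[-8.7, -8.3],  # never exposed
)
print(att)  # ≈ 2.5 nats: the exposure effect net of the background trend
```

Computing this quantity at every checkpoint for cohorts seen at different times yields the instantaneous/persistent/residual memorization profiles described above.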

5. Mitigation Strategies and Policy Recommendations

MEMORISE synthesizes a set of actionable guidelines and mitigation techniques to balance model utility with privacy risk:

  • Differential Privacy via DP-SGD: Injects noise during fine-tuning to bound instance-level memorization, with explicit tracking of (ε, δ) privacy budgets (Savine et al., 28 Jul 2025).
  • Dataset Deduplication: Filtering repeated or near-duplicate records reduces frequency-driven memorization, directly lowering the risk of verbatim reproduction (Savine et al., 28 Jul 2025).
  • Generation-Time Defenses: Techniques such as enforcing minimum sampling temperatures or post-generation filtering (e.g., LCS-based thresholds) further reduce unsafe outputs at inference time (Savine et al., 28 Jul 2025).
  • Component Selection: Adaptation of low-memorization-risk components (e.g., W^Q, W^K) is prioritized when computational or privacy constraints are present (Savine et al., 28 Jul 2025).
  • Measurement Expansion: Calls for memorization metrics that capture not only verbatim reproduction but also semantic and paraphrastic leakage, and integration of interpretability methods to map "memory" to specific neurons or latent subspaces (Savine et al., 28 Jul 2025).
  • Curriculum and Data-Ordering Control: Sensitive or protected items can be scheduled late (to reduce residual memorization) or deliberately early (to seed long-term memory), offering granular policy control (Lesci et al., 2024).
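
The LCS-based post-generation filter mentioned among the generation-time defenses can be sketched as follows. This is a minimal, assumption-laden illustration: the function names, the 0.5 threshold, and the example strings are all hypothetical, and the normalization (LCS length divided by the protected suffix length) follows the definition given in Section 1.

```python
def lcs_length(a: str, b: str) -> int:
    """Longest common *substring* (contiguous) length, via dynamic
    programming with O(len(b)) rolling rows."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def is_unsafe(generation: str, training_suffix: str, threshold: float = 0.5) -> bool:
    """Flag a generation whose longest common substring with a protected
    training suffix exceeds a fraction of that suffix's length."""
    return lcs_length(generation, training_suffix) / len(training_suffix) >= threshold

suffix = "patient was prescribed 20mg of drugX daily"
print(is_unsafe("the patient was prescribed 20mg of drugX daily as noted", suffix))  # True
print(is_unsafe("a completely different unrelated sentence", suffix))                # False
```

In deployment such a check would run against an index of protected suffixes before a response is released, suppressing or regenerating flagged outputs.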

6. Open Questions and Extensions

The MEMORISE Project identifies several key outstanding challenges:

  • Semantic and Paraphrase Memorization: Current tools mostly detect verbatim overlap; how to rigorously define and measure "meaningful" memorization, encompassing paraphrase and counterfactual learning effects, remains unresolved (Savine et al., 28 Jul 2025).
  • Generalization Across Modalities: Whether difference-in-differences dynamics and parallel trends hold for architectures beyond transformers or for other modalities (vision, speech) is an open research direction (Lesci et al., 2024).
  • Multi-Epoch and Curriculum Learning: The evolution of memorization beyond single-epoch regimes and under complex curriculum strategies is not fully characterized (Lesci et al., 2024).
  • Instance-Level Heterogeneity: "Extreme-tail" instances may be memorized or forgotten at rates not captured by cohort averages; finer-grained profiles are needed for robust risk management (Lesci et al., 2024).
  • Unified Privacy-Memorization Score: Integration of DiD-based memorization profiling with membership inference to yield comprehensive privacy risk scores is an active area for methodology development (Lesci et al., 2024).

7. Significance and Impact

The methodologies and findings of the MEMORISE Project provide an operational toolkit for the principled control of memorization in LLMs. By delivering causally sound metrics, empirical scaling laws, and actionable mitigation strategies, MEMORISE advances the capability to audit, govern, and optimize LLM training pipelines—especially for privacy-sensitive domains. Its emphasis on monitoring memorization throughout training, rather than relying solely on post hoc generation analysis, marks a transition toward real-time, transparent, and risk-aware LLM deployment (Savine et al., 28 Jul 2025, Lesci et al., 2024).
