Training Data Memorization
- Training data memorization is the phenomenon where models recall specific training examples beyond what population-level statistics predict, a form of instance-level overfitting not explained by dataset-wide correlations.
- One-model reference techniques using lightweight classifiers offer scalable, efficient estimation of memorization without the need for multiple retraining cycles.
- Empirical findings suggest that larger, diverse datasets reduce memorization rates, aiding privacy audits and enhancing our understanding of model generalization.
Training data memorization refers to the phenomenon where machine learning models, particularly deep neural architectures and large-scale self-supervised encoders, internalize aspects of their training data such that they can reproduce or “recall” individual examples far beyond what is predictable from population-level (dataset) correlations or generalization. This behavior is of central interest due to its implications for generalization theory, information retention, privacy leakage, and fair representation learning across diverse architectures and modalities, including vision, vision–language, and text.
1. Formal Definitions of Memorization
Precise quantification of memorization in representation learning hinges on the distinction between prediction achievable via dataset-level correlations and genuine instance-level recall enabled by overfitting. For a training set $S = \{s_1, \ldots, s_n\}$, with each sample $s_i$ split into a context $x_i$ and a target $y_i$, Déjà vu memorization occurs if the model can correctly predict $y_i$ from $x_i$ in a way that cannot be explained by the empirical distribution $p_S(y \mid x)$:

$$g(f_\theta(x_i)) = y_i \quad \text{and} \quad \arg\max_{y} p_S(y \mid x_i) \neq y_i,$$

where $f_\theta$ is the learned encoder and $g$ is a decoding function (Kokhlikyan et al., 8 Apr 2025).
The above strictly excludes cases where $y_i$ is simply the most likely label for $x_i$ given dataset statistics, ensuring that declared memorization reflects true instance-wise overfitting rather than mere exploitation of frequent patterns.
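In code, the per-sample Déjà vu criterion reduces to a pair of comparisons. The sketch below is illustrative only; the function names and input format are assumptions, not taken from the paper's implementation.

```python
# Hedged sketch: a sample counts as "deja vu memorized" when the target
# model predicts its label correctly but a correlation-only baseline does not.

def deja_vu_memorized(model_pred, baseline_pred, true_label):
    """True iff the correct prediction reflects instance recall,
    not dataset-level correlations."""
    return model_pred == true_label and baseline_pred != true_label

def aggregate_memorization(model_preds, baseline_preds, labels):
    """Fraction of samples flagged as memorized."""
    flags = [deja_vu_memorized(m, b, y)
             for m, b, y in zip(model_preds, baseline_preds, labels)]
    return sum(flags) / len(flags)
```

The aggregate fraction is the population-level memorization rate reported in the empirical sections.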
An alternative, stability-based definition (Feldman 2020) measures memorization of a sample $s_i = (x_i, y_i)$ as the difference in prediction accuracy when the sample is excluded from training:

$$\mathrm{mem}(\mathcal{A}, S, i) = \Pr_{h \sim \mathcal{A}(S)}\big[h(x_i) = y_i\big] - \Pr_{h \sim \mathcal{A}(S \setminus \{s_i\})}\big[h(x_i) = y_i\big],$$

where $\mathcal{A}$ is the (randomized) training algorithm. This provides a leave-one-out stability test but is computationally infeasible for large models, since estimating each term requires repeated retraining.
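The leave-one-out measure can be sketched directly, which also makes the retraining cost explicit: every trial trains two models. The `lookup_trainer` below is a deliberately degenerate stand-in (an assumption for illustration, not a real learner) that memorizes its training set exactly.

```python
import random

# Hedged sketch of the Feldman-style leave-one-out stability measure:
# memorization of sample i is the drop in the chance of predicting y_i
# correctly once (x_i, y_i) is removed from training. `train` is any
# (possibly randomized) training routine taking (dataset, seed).

def loo_memorization(train, dataset, i, trials=10, seed=0):
    rng = random.Random(seed)
    x_i, y_i = dataset[i]
    hits_with = hits_without = 0
    for _ in range(trials):
        h_with = train(dataset, seed=rng.random())
        h_without = train(dataset[:i] + dataset[i + 1:], seed=rng.random())
        hits_with += h_with(x_i) == y_i
        hits_without += h_without(x_i) == y_i
    return (hits_with - hits_without) / trials

def lookup_trainer(data, seed=None):
    """Degenerate 'model' that memorizes its training set exactly."""
    table = dict(data)
    return lambda x: table.get(x)
```

A pure lookup model scores the maximum of 1.0 on every training point, illustrating why the measure isolates instance recall.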
Dataset-level correlations can be operationalized by the Bayes-optimal correlate-only predictor:

$$\hat{y}^{\mathrm{corr}}(x) = \arg\max_{y} p_S(y \mid x),$$

and only points for which the model’s correct prediction exceeds this baseline count as “memorized.”
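On a finite sample with discrete contexts, the correlate-only predictor is just a conditional mode over the empirical distribution. The sketch below assumes hashable contexts and a simple marginal-mode fallback for unseen ones (both are illustrative choices, not prescribed by the source).

```python
from collections import Counter, defaultdict

# Hedged sketch: the correlate-only baseline predicts, for each context x,
# the most frequent target observed with x in the empirical distribution.

def fit_correlate_only(pairs):
    counts = defaultdict(Counter)
    for x, y in pairs:
        counts[x][y] += 1
    # Fallback for contexts never seen in the reference data.
    global_mode = Counter(y for _, y in pairs).most_common(1)[0][0]

    def predict(x):
        if x in counts:
            return counts[x].most_common(1)[0][0]
        return global_mode
    return predict
```

Any model beating this baseline on a given point is, by the definition above, exploiting more than dataset statistics.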
2. Efficient Measurement Methodologies
Original methods required two disjointly trained models: one to measure the model-internal correlations (model $A$, trained on split $S_A$), and another trained on held-out data as a reference (model $B$, trained on a disjoint split $S_B$) to estimate dataset-intrinsic correlations (Kokhlikyan et al., 8 Apr 2025). The Déjà vu memorization score is then the accuracy gap on $A$'s own training samples:

$$\mathrm{DV} = \mathrm{acc}_A(S_A) - \mathrm{acc}_B(S_A).$$
This approach, though precise, is impractical for large pre-trained models due to the need for multiple expensive training runs.
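The two-model score is a plain accuracy gap over the target model's own training samples. This sketch assumes predictions are already computed for both models on the same samples; names are illustrative.

```python
# Hedged sketch of the two-model deja vu score: the target model's accuracy
# on its own training samples minus a disjointly trained reference model's
# accuracy on those same samples.

def deja_vu_score_two_model(target_preds, reference_preds, labels):
    acc_target = sum(p == y for p, y in zip(target_preds, labels)) / len(labels)
    acc_reference = sum(p == y for p, y in zip(reference_preds, labels)) / len(labels)
    return acc_target - acc_reference
```

A gap near zero means the target model's accuracy is explained by dataset-level correlations; a large positive gap signals memorization.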
To enable scalable, one-pass memorization estimation, a one-model reference is introduced:
- Image-only Encoders: Instead of retraining, a lightweight ResNet-50 classifier or a Naive Bayes model (on object-detection outputs) is trained on a small held-out subset to estimate population-level correlations. Memorization is identified when:

$$\hat{y}_{\mathrm{model}}(x_i) = y_i \quad \text{and} \quad \hat{y}_{\mathrm{ref}}(x_i) \neq y_i,$$

where $\hat{y}_{\mathrm{model}}$ is the model’s predicted label and $\hat{y}_{\mathrm{ref}}$ is the reference classifier’s prediction.
- Vision–Language Encoders: For image–text models, a frozen text embedding model (e.g., GTE) is used as the reference to perform text–text nearest-neighbor search in a large public caption dataset. A predicted object is considered memorized if it appears only in the model’s output, not in the retrieved nearest captions.
Assumptions for these methods include negligible memorization by the reference classifier and reasonable independence in Naive Bayes, both supported by empirical validation.
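For the vision–language case, the memorization check reduces to a set difference between the model's predicted objects and the vocabulary of the retrieved neighbor captions. The token-level matching below is a simplifying assumption for illustration (the actual pipeline uses annotated object classes).

```python
# Hedged sketch of the vision-language check: a predicted object counts as
# memorized only if it appears in the model's output for an image but in
# none of the k nearest public captions retrieved for that image's caption.

def memorized_objects(predicted_objects, neighbor_captions):
    neighbor_words = set()
    for caption in neighbor_captions:
        neighbor_words.update(caption.lower().split())
    return {obj for obj in predicted_objects if obj.lower() not in neighbor_words}
```

Objects that the reference captions also mention are attributed to dataset-level correlation, not recall.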
3. Practical Pipeline for Open-Source Models
Application of the one-model Déjà vu test proceeds as follows:
- For encoders over images, prepare a moderately sized held-out reference set, train a reference model on it, compute the SSL model’s representations and predictions, then compare those predictions against the reference’s outcomes to tally samples the SSL model alone gets right (memorization hits).
- For vision–language encoders, annotate public captions for object classes, retrieve k-nearest neighbors for each caption, and compare the model’s image–text predictions against the reference annotation set.
Hyperparameters include the nearest-neighbor size $k$ (commonly 100 for images), detection thresholds, and the selection of top-$k$ objects for Naive Bayes or nearest-neighbor retrieval.
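The retrieval step common to both pipelines is a top-$k$ nearest-neighbor search over embeddings, typically by cosine similarity. The sketch below is a minimal dense implementation (in practice an approximate index would be used for large caption sets); $k = 2$ in the test is only for illustration, not the paper's setting of 100.

```python
import math

# Hedged sketch of the nearest-neighbor retrieval step: rank reference
# embeddings by cosine similarity to the query and keep the top k indices.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k_neighbors(query, reference, k):
    ranked = sorted(range(len(reference)),
                    key=lambda i: cosine(query, reference[i]),
                    reverse=True)
    return ranked[:k]
```

The returned indices select the reference captions whose annotations serve as the correlation baseline for the query.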
This method, for the first time, enables memorization measurement in large, pre-trained open-source encoders without retraining, such as those trained on complete ImageNet or YFCC15M.
4. Empirical Findings and Population-Level Trends
Key results substantiate the validity and scalability of the one-model memorization estimator:
- Across VICReg, Barlow Twins, and DINO self-supervised models, as well as open-source encoders, one-model approaches (ResNet, Naive Bayes, LLM text) yield nearly identical aggregate memorization scores to the two-model reference benchmarks (e.g., 10.4–11.3% on a 300K subset for images; PPG of 0.06–0.16 for VLMs).
- Population-level correlation accuracies are consistently low (typically 10–15%) across all reference methods, indicating that chance-level prediction dominates for most samples even under Bayes-optimal strategies, making memorization detection highly reliable.
- Models trained on complete datasets (e.g., full ImageNet or 15M YFCC) display predictably lower aggregate memorization than those trained on smaller subsets (e.g., 300K, 40M), highlighting the mitigating effect of data scale and diversity on memorization rates.
- Sample-level agreement between reference estimators varies (40–84%), but population-scale metrics and the relative ordering of models by memorization are robust.
Sample-level analysis reveals that high-confidence memorization events are rare, with confidence histograms skewed toward ambiguous cases, suggesting memorization is concentrated among a minority of unique or complex samples.
5. Interpretations, Insights, and Implications
The unified evidence from these measurement frameworks leads to several robust conclusions:
- Aggregate memorization rates are largely invariant to the choice of reference estimator—all reasonable methods (SSL KNN, ResNet, NB, LLM) capture the dominant dataset-level correlations and yield commensurate memorization estimates at the population level.
- Scale and diversity in open-source training corpora significantly suppress aggregate memorization, even though a nontrivial fraction of instances (5–15%) can still be memorized in very large encoders.
- Large models are less susceptible to out-of-distribution memorization than smaller, subset-trained models, reinforcing arguments for scale as a privacy and generalization safeguard.
- The methodology provides a practical tool for privacy auditing—the occurrence of memorization, especially on privacy-relevant or copyrighted samples, can be efficiently measured in any released model without requiring privileged retraining access.
This suggests that, while it is not feasible to fully eliminate memorization from over-parameterized models, systematic quantification and mitigation are achievable with lightweight, one-model reference pipelines.
6. Caveats, Limitations, and Further Directions
- The reference classifiers (ResNet, NB, LLM) are assumed not to themselves memorize or overfit the held-out reference data, an assumption empirically validated but context-sensitive.
- Sample-level discrepancies between estimators highlight potential challenges in samples exhibiting complex or rare correlations, warranting further research into specialized reference modeling for such cases.
- The memorization threshold and interpretation are governed by the variability in the reference population; adjusting the balance between type I/II errors for high-confidence detection versus aggregate reporting remains an open methodological design axis.
- The methodology does not offer formal privacy guarantees (e.g., differential privacy), but it provides a necessary empirical foundation for such analysis.
Continued research is warranted into both theoretical guarantees for population-level versus pointwise memorization auditing, and expanded application of efficient one-model reference tests across domains outside vision and vision–language encoders.
Key conceptual advances in training data memorization research:
| Method/Concept | Definition/Key Formula(s) | Context/Role |
|---|---|---|
| Déjà vu Memorization | $s_i$ is memorized if $g(f_\theta(x_i)) = y_i$ and $\arg\max_y p_S(y \mid x_i) \neq y_i$ | Distinguishes instance-level recall from correlation |
| One-Model Reference | Lightweight ResNet / NB classifier estimates $p_S(y \mid x)$ | Efficient, scalable estimation for large models |
| Two-Model Reference | Compare model $A$ and reference $B$ on disjoint splits; accuracy gap = memorization | Previous gold standard for precise evaluation |
| Population Gap (VLM) | Gap between model and reference prediction rates over the population | Measures aggregate memorization in VLMs |
| Memorization Score (DV) | Fraction of truly memorized instances | Aggregate per-model summary metric |
These developments collectively establish efficient, scalable approaches to quantify training data memorization, reveal the impact of model and dataset scale, and provide a foundation for privacy-focused risk management in large representation models (Kokhlikyan et al., 8 Apr 2025).