N-Gram Memorisation Score
- N-gram memorisation score is a metric that quantifies the partial reproduction of training data by comparing contiguous token sequences in model outputs with the ground-truth.
- It detects early memorisation trends during fine-tuning by measuring rises in overlapping n-grams before complete verbatim copying occurs.
- Its simple, scalable implementation allows integration into large-scale pipelines and supports parameter tuning to balance privacy protection with model performance.
The n-gram memorisation score quantifies the extent to which an LLM reproduces partial subsequences (n-grams) of its training data during fine-tuning or inference. Rather than requiring verbatim copying of complete suffixes, the metric evaluates the overlap between short contiguous token sequences (e.g., 4-, 5-, and 6-grams) in the model's outputs and the ground-truth sequence, providing a fine-grained and early measurement of memorisation dynamics. It applies across LLM architectures and training scenarios, serving both diagnostic and mitigation functions in domains sensitive to data privacy.
1. Definition and Calculation
The n-gram memorisation score is formally defined for a dataset $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$ as the mean fraction of matching n-grams of specified lengths between a model output $\hat{y}_i = f(x_i)$ (where $x_i$ is the prompt) and the reference target sequence $y_i$. For a single data point $(x_i, y_i)$,

$$
s_i \;=\; \frac{1}{|\mathcal{N}|} \sum_{n \in \mathcal{N}} \frac{\bigl|\operatorname{grams}_n(y_i) \cap \operatorname{grams}_n(\hat{y}_i)\bigr|}{\bigl|\operatorname{grams}_n(y_i)\bigr|},
$$

where $\operatorname{grams}_n(\cdot)$ denotes the set of contiguous n-grams of length $n$ in a sequence and $\mathcal{N}$ is the set of n-gram lengths considered (e.g., $\mathcal{N} = \{4, 5, 6\}$).
The aggregate score across $\mathcal{D}$ is

$$
S(\mathcal{D}) \;=\; \frac{1}{N} \sum_{i=1}^{N} s_i .
$$
This value captures partial memorisation (matching n-grams) rather than waiting for complete verbatim reproduction as with extraction metrics. The choice of $n$ (e.g., 4–6) is domain-dependent and determines the granularity and sensitivity of the metric.
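As a concrete illustration of the per-example formula (the numbers below are invented for exposition, not drawn from the source), suppose an example is scored with $\mathcal{N} = \{4\}$, the reference continuation contains nine contiguous 4-grams, and three of them also appear in the model output:

$$
s_i \;=\; \frac{\bigl|\operatorname{grams}_4(y_i) \cap \operatorname{grams}_4(\hat{y}_i)\bigr|}{\bigl|\operatorname{grams}_4(y_i)\bigr|} \;=\; \frac{3}{9} \;\approx\; 0.33 .
$$

A score of roughly 33% would already exceed the kind of early-warning threshold discussed in the next section, even though no complete continuation has been copied verbatim.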
2. Role in Early Detection of Memorisation
Empirical observations indicate that the n-gram memorisation score increases in the epochs leading up to full memorisation of specific training samples. In fine-tuning, this rise in score reliably precedes verbatim copying. Consequently, the metric is used as an early warning signal—by monitoring when the score first exceeds a domain-appropriate threshold (such as 20%), practitioners can halt training preemptively.
This early-stopping strategy outperforms methods that rely on final validation perplexity or evaluation metrics alone, yielding a significantly lower rate of verbatim memorisation with minimal sacrifice of downstream performance.
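A minimal sketch of such an early-stopping check is given below in Python; the 20% threshold, the probe-set construction, and names such as `should_stop` and `score_fn` are illustrative assumptions rather than details from the source.

```python
# Sketch of threshold-based early stopping on the n-gram memorisation score.
# All names and the 0.20 threshold are illustrative assumptions.

from typing import Callable, Sequence, Tuple


def should_stop(
    generate: Callable[[str], str],          # prompt -> model continuation at the current epoch
    score_fn: Callable[[str, str], float],   # (output, reference) -> n-gram memorisation score in [0, 1]
    probe_set: Sequence[Tuple[str, str]],    # held-out (prompt, reference) pairs drawn from the training data
    threshold: float = 0.20,                 # domain-appropriate cut-off, e.g. 20%
) -> bool:
    """Return True if the mean n-gram memorisation score exceeds the threshold."""
    scores = [score_fn(generate(prompt), reference) for prompt, reference in probe_set]
    mean_score = sum(scores) / max(len(scores), 1)
    return mean_score > threshold
```

In a training loop, this check would run once per epoch or evaluation interval, halting fine-tuning the first time it returns `True` and retaining the checkpoint from the previous epoch.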
3. Implementation and Computational Efficiency
The method is computationally simple. For each output, substrings of the desired n-gram length are extracted using a sliding window and matched exactly against the corresponding substrings in the ground-truth sequence. For dataset-level aggregation, per-example scores are averaged. The simplicity and scalability of this computation facilitate efficient integration into large-scale fine-tuning pipelines, with $n$ parameterisable to match application-specific memorisation sensitivities.
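The sketch below illustrates this computation under the recall-style formulation from Section 1; whitespace tokenisation and the function names are simplifying assumptions, and a real pipeline would reuse the model's own tokeniser.

```python
# Sketch of the n-gram memorisation score: sliding-window n-gram extraction,
# exact matching against the reference, and averaging over a dataset.
# Whitespace tokenisation is a simplification for illustration only.

from typing import Iterable, List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> set:
    """All contiguous n-grams of length n, collected with a sliding window."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_memorisation_score(output: str, reference: str, ns: Iterable[int] = (4, 5, 6)) -> float:
    """Mean fraction of the reference's n-grams that reappear in the model output."""
    out_tokens, ref_tokens = output.split(), reference.split()
    fractions: List[float] = []
    for n in ns:
        ref_ngrams = ngrams(ref_tokens, n)
        if not ref_ngrams:
            continue  # reference shorter than n tokens: skip this length
        matched = ref_ngrams & ngrams(out_tokens, n)
        fractions.append(len(matched) / len(ref_ngrams))
    return sum(fractions) / len(fractions) if fractions else 0.0


def aggregate_score(pairs: Iterable[Tuple[str, str]], ns: Iterable[int] = (4, 5, 6)) -> float:
    """Dataset-level score: the mean of the per-example scores."""
    scores = [ngram_memorisation_score(out, ref, ns) for out, ref in pairs]
    return sum(scores) / len(scores) if scores else 0.0
```

A function like `ngram_memorisation_score` would play the role of the `score_fn` passed to the early-stopping check sketched in Section 2.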
4. Regularisation via n-gram-aware Loss
Beyond its use as an early-stopping criterion, the n-gram memorisation score can inform loss regularisation. An n-gram-aware regulariser penalises the model whenever it assigns excessive probability to particular n-grams relative to a pre-trained baseline. In implementation, this acts as an additive term in the training objective, actively discouraging the model from memorising token sequences beyond what is already captured by its pre-training distribution. Empirical results show up to a 40% reduction in memorisation with regularisation, and a 35% smaller trade-off in evaluation performance compared with prior mitigation approaches.
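The exact form of the regulariser is not reproduced here; the PyTorch sketch below shows one plausible realisation under stated assumptions: per-token log-probabilities are summed over sliding n-gram windows, and only the excess of the fine-tuned model over a frozen pre-trained baseline is penalised. The window length `n`, the `relu` cut-off, and the weight `lam` are illustrative choices, not parameters taken from the source.

```python
# Sketch of an n-gram-aware regularisation term (illustrative, not the paper's
# exact formulation): penalise target n-grams to which the fine-tuned model
# assigns more probability than a frozen pre-trained baseline.

import torch
import torch.nn.functional as F


def ngram_regulariser(
    ft_logits: torch.Tensor,    # (batch, seq_len, vocab) from the model being fine-tuned
    base_logits: torch.Tensor,  # (batch, seq_len, vocab) from the frozen baseline, computed under torch.no_grad()
    targets: torch.Tensor,      # (batch, seq_len) target token ids; assumes seq_len >= n
    n: int = 5,                 # n-gram window length (assumed hyperparameter)
    lam: float = 0.1,           # regularisation weight (assumed hyperparameter)
) -> torch.Tensor:
    """Additive penalty: mean excess n-gram log-probability over the baseline."""
    ft_logp = F.log_softmax(ft_logits, dim=-1).gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    base_logp = F.log_softmax(base_logits, dim=-1).gather(-1, targets.unsqueeze(-1)).squeeze(-1)

    # Sum per-token log-probs over sliding windows -> log-probability of each target n-gram.
    ft_ngram = ft_logp.unfold(1, n, 1).sum(dim=-1)
    base_ngram = base_logp.unfold(1, n, 1).sum(dim=-1)

    # Penalise only n-grams where the fine-tuned model exceeds the baseline.
    excess = F.relu(ft_ngram - base_ngram)
    return lam * excess.mean()


# Assumed usage: total_loss = cross_entropy_loss + ngram_regulariser(ft_logits, base_logits, targets)
```

Because only positive excess is penalised, n-grams to which the pre-trained model already assigns high probability (common phrases) contribute nothing, which matches the stated goal of discouraging memorisation beyond the pre-training distribution.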
5. Comparative Advantages and Limitations
Advantages
- Granularity: Detects partial memorisation before full copying occurs, alerting practitioners to incipient memorisation.
- Scalability: Efficient, straightforward computation for large datasets and model families.
- Practical Flexibility: Parameterisable n-gram length allows control over sensitivity and specificity for different tasks and risk profiles.
Limitations
- False Positives: In domains with frequent stock phrases or templated language, legitimate repetition may inflate the score.
- Sensitivity to $n$: Shorter n-grams may overestimate memorisation due to common language patterns; longer n-grams offer stricter detection but may miss early phenomena.
6. Experimental Findings
Experiments across Pythia, Llama3, and Mistral (1.4B–70B parameters), evaluated on summarisation, QA, and instruction-tuning data, demonstrate:
- The n-gram memorisation score rises sharply just prior to full memorisation.
- Early-stopping using n-gram score (e.g., “Best n-gram” criterion) reduces memorisation rates significantly versus selection by perplexity or evaluation accuracy.
- n-gram regularisation achieves up to 40% mitigation of memorisation with only minor performance degradation.
- Gaps in the n-gram memorisation score consistently distinguish examples that will eventually be memorised from those that will not, across datasets and model families.
7. Practical Implications and Customisation
The n-gram memorisation score provides a practical defence against privacy risks where fine-tuned models might leak sensitive or copyrighted material. It enables both continuous monitoring and dynamic intervention during fine-tuning. Parameter selection (the n-gram length $n$ and score thresholds) can be tuned to balance memorisation mitigation against performance retention, and the measure is robust enough for deployment at scale across heterogeneous data regimes and model architectures.
In summary, the n-gram memorisation score is a scalable, interpretable, and effective metric for early detection and reduction of memorisation during LLM fine-tuning, supporting privacy-conscious deployment without substantial compromise of model utility (Slack et al., 13 Oct 2025).