Estimating near-verbatim extraction risk in language models with decoding-constrained beam search

Published 26 Mar 2026 in cs.CL and cs.LG | (2603.24917v1)

Abstract: Recent work shows that standard greedy-decoding extraction methods for quantifying memorization in LLMs miss how extraction risk varies across sequences. Probabilistic extraction -- computing the probability of generating a target suffix given a prefix under a decoding scheme -- addresses this, but is tractable only for verbatim memorization, missing near-verbatim instances that pose similar privacy and copyright risks. Quantifying near-verbatim extraction risk is expensive: the set of near-verbatim suffixes is combinatorially large, and reliable Monte Carlo (MC) estimation can require ~100,000 samples per sequence. To mitigate this cost, we introduce decoding-constrained beam search, which yields deterministic lower bounds on near-verbatim extraction risk at a cost comparable to ~20 MC samples per sequence. Across experiments, our approach surfaces information invisible to verbatim methods: many more extractable sequences, substantially larger per-sequence extraction mass, and patterns in how near-verbatim extraction risk manifests across model sizes and types of text.


Authors (9)

Summary

The paper presents DCBS as a deterministic algorithm for estimating near-verbatim extraction risk in language models.
It integrates beam search with edit-distance constraints to efficiently capture risk beyond strict verbatim matches.
Experimental results reveal that near-verbatim risk estimates significantly exceed verbatim extraction estimates, indicating heightened privacy and copyright concerns.

Estimating Near-Verbatim Extraction Risk in LLMs with Decoding-Constrained Beam Search

Introduction and Motivation

The assessment of memorization and data extraction risks in LLMs has become an essential area of study, particularly given the increasing deployment of open-weight and production-grade models across domains where privacy and copyright are significant. Prior approaches have focused primarily on verbatim extraction: testing whether a model reproduces training data exactly, typically under deterministic greedy decoding. However, these approaches can underestimate both the extent and the risk profile of memorization. Notably, near-verbatim extraction, in which small substitutions, insertions, or deletions separate the model output from the training target, can be just as problematic for privacy or copyright, yet standard methods miss it.

This paper systematically addresses the measurement of near-verbatim extraction risk. It presents a family of algorithms, collectively termed decoding-constrained beam search (DCBS), that yield efficient and deterministic lower bounds on the probability with which a model, given a prefix, can generate a continuation within a bounded edit distance (e.g., Levenshtein or Hamming) from the original training suffix. The approach leverages beam search with per-decoding-step constraints derived from the target decoding policy (e.g., top-$k$) and integrates edit-distance-based pruning to improve both efficiency and the tightness of risk estimates.

Formalization of Extraction Risk: Verbatim vs. Near-Verbatim

Standard verbatim extraction computes the probability $p_z$ of generating the exact target suffix $z$ under a deterministic or stochastic decoding policy $\phi$:

$p_z \coloneqq \Pr_{\theta,\phi}(z \mid x)$

where $x$ is the context prefix, $\theta$ denotes the LLM parameters, and $\phi$ is the next-token decoding policy (e.g., greedy, top-$k$).
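
As a minimal sketch of this definition, assuming that top-$k$ decoding keeps the $k$ most likely tokens at each step and renormalizes them, $p_z$ is the product of the per-step renormalized probabilities of the target tokens. The `toy_model` function and its four-token vocabulary are hypothetical stand-ins for a real LLM's next-token distribution.

```python
import math

def top_k_renormalize(probs, k):
    """Keep the k most likely tokens and rescale them to sum to 1."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}

def verbatim_log_prob(next_token_probs, prefix, suffix, k):
    """log p_z: sum of per-step log-probabilities of the target suffix under top-k."""
    context = list(prefix)
    log_p = 0.0
    for tok in suffix:
        step = top_k_renormalize(next_token_probs(context), k)
        if tok not in step:
            # The target token was truncated by top-k, so p_z is exactly 0.
            return float("-inf")
        log_p += math.log(step[tok])
        context.append(tok)
    return log_p

# Hypothetical toy model: the same distribution regardless of context.
def toy_model(context):
    return {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}

# With k=2 the kept tokens are a (0.5/0.8) and b (0.3/0.8).
p_z = math.exp(verbatim_log_prob(toy_model, ["x"], ["a", "b"], k=2))
```

Working in log space avoids underflow on long suffixes; a target token outside the top-$k$ set makes the verbatim probability exactly zero under this policy.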

This formulation does not capture the risk associated with high-probability variants of $z$ that are only a small number of substitutions, insertions, or deletions away from the target: the “near-verbatim” risk domain. To address this, the paper generalizes the risk metric to the total probability mass assigned to an $\varepsilon$-ball about $z$ for some token-level distance function $d$ (e.g., Levenshtein):

$p_{z,\varepsilon} \coloneqq \sum_{z' \in \mathcal{B}_\varepsilon(z)} \Pr_{\theta,\phi}(z' \mid x)$

where $\mathcal{B}_\varepsilon(z) = \{z' : d(z, z') \le \varepsilon\}$ is the set of continuations within edit distance $\varepsilon$ of $z$. This re-scopes extraction measurement to align with practical privacy and legal risk, since non-verbatim copies can carry similar liability.
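
To make the near-verbatim mass concrete, the sketch below computes it by brute force on a tiny example, using Hamming distance and plain (untruncated) sampling for brevity. The three-token vocabulary and `toy_model` are hypothetical; exhaustive enumeration like this is only feasible at toy scale.

```python
import itertools

VOCAB = ["a", "b", "c"]  # hypothetical three-token vocabulary

def toy_model(context):
    """Hypothetical next-token distribution that favors repeating the last token."""
    i = VOCAB.index(context[-1])
    return {VOCAB[i]: 0.6, VOCAB[(i + 1) % 3]: 0.3, VOCAB[(i + 2) % 3]: 0.1}

def sequence_prob(model, prefix, seq):
    """Probability of generating seq after prefix, one token at a time."""
    context, prob = list(prefix), 1.0
    for tok in seq:
        prob *= model(context)[tok]
        context.append(tok)
    return prob

def hamming(u, v):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(u, v))

def near_verbatim_mass(model, prefix, target, eps):
    """Sum the probability of every suffix within Hamming distance eps of target."""
    return sum(
        sequence_prob(model, prefix, cand)
        for cand in itertools.product(VOCAB, repeat=len(target))
        if hamming(cand, target) <= eps
    )

prefix, target = ["a"], ("a", "a", "a")
verbatim = near_verbatim_mass(toy_model, prefix, target, eps=0)  # 0.6 ** 3
near = near_verbatim_mass(toy_model, prefix, target, eps=1)      # adds six one-edit variants
```

The enumeration visits every one of the `len(VOCAB) ** len(target)` candidate suffixes, which is the exponential cost that rules this approach out for real vocabularies and suffix lengths.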

Computational Challenges and Probabilistic Extraction

A major obstacle is the combinatorial growth of the set of near-verbatim suffixes. Enumerative approaches are intractable even for moderate distance budgets and sequence lengths because the candidate space grows exponentially. Monte Carlo (MC) sampling is statistically unbiased but becomes computationally prohibitive: reliable estimation can require on the order of 100,000 samples per sequence, especially when detecting risk at low probability thresholds.
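
A rough sketch of the MC baseline follows: sample suffixes from the model and count the fraction that land within the distance budget. The toy model, the Hamming criterion, and the illustrative target probability of 0.001 are assumptions for this example, not values from the paper.

```python
import math
import random

VOCAB = ["a", "b", "c"]  # hypothetical three-token vocabulary

def toy_model(context):
    """Hypothetical next-token distribution that favors repeating the last token."""
    i = VOCAB.index(context[-1])
    return {VOCAB[i]: 0.6, VOCAB[(i + 1) % 3]: 0.3, VOCAB[(i + 2) % 3]: 0.1}

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def sample_suffix(model, prefix, length, rng):
    """Draw one suffix token by token from the model's distribution."""
    context = list(prefix)
    for _ in range(length):
        probs = model(context)
        tokens = list(probs)
        context.append(rng.choices(tokens, weights=[probs[t] for t in tokens])[0])
    return context[len(prefix):]

def mc_near_verbatim(model, prefix, target, eps, n_samples, seed=0):
    """Unbiased estimate: fraction of sampled suffixes within eps of target."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        suffix = sample_suffix(model, prefix, len(target), rng)
        hits += hamming(suffix, target) <= eps
    return hits / n_samples

def relative_std_error(p, n):
    """Relative standard error of a Bernoulli mean: sqrt((1 - p) / (n * p))."""
    return math.sqrt((1.0 - p) / (n * p))

estimate = mc_near_verbatim(toy_model, ["a"], ["a", "a", "a"], eps=1, n_samples=20_000)
rse = relative_std_error(1e-3, 100_000)  # roughly 10% relative error
```

The `relative_std_error` helper shows why sample counts grow quickly: for an event of probability 0.001, even 100,000 samples leave a relative standard error near 10%, and every sample costs a full generation.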

Figure 1: Monte Carlo estimation of near-verbatim extraction probability demonstrates that DCBS yields reliable lower bounds orders of magnitude faster.

To mitigate this, the paper introduces an algorithmic framework that enables efficient, deterministic lower bound estimation: decoding-constrained beam search.

Decoding-Constrained Beam Search (DCBS): Algorithmic Framework

The key observation is that high-probability continuations, especially those resulting from memorization, tend to be concentrated in a small region near the training target in the decoding tree. DCBS adapts beam search by restricting expansions to the top-$k$ tokens at each step, matching the decoding scheme of interest (e.g., top-$k$ sampling). The algorithm applies distance-based viability pruning, retaining only candidate continuations that can still finish within distance $\varepsilon$ of the target. Key properties include:

Deterministic Lower Bound: All final retained completions within the $\varepsilon$-ball contribute to a provable, deterministic lower bound on the near-verbatim probability mass. In practice, the bound tracks the MC estimate at a small fraction of the sampling cost.
Incremental Pruning: For Hamming distance, a per-beam mismatch counter suffices; for Levenshtein distance, a banded Wagner-Fischer dynamic program tracks the minimal attainable edit distance incrementally, enabling safe early elimination of hopeless candidates.
Scalability: With the beam widths and top-$k$ values used in the paper, DCBS runs at a computational cost comparable to about 20 MC samples per sequence, several orders of magnitude cheaper than unbiased sampling.
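
The beam loop and the Hamming pruning rule described above can be sketched as follows. This is an illustrative reading of the description, not the authors' implementation; `toy_model`, `dcbs_hamming`, and the parameter values are hypothetical, and top-$k$ is assumed to renormalize the kept tokens.

```python
import heapq

VOCAB = ["a", "b", "c"]  # hypothetical three-token vocabulary

def toy_model(context):
    """Hypothetical next-token distribution that favors repeating the last token."""
    i = VOCAB.index(context[-1])
    return {VOCAB[i]: 0.6, VOCAB[(i + 1) % 3]: 0.3, VOCAB[(i + 2) % 3]: 0.1}

def top_k_renormalize(probs, k):
    """Keep the k most likely tokens and rescale them to sum to 1."""
    kept = heapq.nlargest(k, probs.items(), key=lambda kv: kv[1])
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}

def dcbs_hamming(model, prefix, target, k, eps, beam_width):
    """Lower-bound the top-k mass of suffixes within Hamming distance eps of target."""
    # Each beam entry: (probability so far, mismatches so far, generated tokens).
    beam = [(1.0, 0, [])]
    for target_tok in target:
        candidates = []
        for prob, mismatches, tokens in beam:
            step = top_k_renormalize(model(list(prefix) + tokens), k)
            for tok, p in step.items():
                m = mismatches + (tok != target_tok)
                # Viability pruning: mismatches never decrease, so m > eps is hopeless.
                if m <= eps:
                    candidates.append((prob * p, m, tokens + [tok]))
        # Keep only the beam_width most probable viable candidates.
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    # Surviving entries are distinct in-ball suffixes, so their sum is a lower bound.
    return sum(prob for prob, _, _ in beam)

wide = dcbs_hamming(toy_model, ["a"], ["a", "a", "a"], k=2, eps=1, beam_width=8)
narrow = dcbs_hamming(toy_model, ["a"], ["a", "a", "a"], k=2, eps=1, beam_width=1)
```

With a beam wide enough to hold every viable candidate, the bound equals the exact mass on this toy problem; with `beam_width=1` it collapses to the verbatim path alone. The result is deterministic because no sampling is involved.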

Figure 2: Example of probabilistic extraction for Llama 1 13B on The Great Gatsby under top-$k$ decoding; several near-verbatim continuations collectively account for substantially more risk than the verbatim path alone.
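
For the Levenshtein case, incremental tracking can be sketched as a banded Wagner-Fischer update that extends one row of the edit-distance table per generated token. The function names below are hypothetical and the code is an illustration of the general technique, not the paper's implementation.

```python
INF = float("inf")

def initial_row(target, eps):
    """Row for zero generated tokens: matching target[:j] costs j insertions."""
    return [j if j <= eps else INF for j in range(len(target) + 1)]

def extend_row(row, i, token, target, eps):
    """Row after the i-th generated token (1-based), restricted to |i - j| <= eps."""
    new = [INF] * len(row)
    for j in range(max(0, i - eps), min(len(target), i + eps) + 1):
        if j == 0:
            new[0] = i  # all i generated tokens are unmatched
            continue
        substitute = row[j - 1] + (token != target[j - 1])
        skip_generated = row[j] + 1   # generated token has no counterpart
        skip_target = new[j - 1] + 1  # target token has no counterpart
        new[j] = min(substitute, skip_generated, skip_target)
    return new

def is_viable(row, eps):
    """Row minima never decrease, so min(row) > eps rules out every extension."""
    return min(row) <= eps

def banded_distance(generated, target, eps):
    """Edit distance if it is at most eps; otherwise some value greater than eps."""
    row = initial_row(target, eps)
    for i, token in enumerate(generated, start=1):
        row = extend_row(row, i, token, target, eps)
    return row[-1]

distance = banded_distance(list("kitten"), list("sitting"), eps=3)
```

Each update touches at most `2 * eps + 1` cells, so per-token cost depends on the distance budget rather than the suffix length. A beam entry can be dropped as soon as `is_viable` returns `False`.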

Experimental Results

The paper conducts extensive experiments with open LLMs: OLMo 2 (7B, 13B, 32B), Llama 2 (7B, 13B, 70B), and Pythia families, across Wikipedia, Books3 books, and Enron emails. Key findings are as follows:

Extraction Rate Comparison

DCBS reveals that the proportion of training sequences extractable with nontrivial near-verbatim risk is substantially higher than what verbatim methods measure. For OLMo 2 32B on held-in Wikipedia, verbatim probabilistic extraction finds 1.42% of sequences extractable; DCBS with near-verbatim matching finds 2.57%, roughly 1.8 times as many.
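
One way to read these rates, assuming a sequence counts as extractable when its probability meets a threshold, is sketched below. The per-sequence values and the threshold `tau` are made up for illustration; only the 1.42% and 2.57% figures come from the text.

```python
def extraction_rate(probabilities, tau):
    """Fraction of sequences whose extraction probability is at least tau."""
    return sum(p >= tau for p in probabilities) / len(probabilities)

# Hypothetical per-sequence values, not taken from the paper.
verbatim = [0.0, 0.002, 0.4, 0.0, 0.0005]
near_lower_bound = [0.03, 0.002, 0.7, 0.0, 0.0005]
tau = 0.001  # hypothetical extraction threshold

verbatim_rate = extraction_rate(verbatim, tau)      # 2 of 5 sequences
near_rate = extraction_rate(near_lower_bound, tau)  # 3 of 5 sequences

# Ratio of the rates reported for OLMo 2 32B on held-in Wikipedia.
reported_ratio = 2.57 / 1.42
```

Because DCBS returns lower bounds, any sequence whose bound meets the threshold truly exceeds it, so a rate computed this way cannot overstate the true near-verbatim rate.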

Figure 3: Extraction rates increase with both model scale and edit-tolerance; DCBS reveals significantly more risk compared to verbatim or greedy checks.

Per-Sequence Risk and Mass Gain

A large fraction of sequences have their extraction “unlocked” only in the near-verbatim regime: their verbatim risk is zero or sub-threshold, yet their near-verbatim probability is high. For Llama 2 70B, per-sequence risk can rise from zero under verbatim matching to a substantial probability under near-verbatim matching.

Figure 4: For Llama 2 on The Great Gatsby, many sequences have near-verbatim mass well above the verbatim mass; points above the diagonal show a substantial risk increase.

Empirical risk gain distributions (CCDF) show that for Llama 2 70B, 12.7% of all Gatsby sequences exhibit a mass gain above the threshold the paper considers, a sharp increase that classic verbatim analysis cannot observe.
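
A small sketch of how such a CCDF could be computed follows, assuming that mass gain means near-verbatim mass minus verbatim mass (the summary does not state the exact definition). All numbers below are invented for illustration.

```python
def mass_gains(verbatim, near_verbatim):
    """Per-sequence gain: near-verbatim mass minus verbatim mass (assumed definition)."""
    return [n - v for v, n in zip(verbatim, near_verbatim)]

def ccdf(values, thresholds):
    """For each threshold t, the fraction of values strictly greater than t."""
    return [sum(v > t for v in values) / len(values) for t in thresholds]

# Hypothetical per-sequence masses, not taken from the paper.
verbatim = [0.0, 0.01, 0.2, 0.0]
near = [0.3, 0.01, 0.5, 0.05]

gains = mass_gains(verbatim, near)
curve = ccdf(gains, [0.0, 0.1, 0.25])
```

Reading a single point off such a curve gives a statement of the kind quoted above: the share of sequences whose gain exceeds a chosen threshold.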

Figure 5: The CCDF of per-sequence near-verbatim mass gain for Llama 2 confirms systematically increased risk with scale.

Risk Structure and Scaling

The verbatim share of total extraction mass decreases with model scale, especially on structured book text; newly extractable cases tend to be unlocked by DCBS due to increased variation and coverage of near-matches as model parameters grow.

Moreover, inspecting extraction risk across distances reveals that mass can be distributed heterogeneously: for OLMo 2 on Wikipedia, the median verbatim share among extractable sequences is often under 5%, so near-verbatim mass accounts for most of the extraction risk.

Negative Controls

Experiments on data not seen during training (Wikipedia pages posted after the OLMo 2 cutoff, or recent books for Llama 2/3) confirm that DCBS does not surface false positives, establishing its conservativeness and reliability as a risk estimator.

Practical and Theoretical Implications

DCBS not only closes a methodological gap by aligning extraction models with real risk (as defined legally and in data privacy), but also offers a practical tool for model assessment and model development workflows. The insights about risk scaling and the structure of memorized output distribution indicate that larger models generalize extraction risk more diffusely, increasing both concern around privacy (private details stored in non-verbatim form) and copyright (substantially similar reproduction). Moreover, the algorithm's efficiency allows its deployment at scale for ongoing model auditing and potentially in the loop during training.

Future Directions

The results open several avenues:

Active Risk Auditing: Use DCBS as part of continuous privacy/copyright-risk assessment for in-training and deployed models.
Understanding Memorization Mechanisms: Further investigation into why and how near-verbatim risk structures emerge with model scaling and cross-dataset heterogeneity.
Extension to Other Decoding Policies: Adapting DCBS to decoding schemes other than top-$k$, such as nucleus sampling, and incorporating other metrics for “semantic” similarity.

Conclusion

This work quantifies the substantial, previously underestimated near-verbatim extraction risk in LLMs through deterministic lower bounds computed by an efficient and scalable algorithmic framework. DCBS reveals that classical verbatim approaches significantly underrepresent both the prevalence and the magnitude of memorization risk, particularly as models and datasets scale. The methods introduced provide both a robust practical risk diagnostic and a theoretical lens on the evolution of memorization behavior in generative models.

Refer to (2603.24917) for further implementation and experimental details.
