Feasibility of extracting copyrighted training data from production LLMs

Determine whether substantial amounts of copyrighted book text, comparable to what has been demonstrated for open-weight models, can be extracted from production large language models despite model- and system-level safety measures intended to prevent verbatim reproduction of training data.

Background

The paper studies memorization and extraction in LLMs, noting prior results showing substantial near-verbatim extraction of copyrighted books from open-weight, non-instruction-tuned models. Production systems, in contrast, deploy both model-level alignment and system-level guardrails designed to prevent recitation of copyrighted material.

The authors motivate their work by highlighting uncertainty about whether production LLMs, which implement safety measures such as refusals and filtering, can still be induced to output memorized copyrighted text near-verbatim. They propose a two-phase procedure, an initial probe (with Best-of-N jailbreaks where needed) followed by iterative continuation prompts, and evaluate its feasibility across four production LLMs.
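The two-phase procedure can be sketched as below. This is a minimal illustration, not the authors' implementation: `query_model` is a stub standing in for a production LLM API, the refusal check and the `difflib`-based near-verbatim similarity measure are illustrative assumptions, and all parameters (number of Best-of-N attempts, continuation rounds, context window) are placeholders.

```python
import difflib


def query_model(prompt: str) -> str:
    """Stand-in for a production LLM API call (illustrative stub).
    A real study would call the provider's chat endpoint here."""
    # Stub: returns a canned continuation so the sketch is runnable.
    return "It was the best of times, it was the worst of times,"


def near_verbatim(candidate: str, reference: str, threshold: float = 0.8) -> bool:
    """Crude near-verbatim check via character-level similarity (assumption;
    the paper's actual matching criterion may differ)."""
    return difflib.SequenceMatcher(None, candidate, reference).ratio() >= threshold


def probe_then_continue(seed: str, n_attempts: int = 4, max_rounds: int = 3) -> str:
    """Two-phase sketch: (1) an initial probe, resampled Best-of-N style
    if the model refuses; (2) iterative continuation prompts."""
    # Phase 1: initial probe; retry with fresh samples on apparent refusal.
    output = ""
    for _ in range(n_attempts):
        reply = query_model(f"Continue this passage: {seed}")
        if reply and "cannot" not in reply.lower():  # naive refusal heuristic
            output = reply
            break
    if not output:
        return ""  # extraction failed at the probe stage
    # Phase 2: repeatedly ask the model to continue from its latest output.
    for _ in range(max_rounds):
        reply = query_model(f"Continue exactly from: {output[-200:]}")
        if not reply:
            break
        output += " " + reply
    return output
```

With a real API client in place of the stub, the loop accumulates candidate text that can then be scored against the reference book with `near_verbatim`.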

References

However, it remains an open question if similar extraction is feasible for production LLMs, given the safety measures these systems implement.

Extracting books from production language models (2601.02671 - Ahmed et al., 6 Jan 2026) in Abstract