Feasibility of extracting copyrighted training data from production LLMs
Determine whether substantial amounts of copyrighted book text can be extracted from production large language models (LLMs), on a scale comparable to what has been demonstrated for open-weight models, despite the model- and system-level safety measures intended to prevent verbatim reproduction of training data.
References
However, it remains an open question if similar extraction is feasible for production LLMs, given the safety measures these systems implement.
— Extracting books from production language models
(2601.02671 - Ahmed et al., 6 Jan 2026) in Abstract