
Ability of LLMs trained on copyrighted books to produce high‑quality, style‑faithful literary text

Determine whether large language models trained on copyrighted books can generate high‑quality literary text that faithfully emulates authors’ distinctive styles and voices without reproducing verbatim content.


Background

The paper investigates whether fine-tuning LLMs on individual authors' complete works enables non-verbatim emulation of those authors' styles and voices at a quality level that readers prefer over expert human writing. The authors frame this as a central uncertainty at the outset, given ongoing legal and ethical debates around training on copyrighted books and the risk of market substitution. The study compares MFA-trained writers against frontier LLMs (via in-context prompting) and author-specific fine-tuned models, using blind pairwise evaluations by expert and lay readers.

Although the experimental results later in the paper bear directly on this question, the abstract explicitly flags the ability of such models to produce high-quality, style-faithful literary text as unclear, which motivates the empirical investigation.

References

Yet it's unclear whether these models can generate high quality literary text while emulating authors' styles/voices.

Readers Prefer Outputs of AI Trained on Copyrighted Books over Expert Human Writers (2510.13939 - Chakrabarty et al., 15 Oct 2025) in Abstract