
Frankentext: Stitching random text fragments into long-form narratives

Published 23 May 2025 in cs.CL (arXiv:2505.18128v2)

Abstract: We introduce Frankentexts, a new type of long-form narratives produced by LLMs under the extreme constraint that most tokens (e.g., 90%) must be copied verbatim from human writings. This task presents a challenging test of controllable generation, requiring models to satisfy a writing prompt, integrate disparate text fragments, and still produce a coherent narrative. To generate Frankentexts, we instruct the model to produce a draft by selecting and combining human-written passages, then iteratively revise the draft while maintaining a user-specified copy ratio. We evaluate the resulting Frankentexts along three axes: writing quality, instruction adherence, and detectability. Gemini-2.5-Pro performs surprisingly well on this task: 81% of its Frankentexts are coherent and 100% relevant to the prompt. Notably, up to 59% of these outputs are misclassified as human-written by detectors like Pangram, revealing limitations in AI text detectors. Human annotators can sometimes identify Frankentexts through their abrupt tone shifts and inconsistent grammar between segments, especially in longer generations. Beyond presenting a challenging generation task, Frankentexts invite discussion on building effective detectors for this new grey zone of authorship, provide training data for mixed authorship detection, and serve as a sandbox for studying human-AI co-writing processes.

Summary

Stitching Random Text Fragments into Long-form Narratives: An Overview

The paper introduces Frankentexts, long-form narratives generated by LLMs under the stringent constraint that most tokens must be copied verbatim from human-written texts. The task challenges conventional generation practice: models must satisfy a writing prompt, integrate disparate text fragments, and still produce coherent, relevant output.

Methodology and Results

The methodology is a prompt-based pipeline: the LLM first drafts a story by selecting and combining excerpts from a large corpus of human-written passages, then iteratively revises the draft to improve coherence while maintaining a user-specified copy ratio. Evaluated on Gemini-2.5-Pro, the approach performs notably well, with 81% of its Frankentexts judged coherent and 100% judged relevant to the prompt. Strikingly, up to 59% of these outputs are misclassified as human-written by detectors such as Pangram, Binoculars, and FastDetectGPT. This misclassification exposes a significant limitation of current AI text detection, particularly binary classifiers that struggle with mixed-origin text. Human reviewers can occasionally identify Frankentexts through abrupt tonal shifts and grammatical inconsistencies between segments, especially in longer generations.
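The draft-then-revise loop above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the n-gram-based `copy_ratio` heuristic, the 0.9 target, and the `draft_fn`/`revise_fn` callables (stand-ins for the actual LLM calls) are all assumptions made for the sketch.

```python
def copy_ratio(text: str, fragments: list[str], n: int = 5) -> float:
    """Estimate the fraction of word n-grams in `text` that appear
    verbatim in at least one source fragment (a rough proxy for the
    paper's copy ratio)."""
    words = text.split()
    if len(words) < n:
        return 0.0
    grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    hits = sum(any(g in frag for frag in fragments) for g in grams)
    return hits / len(grams)


def generate_frankentext(prompt: str, fragments: list[str],
                         draft_fn, revise_fn,
                         target: float = 0.9, max_rounds: int = 5) -> str:
    """Draft from human-written fragments, then iteratively revise
    until the estimated copy ratio meets the target (or rounds run out)."""
    draft = draft_fn(prompt, fragments)        # select & stitch passages
    for _ in range(max_rounds):
        if copy_ratio(draft, fragments) >= target:
            break
        draft = revise_fn(draft, fragments, target)  # push back toward target
    return draft
```

In practice `draft_fn` and `revise_fn` would wrap model calls whose prompts instruct the LLM to reuse the supplied passages; the loop simply enforces the copy-ratio constraint externally.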

Implications and Applications

The emergence of Frankentexts presents multiple implications:

  1. Authorship Attribution Challenge: The method introduces a grey zone of authorship, blurring the line between AI-generated and human-written content and confounding existing text detection mechanisms. This motivates the development of more sophisticated detectors with token-level attribution capabilities to handle mixed authorship.
  2. Training Data and Research: Frankentexts offer a synthetically generated source of training data for detectors focusing on mixed authorship detection, thereby advancing research in AI detection and human-AI collaborative writing processes.
  3. Human-AI Co-writing Studies: The paradigm serves as a sandbox for studying human-AI collaborative writing. By manipulating variables such as the proportion and diversity of excerpts, researchers can run controlled experiments on stylistic blending and revision dynamics.
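The token-level attribution mentioned in point 1 can be sketched with a simple n-gram matcher: any word that falls inside an n-gram found verbatim in a source fragment is labeled copied, everything else generated. The function name, labels, and n-gram length here are illustrative assumptions, not a method from the paper.

```python
def attribute_tokens(text: str, fragments: list[str],
                     n: int = 5) -> list[tuple[str, str]]:
    """Label each word 'copied' if it lies inside a word n-gram that
    appears verbatim in some source fragment, else 'generated'."""
    words = text.split()
    labels = ["generated"] * len(words)
    for i in range(len(words) - n + 1):
        gram = " ".join(words[i:i + n])
        if any(gram in frag for frag in fragments):
            for j in range(i, i + n):   # every word in a matched n-gram
                labels[j] = "copied"    # counts as verbatim-copied
    return list(zip(words, labels))
```

A real mixed-authorship detector would need to be far more robust (paraphrase, punctuation changes, tokenizer mismatches), but even this crude labeling yields the kind of token-level training signal the summary describes.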

Future Developments in AI

The limitations highlighted by Frankentext generation provide new directions for future research in AI text generation and detection:

  • Improvement in Control Mechanisms: Future LLMs will need enhanced capabilities to follow complex constraints effectively, even in tasks demanding high verbatim repetition from human texts.
  • Advancing Detection Technologies: The development of detectors focusing on fine-grained, token-level discrimination will be crucial in handling the complexities introduced by narratives like Frankentexts.
  • Ethical Considerations: Researchers and policymakers must engage with ethical concerns surrounding authorship, provenance, and potential misuse in adversarial contexts.

Conclusion

Frankentexts probe the limits of controllable text generation, testing whether LLMs can maintain coherence under constraints that force heavy verbatim reuse of human text. By releasing the code and evaluation suite, the work aims to advance mixed-origin text detection and shed light on human-AI collaborative writing dynamics, paving the way for a more nuanced understanding and treatment of AI-generated content.
