Pangram Humanizers: Blending AI & Human Text

Updated 29 December 2025
  • Pangram Humanizers are techniques that integrate human-sourced text into AI narratives by enforcing a high token copy ratio (typically around 90%) to mask AI origins.
  • They employ a structured two-stage pipeline—first drafting with selected human snippets followed by iterative self-revisions—to maintain narrative coherence and prompt fidelity.
  • Evaluations reveal that while detectors like Pangram misclassify over half of these hybrid texts as human, human evaluators still detect stylistic and logical inconsistencies.

Pangram Humanizers are methods and processes designed to modify the output of LLMs such that automated detectors—most notably Pangram, a closed-source Transformer classifier for AI authorship—are unable to reliably distinguish these outputs from genuine human-written text. By enforcing constraints on token origin and composition, Pangram Humanizers exploit the current vulnerabilities of detector architectures that rely predominantly on token-level statistics and uniform style features, creating long-form narratives with a high proportion of verbatim human-derived material. The Frankentext framework serves as a canonical example of this approach, demonstrating that state-of-the-art detectors are evaded in a substantial fraction of cases by carefully blending human-written and LLM-generated segments (Pham et al., 23 May 2025).

1. Formal Model of Frankentext Generation

Let $S$ denote a population of human-written snippets, operationalized as 1,500 paragraph-length passages sampled from large textual corpora (e.g., Books3). Given a user writing prompt $P$ and a target copy ratio $T \in [0,1]$, Frankentext generation constructs a narrative $F$ such that at least a fraction $T$ of its tokens are exact subsequences from $S$. The copy ratio metric $R$ is defined as

$$R = \frac{\#\text{copied\_tokens}(F, S)}{\#\text{total\_tokens}(F)}$$

where $\#\text{copied\_tokens}(F, S)$ is the number of tokens in $F$ matching spans in $S$, and $\#\text{total\_tokens}(F)$ is the total length of $F$. The constraint $R \geq T$ (often $T = 0.90$) ensures that the vast majority of the narrative consists of authentic human text. The residual content (connective and transitional segments) is contributed by the LLM. This formulation permits LLMs to “humanize” their outputs by maximizing verbatim inheritance from $S$ while maintaining prompt adherence and narrative coherence (Pham et al., 23 May 2025).
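The copy-ratio constraint can be checked mechanically. The sketch below is a simplified proxy for $R$ (the paper validates drafts with ROUGE-L recall or a detector query, not this exact routine): it marks narrative tokens covered by verbatim n-gram matches against the snippet pool, with the span length `n` an illustrative choice.

```python
from typing import List, Sequence, Set, Tuple

def copy_ratio(narrative: List[str], snippets: Sequence[List[str]], n: int = 5) -> float:
    """Fraction of narrative tokens covered by a verbatim n-gram that
    also occurs in some human snippet (a rough proxy for the copy ratio R).
    The minimum span length n is an illustrative choice, not from the paper."""
    # Index every n-gram that occurs in the human snippet pool S.
    grams: Set[Tuple[str, ...]] = set()
    for snippet in snippets:
        for i in range(len(snippet) - n + 1):
            grams.add(tuple(snippet[i:i + n]))
    # Mark every narrative token that falls inside a matching window.
    covered = [False] * len(narrative)
    for i in range(len(narrative) - n + 1):
        if tuple(narrative[i:i + n]) in grams:
            for j in range(i, i + n):
                covered[j] = True
    return sum(covered) / max(len(narrative), 1)
```

Enforcing $R \geq T$ then amounts to asserting `copy_ratio(draft, snippets) >= 0.90` before accepting a draft.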

2. Two-Stage Pipeline: Select & Stitch, Then Self-Revise

Frankentext generation proceeds via a structured two-stage process:

  • Draft Creation: The LLM is provided with the full snippet set $S$ and prompt $P$, and instructed to construct a narrative of $N$ tokens, ensuring that at least $T \cdot N$ tokens are exactly reproduced from $S$. The draft’s copy ratio is optionally validated using metrics such as ROUGE-L recall or by querying an AI detector (e.g., Pangram). If $R < T$ or the detector returns an “AI” label, the process is repeated with prompts that push for more copying.
  • Iterative Polishing: The LLM carries out up to three revision rounds, in which it is instructed to minimally edit the draft to repair coherence or continuity errors, while explicitly preserving the copy constraint ($R \geq T$) and prompt fidelity. The process terminates early if the model’s self-assessment indicates no edits are required.

This pipeline enables production of long-form narratives that maintain a “global” coherence despite extensive reliance on human-sourced spans. Each revision constrains token substitution, prioritizing minimal intervention to correct overt inconsistencies while protecting the overall human token quota (Pham et al., 23 May 2025).
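The two stages above can be sketched as a small control loop. Everything model-specific is injected as a callable (`draft_fn`, `revise_fn`, `ratio_fn`, and `detect_fn` are hypothetical stand-ins for the LLM and detector calls described in the text), so this is a hedged outline of the control flow, not the authors' implementation:

```python
def generate_frankentext(prompt, snippets, draft_fn, revise_fn, ratio_fn, detect_fn,
                         target=0.90, max_drafts=5, max_revisions=3):
    """Stage 1: redraft until the copy constraint holds and the detector
    does not flag the text (bounded by max_drafts). Stage 2: up to three
    minimal self-revisions, stopping early if the model reports no edits
    were needed. All four callables are hypothetical stand-ins."""
    draft = draft_fn(prompt, snippets, target)
    for _ in range(max_drafts):
        if ratio_fn(draft, snippets) >= target and detect_fn(draft) != "AI":
            break
        draft = draft_fn(prompt, snippets, target)  # re-prompt to copy more
    for _ in range(max_revisions):
        revised, changed = revise_fn(draft)
        if not changed:  # model's self-assessment: nothing left to fix
            break
        draft = revised
    return draft
```

The `max_drafts` bound is an added safeguard; the source describes reiteration but does not state a retry limit.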

3. Detection and Evaluation: Pangram and Other Classifiers

Automated detectors—including Pangram, Binoculars (a cross-model perplexity heuristic), and FastDetectGPT (probability-curvature)—are deployed to assess the provenance of candidate narratives. Pangram, in particular, utilizes a sliding-window Transformer classifier to assign labels from “Highly Likely AI” to “Human”, with a “Mixed” category for token-level authorship blending. The operational distinction groups the first three categories as evidence of AI involvement.
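The grouping rule can be expressed as a small aggregation step over per-window labels. Since Pangram is closed-source, the exact label strings below are assumptions for illustration; only the grouping logic (the three AI-leaning labels count as evidence of AI involvement, “Mixed” flags blended authorship) comes from the description above.

```python
# Illustrative label taxonomy; Pangram's real label strings are not public.
AI_LEANING = {"Highly Likely AI", "Likely AI", "Possibly AI"}

def document_verdict(window_labels):
    """Collapse per-window classifier labels into a document-level call:
    any AI-leaning window counts as evidence of AI involvement, 'Mixed'
    windows signal token-level authorship blending, else 'Human'."""
    if any(label in AI_LEANING for label in window_labels):
        return "AI-involved"
    if "Mixed" in window_labels:
        return "Mixed"
    return "Human"
```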

Empirical evaluation on Gemini-2.5-Pro Frankentexts at $T = 0.90$ demonstrates:

| Metric | Value |
| --- | --- |
| Copy rate | 75% of tokens matched to human snippets |
| Coherence | 81% of stories judged coherent (GPT-4.1) |
| Prompt relevance | 100% |
| Pangram misclassification | 59% labeled as human |
| Binoculars / FastDetectGPT | <2% detected as AI |

These results indicate that Frankentexts circumvent Pangram’s classification pipeline in more than half of instances, with Binoculars and FastDetectGPT detecting AI origins at negligible rates (Pham et al., 23 May 2025).

4. Qualitative Failure Modes and Human Judgment

Although automatic detectors are frequently deceived by Pangram Humanizers, human raters identify telltale artifacts at a higher rate. Frankentexts often manifest:

  • Abrupt shifts in tone or style, particularly at boundaries where distinct snippets are spliced (e.g., a transition from literary flourish to clinical exposition).
  • Grammatical inconsistencies, including pronoun mismatches and punctuation anomalies across segment junctions.
  • Discontinuities in narrative logic or plot, especially when thematically unrelated passages are connected without sufficient semantic mediation.

In studies, human annotators detect these surface cues and outperform Pangram (56% vs 33% detection rate in a 30-story subset), highlighting limitations in detectors that prioritize token-level statistical regularities over broader narrative structure (Pham et al., 23 May 2025).

5. Detector Vulnerabilities and Mixed Authorship Attribution

Pangram’s performance degradation in the presence of Frankentexts is attributed to its dependence on “AI keywords” and statistically uniform stylistic markers. When the majority of narrative spans derive from human sources, these signals are largely absent, impairing Pangram’s discriminative capacity. Detectors also falter when extensive contiguous sequences are inherited from genuine human text—even if LLMs furnish connective matter. This suggests a critical blind spot in contemporary classifier architectures tasked with binary AI/human determination in hybrid-authored content (Pham et al., 23 May 2025).

To address these deficits, several measures are proposed:

  • Token-level attribution: Training models on mixed-authorship data to tag individual tokens or spans as human- or AI-generated (leveraging Frankentexts, where segment origins are tracked, as training data).
  • Style-boundary detection: Identifying statistically improbable alternations in register or syntax that indicate transition points between stitched segments.
  • Global semantic-coherence checks: Monitoring for abrupt deviations in plot, character behavior, or factual continuity, which can signal hybrid construction.
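Style-boundary detection, the second proposal above, can be illustrated with a toy detector: flag sentence junctions where a crude stylistic feature (mean word length here; a real system would use richer register and syntax features) jumps by more than a fixed threshold. Both the feature and the threshold are illustrative assumptions, not from the source.

```python
def style_boundaries(sentences, jump=3.0):
    """Return indices i where the junction between sentences[i] and
    sentences[i+1] shows an abrupt shift in mean word length, a toy
    stand-in for detecting stitched-segment transition points."""
    feats = []
    for s in sentences:
        words = s.split()
        # Mean word length as a cheap proxy for register.
        feats.append(sum(len(w) for w in words) / max(len(words), 1))
    return [i for i, (a, b) in enumerate(zip(feats, feats[1:])) if abs(b - a) > jump]
```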

6. Advanced Humanization Strategies and Directions for Robust Detection

Enhancement of humanizer techniques entails:

  • Refined transition modeling: Fine-tuning the generative model to produce smoother stylistic bridges between snippets, attenuating abruptness at segment boundaries.
  • Adaptive retrieval: Selecting snippets that more precisely match in genre, tone, or register, reducing stylistic discontinuities.
  • Self-critique revision: Incorporating explicit post-stitching prompts to the LLM, prioritizing narrative and stylistic uniformity.

From a detection standpoint, the growing sophistication of Pangram Humanizers underscores the need for detectors capable of nuanced, context-aware mixed-authorship inference, extending beyond simple binary prediction. Frankentexts constitute a valuable resource both for training attribution models and as a benchmark for evaluating detector robustness in the “grey zone” where human and AI contributions are deliberately intertwined (Pham et al., 23 May 2025).
