
Do LLMs write like humans? Variation in grammatical and rhetorical styles (2410.16107v1)

Published 21 Oct 2024 in cs.CL

Abstract: LLMs are capable of writing grammatical text that follows instructions, answers questions, and solves problems. As they have advanced, it has become difficult to distinguish their output from human-written text. While past research has found some differences in surface features such as word choice and punctuation, and developed classifiers to detect LLM output, none has studied the rhetorical styles of LLMs. Using several variants of Llama 3 and GPT-4o, we construct two parallel corpora of human- and LLM-written texts from common prompts. Using Douglas Biber's set of lexical, grammatical, and rhetorical features, we identify systematic differences between LLMs and humans and between different LLMs. These differences persist when moving from smaller models to larger ones, and are larger for instruction-tuned models than base models. This demonstrates that despite their advanced abilities, LLMs struggle to match human styles, and hence more advanced linguistic features can detect patterns in their behavior not previously recognized.

Summary

  • The paper reveals that instruction-tuned LLMs exhibit distinct grammatical and rhetorical patterns compared to human authors.
  • It employs Douglas Biber's linguistic feature analysis on diverse text corpora to compare nominalizations, participial clauses, and modality.
  • Findings imply that tuning methods amplify non-human stylistic traits, aiding content classification and challenging human-likeness assumptions.

Evaluating Distinctions in LLM and Human Text: Grammatical and Rhetorical Divergences

This paper undertakes an analytical exploration of the stylistic differences between LLMs and human-authored texts, particularly focusing on grammatical and rhetorical features. The paper's primary contribution lies in examining how instruction-tuned LLMs, such as OpenAI's GPT-4o and variants of Meta Llama 3, diverge from human writing styles.

Methodology

The authors constructed parallel corpora of human- and LLM-generated texts from a shared set of prompts, drawing on source material that includes academic articles, news, fiction, and spoken transcripts. They then analyzed both corpora with Douglas Biber's set of lexical, grammatical, and rhetorical features, enabling a thorough comparison of stylistic elements, such as nominalizations, participial clauses, and modality, across human and machine-generated text.
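As a rough illustration of this kind of analysis, a couple of Biber-style surface features can be approximated with simple pattern matching. The suffix heuristics below are illustrative assumptions, not the paper's actual Biber tagger, which uses far more sophisticated grammatical annotation:

```python
import re

# Common derivational suffixes used as a crude proxy for nominalizations
# (e.g., "information", "development", "activity", "awareness").
NOMINALIZATION_SUFFIXES = ("tion", "ment", "ity", "ness")

def feature_rates(text: str) -> dict:
    """Return per-1000-word rates for two crude Biber-style features."""
    words = re.findall(r"[A-Za-z]+", text.lower())
    n = len(words) or 1
    nominalizations = sum(w.endswith(NOMINALIZATION_SUFFIXES) for w in words)
    # Words ending in "-ing" as a rough stand-in for present participles.
    present_participles = sum(w.endswith("ing") and len(w) > 4 for w in words)
    return {
        "nominalizations_per_1000": 1000 * nominalizations / n,
        "present_participles_per_1000": 1000 * present_participles / n,
    }
```

Computing such rates for each document in the human and LLM corpora yields feature vectors that can then be compared across sources.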

Results

The analysis reveals systematic differences that highlight the LLMs' preference for certain grammatical structures. LLMs, especially instruction-tuned ones, use informationally dense features, such as nominalizations and participial clauses, more frequently; for instance, instruction-tuned models used present participles at significantly higher rates than humans. Vocabulary choice also diverges: models like GPT-4o favor a grandiose lexicon over simpler, more colloquial terms, amounting to an almost house-style preference.

Implications of Findings

These findings have notable implications:

  1. Detection and Classification: The differences in linguistic style offer a reliable basis for classifying text as human- or LLM-written. Random forest classifiers achieved high accuracy in distinguishing human text from LLM output, revealing the stylistic footprint left by instruction tuning; Lasso regressions also classified well, though slightly less accurately.
  2. Impact of Instruction Tuning: The comparison between Llama 3 base and instruction-tuned models unveils instruction tuning as a notable factor in amplifying distinctive non-human stylistic features, suggesting a potential trade-off between functional tuning and stylistic mimicry of human language.
  3. Theoretical Insights: These differences refute the notion that larger or more advanced models necessarily generate more human-like text, with implications for linguistic theories of machine language processing.
  4. Practical Considerations: The variation in linguistic features across LLMs underscores concerns in domains such as content generation, where authenticity and cultural sensitivity might be compromised, impacting education and professional writing.
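The classification setup described in point 1 can be sketched as follows. The feature values here are synthetic stand-ins for Biber feature vectors (with LLM texts given systematically higher rates, mimicking the paper's reported tendency), and scikit-learn's RandomForestClassifier is an assumed substitute for whatever implementation the authors used:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic "Biber feature" vectors: column 0 ~ nominalization rate,
# column 1 ~ present-participle rate, both per 1000 words.
human = rng.normal(loc=[20.0, 10.0], scale=3.0, size=(100, 2))
llm = rng.normal(loc=[35.0, 22.0], scale=3.0, size=(100, 2))
X = np.vstack([human, llm])
y = np.array([0] * 100 + [1] * 100)  # 0 = human, 1 = LLM

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(f"Mean cross-validated accuracy: {scores.mean():.2f}")
```

Because the synthetic classes are well separated, the classifier scores near-perfectly here; the paper's result is that real Biber feature vectors are separable enough for a similar setup to achieve high accuracy.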

Future Directions

Further research could examine how to close the gap between LLM output and the stylistic range of human writing. Another avenue is to study how the composition of training corpora shapes model style, potentially informing unified LLM evaluations across multiple linguistic domains. Understanding these elements can guide the development of more contextually adept AI language systems.

In synthesizing these components, the authors illuminate the nuanced landscape of LLM versus human authorship, providing a basis for further scholarly inquiry and practical improvements in AI-mediated communications.