Is ChatGPT Transforming Academics' Writing Style? (2404.08627v2)

Published 12 Apr 2024 in cs.CL, cs.AI, cs.DL, and cs.LG

Abstract: Based on one million arXiv papers submitted from May 2018 to January 2024, we assess the textual density of ChatGPT's writing style in their abstracts through a statistical analysis of word frequency changes. Our model is calibrated and validated on a mixture of real abstracts and ChatGPT-modified abstracts (simulated data) after a careful noise analysis. The words used for estimation are not fixed but adaptive, including those with decreasing frequency. We find that LLMs, represented by ChatGPT, are having an increasing impact on arXiv abstracts, especially in the field of computer science, where the fraction of LLM-style abstracts is estimated to be approximately 35%, if we take the responses of GPT-3.5 to one simple prompt, "revise the following sentences", as a baseline. We conclude with an analysis of both positive and negative aspects of the penetration of LLMs into academics' writing style.


Assessing ChatGPT's Influence on Academic Writing Through arXiv Abstracts

Introduction

The growing use of ChatGPT in academic writing has drawn increasing scrutiny. Mingmeng Geng and Roberto Trotta examine it by analyzing how the text of one million arXiv abstracts, submitted from May 2018 to January 2024, has changed. Their paper uses a statistical analysis of word frequency changes and attributes the observed shifts to the growing use of ChatGPT in drafting and revising abstracts. The effect is most pronounced in computer science, where the fraction of LLM-style abstracts is estimated at approximately 35%, taking the responses of GPT-3.5 to one simple prompt, "revise the following sentences", as the baseline.

Methodology

The paper distinguishes between direct and indirect ChatGPT influence on academic writing. Direct influence means using ChatGPT to generate or edit an abstract, while indirect influence means authors themselves adopting ChatGPT's writing style after frequent interaction with the model. The dataset comprises one million arXiv articles, analyzed to capture how word frequencies evolve over time and supported by a comparative analysis using the Google Ngram dataset. On this basis the paper defines 'ChatGPT style' text statistically, so that the resulting estimate covers both kinds of influence.

Observations and Analysis

Initial findings pinpoint a significant alteration in the frequency of non-specialized words post-2023, a trend inconsistent with the conventional dynamics of academic writing but indicative of ChatGPT's stylistic influence. Notably, the decreased usage of basic words like "are" and "is" post-2023 further corroborates the AI's impact. These linguistic shifts serve as a statistical signature of ChatGPT's growing footprint in academic writing, particularly within the computer science domain.
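The kind of frequency comparison described above can be illustrated with a minimal sketch. It is not the authors' code: the tokenizer, the helper names `word_frequencies` and `relative_change`, and the toy abstracts are all assumptions made for illustration only.

```python
import re
from collections import Counter


def word_frequencies(abstracts):
    """Return each word's share of all word occurrences in a list of texts."""
    counts = Counter()
    for text in abstracts:
        # Simplistic tokenizer (an assumption): lowercase alphabetic runs only.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def relative_change(before, after, word):
    """Ratio of a word's frequency after vs. before; None if absent before."""
    f_before = before.get(word, 0.0)
    if f_before == 0.0:
        return None
    return after.get(word, 0.0) / f_before


# Made-up toy abstracts standing in for two submission periods.
abstracts_before = [
    "The results are robust and the method is simple.",
    "This is a new model.",
]
abstracts_after = [
    "We delve into intricate results, showcasing a robust method.",
    "This model is new.",
]

freq_before = word_frequencies(abstracts_before)
freq_after = word_frequencies(abstracts_after)

for word in ("is", "are", "delve"):
    print(word, relative_change(freq_before, freq_after, word))
```

A ratio below 1 marks a word that became rarer in the later period and a ratio above 1 a word that became more common. With real data the two lists would hold abstracts from different submission periods.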

Statistical Modeling of ChatGPT Impact

The paper builds a quantitative model of ChatGPT's impact, with an emphasis on how the relative rate of ChatGPT use differs across academic disciplines. The model is calibrated and validated on a mixture of real abstracts and ChatGPT-modified abstracts (simulated data) after a noise analysis, and the words used for estimation are adaptive rather than fixed, including words whose frequency decreases. Applied to the arXiv data, it indicates a substantial increase in ChatGPT-influenced abstracts. The modeling also considers whether ChatGPT's stylistic tendencies are overrepresented relative to traditional academic writing norms.
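One simple way to picture estimating a fraction of LLM-style text from word frequencies is sketched below. It assumes that each word's observed frequency is a linear mixture of its frequency in unmodified abstracts and in ChatGPT-revised abstracts, and it fits the mixing fraction by least squares. This is an illustrative simplification, not the estimator used in the paper; the function name `estimate_llm_fraction` and all numbers are hypothetical.

```python
def estimate_llm_fraction(f_human, f_llm, f_observed):
    """Least-squares fit of eta in f_observed ~ (1 - eta) * f_human + eta * f_llm.

    Each argument maps a word to its relative frequency. Only words present
    in all three dictionaries are used. The result is clipped to [0, 1].
    """
    numerator = 0.0
    denominator = 0.0
    for word, human in f_human.items():
        if word not in f_llm or word not in f_observed:
            continue
        shift = f_llm[word] - human  # how the LLM moves this word's frequency
        numerator += (f_observed[word] - human) * shift
        denominator += shift * shift
    if denominator == 0.0:
        raise ValueError("LLM and human frequencies do not differ on any shared word")
    return min(1.0, max(0.0, numerator / denominator))


# Hypothetical frequencies; words that decrease ("is", "are") contribute
# to the fit just as much as words that increase ("crucial").
f_human = {"is": 0.020, "are": 0.010, "crucial": 0.001}
f_llm = {"is": 0.012, "are": 0.006, "crucial": 0.003}

# Synthetic "observed" data built with a known mixing fraction of 0.25.
true_eta = 0.25
f_observed = {
    w: (1 - true_eta) * f_human[w] + true_eta * f_llm[w] for w in f_human
}

eta = estimate_llm_fraction(f_human, f_llm, f_observed)
print(round(eta, 6))
```

Building the observed frequencies from a known fraction and checking that the fit recovers it is a small-scale analogue of validating a model on simulated mixtures of real and modified abstracts.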

Practical Implications and Future Directions

ChatGPT's integration into academic writing has implications on several fronts, from gains in writing efficiency to possible shifts in stylistic norms. The paper's findings prompt a broader discussion about the balance between automated assistance and original scholarly expression, and about where academic writing is heading in the AI era. The research also opens the way for future studies of how ChatGPT's impact differs across scientific fields, which could guide the development of AI tools that respect disciplinary conventions while improving writing proficiency.

Conclusion

Geng and Trotta's paper provides a vital empirical basis for understanding ChatGPT's influence on academic writing, particularly highlighting its significant adoption within computer science. The analytical methodologies employed offer a replicable template for assessing AI's role in academic discourse evolution, establishing a groundwork for ongoing scrutiny as AI tools become increasingly woven into the fabric of academic writing practices.