
Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews (2403.07183v2)

Published 11 Mar 2024 in cs.CL, cs.AI, cs.LG, and cs.SI

Abstract: We present an approach for estimating the fraction of text in a large corpus which is likely to be substantially modified or produced by a LLM. Our maximum likelihood model leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world LLM-use at the corpus level. We apply this approach to a case study of scientific peer review in AI conferences that took place after the release of ChatGPT: ICLR 2024, NeurIPS 2023, CoRL 2023 and EMNLP 2023. Our results suggest that between 6.5% and 16.9% of text submitted as peer reviews to these conferences could have been substantially modified by LLMs, i.e. beyond spell-checking or minor writing updates. The circumstances in which generated text occurs offer insight into user behavior: the estimated fraction of LLM-generated text is higher in reviews which report lower confidence, were submitted close to the deadline, and from reviewers who are less likely to respond to author rebuttals. We also observe corpus-level trends in generated text which may be too subtle to detect at the individual level, and discuss the implications of such trends on peer review. We call for future interdisciplinary work to examine how LLM use is changing our information and knowledge practices.

Monitoring AI-Modified Content at Scale in the Peer Review Process

Motivation and Approach

Peer reviews are fundamental to the scientific publication process, ensuring the relevance, rigor, and originality of scientific work. The advent of generative AI, like ChatGPT, has introduced potential changes in how reviews are composed, possibly impacting their quality and authenticity. This paper introduces a novel framework, leveraging a maximum likelihood model, to estimate the proportion of corpus content likely modified by AI at a large scale. Focusing on peer reviews from major AI conferences post-ChatGPT's release, this research uncovers patterns in AI-generated text use and discusses the broader implications for the peer review ecosystem.

Statistical Estimation Framework

At the core of this paper is a maximum likelihood estimation (MLE) approach designed to efficiently discern the extent of AI modification in large text corpora. Using reference corpora of known human-written and known AI-generated documents, the framework models the target corpus as a mixture of the two and estimates the mixture weight, i.e., the fraction of documents likely substantially modified by AI. A critical aspect of this methodology is that it operates at the corpus level rather than classifying individual documents, making it vastly more computationally efficient and less prone to the biases of existing AI detection tools.
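To make the mixture idea concrete, below is a minimal sketch of a corpus-level estimator of this kind. It assumes per-document log-likelihoods under the human and AI reference distributions have already been computed; the array names and the grid-search routine are illustrative assumptions, not the authors' actual implementation (the paper derives its likelihoods from token-level distributions fit on the reference corpora).

```python
import numpy as np

def estimate_alpha(log_p_human, log_p_ai, grid=np.linspace(1e-4, 1 - 1e-4, 999)):
    """Corpus-level MLE of the AI-modified fraction alpha (illustrative sketch).

    log_p_human, log_p_ai: arrays of per-document log-likelihoods under the
    human-written and AI-generated reference distributions, respectively.

    Each document is modeled as drawn from the mixture
        (1 - alpha) * P_human + alpha * P_ai,
    and alpha is chosen to maximize the corpus log-likelihood via grid search.
    """
    log_p_human = np.asarray(log_p_human)
    log_p_ai = np.asarray(log_p_ai)
    best_alpha, best_ll = 0.0, -np.inf
    for alpha in grid:
        # log((1 - alpha) * p_i + alpha * q_i), computed stably in log space
        ll = np.logaddexp(np.log1p(-alpha) + log_p_human,
                          np.log(alpha) + log_p_ai).sum()
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
    return best_alpha
```

Note that the estimator never labels any single document: only the aggregate mixture weight is reported, which is what makes corpus-level trends detectable even when individual documents are too ambiguous to classify.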

Case Study and Main Findings

The application of this framework to peer reviews from ICLR, NeurIPS, CoRL, and EMNLP conferences reveals significant insights:

  • An estimated 6.5% to 16.9% of review sentences in these conferences were substantially modified by AI.
  • Higher AI modification rates were observed in reviews submitted closer to deadlines, reviews without scholarly citations, and reviews from reviewers who engaged less in the post-rebuttal discussion phase.
  • AI-modified content correlated notably with reduced linguistic and epistemic diversity in reviews, raising concerns about the homogenization of scholarly feedback.

These findings highlight a nuanced picture of AI use in scientific peer review, pointing to both its potential advantages in aiding reviewers and the risks it poses to the integrity and diversity of scholarly discourse.

Theoretical Implications

This paper's theoretical contributions include a robust MLE framework capable of analyzing AI-generated content across large datasets and a detailed case study of its application within the domain of scientific peer review. The methodology provides a generalizable tool for future research into AI's impact across different information ecosystems.

Practical Implications

From a practical standpoint, this research raises important questions about the role of AI in the peer review process. The detected trends in AI use and the associated impact on review content quality and diversity underscore the need for greater transparency and guidelines around AI-assisted writing in scholarly publications. Furthermore, the findings call for interdisciplinary efforts to understand and navigate the evolving landscape of AI-generated content in scientific discourse.

Future Directions

Looking ahead, the paper advocates for continued investigation into the broad implications of LLM use in scientific communication. As AI tools become increasingly sophisticated, understanding their effects on scholarly practices, from peer review to research dissemination, will be critical. Collaborative efforts combining computational, ethical, and sociological perspectives are essential to ensure AI's responsible integration into the scientific community.

Conclusion

The exploration of AI-modified content in AI conference peer reviews post-ChatGPT reveals a complex interplay between technology and scientific communication. By providing a scalable and efficient method for estimating AI influence, this paper contributes valuable tools and insights for navigating the future of AI in academia, urging careful consideration of its benefits and challenges.

Authors (12)
  1. Weixin Liang
  2. Zachary Izzo
  3. Yaohui Zhang
  4. Haley Lepp
  5. Hancheng Cao
  6. Xuandong Zhao
  7. Lingjiao Chen
  8. Haotian Ye
  9. Sheng Liu
  10. Zhi Huang
  11. Daniel A. McFarland
  12. James Y. Zou