
Mapping the Increasing Use of LLMs in Scientific Papers (2404.01268v1)

Published 1 Apr 2024 in cs.CL, cs.AI, cs.DL, cs.LG, and cs.SI

Abstract: Scientific publishing lays the foundation of science by disseminating research findings, fostering collaboration, encouraging reproducibility, and ensuring that scientific knowledge is accessible, verifiable, and built upon over time. Recently, there has been immense speculation about how many people are using LLMs like ChatGPT in their academic writing, and to what extent this tool might have an effect on global scientific practices. However, we lack a precise measure of the proportion of academic writing substantially modified or produced by LLMs. To address this gap, we conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals, using a population-level statistical framework to measure the prevalence of LLM-modified content over time. Our statistical estimation operates on the corpus level and is more robust than inference on individual instances. Our findings reveal a steady increase in LLM usage, with the largest and fastest growth observed in Computer Science papers (up to 17.5%). In comparison, Mathematics papers and the Nature portfolio showed the least LLM modification (up to 6.3%). Moreover, at an aggregate level, our analysis reveals that higher levels of LLM-modification are associated with papers whose first authors post preprints more frequently, papers in more crowded research areas, and papers of shorter lengths. Our findings suggest that LLMs are being broadly used in scientific writing.

Mapping the Increasing Use of LLMs in Scientific Papers: An Analytical Overview

The paper "Mapping the Increasing Use of LLMs in Scientific Papers" by Weixin Liang et al. presents a systematic, large-scale analysis aimed at quantifying the prevalence of LLM-modified content across various academic disciplines. The paper scrutinizes 950,965 papers published between January 2020 and February 2024 from arXiv, bioRxiv, and a portfolio of Nature journals, employing a statistical framework adapted for corpus-level rather than individual-level analysis. This approach is particularly suited to understanding structural patterns and shifts in academic writing attributable to LLM usage.

Methodology

The authors adapt the distributional GPT quantification framework of Liang et al. (2024). The methodology involves the following steps:

  1. Problem Formulation: The goal is to estimate the fraction α of LLM-modified content in a mixture distribution of human-written and LLM-generated texts.
  2. Parameterization: The framework models token occurrence probabilities in human-written texts (p_t) and LLM-modified texts (q_t).
  3. Estimation: A two-fold estimation process derives these probabilities from known collections of human-written and AI-modified text.
  4. Inference: The fraction α is inferred by maximum likelihood estimation (MLE), maximizing the mixture log-likelihood over the corpus (a sketch follows this list).
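
Below is a minimal sketch of the corpus-level mixture MLE, assuming the per-document log-likelihoods under the human-written and LLM-modified distributions have already been computed; the helper estimate_alpha and its inputs are illustrative, not the authors' released implementation.

    import numpy as np
    from scipy.optimize import minimize_scalar

    def estimate_alpha(log_p, log_q):
        """MLE of the LLM-modified fraction alpha in a two-component mixture.

        log_p, log_q: arrays of per-document log-likelihoods under the
        human-written (p) and LLM-modified (q) distributions (illustrative
        inputs; the paper parameterizes these via token occurrence
        probabilities p_t and q_t).
        """
        def neg_log_likelihood(alpha):
            # Corpus log-likelihood of the mixture (1 - alpha)*p + alpha*q,
            # evaluated stably in log space.
            mix = np.logaddexp(np.log1p(-alpha) + log_p,
                               np.log(alpha) + log_q)
            return -mix.sum()

        result = minimize_scalar(neg_log_likelihood,
                                 bounds=(1e-6, 1 - 1e-6), method="bounded")
        return result.x

Because α is estimated over the whole corpus rather than per document, noise in any single paper has little effect on the final estimate, which is what makes the population-level framework more robust than per-instance detection.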

A noteworthy feature is the two-stage approach to generating realistic LLM-produced training data, which aims to avoid creating fabricated or hallucinated academic content. The added step of summarizing and expanding original text via LLMs helps produce plausible AI-generated scientific writing.
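
A minimal sketch of such a two-stage pipeline follows, assuming an OpenAI-style chat API; the prompts and the helper two_stage_rewrite are illustrative, not the authors' exact protocol.

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def two_stage_rewrite(paragraph: str, model: str = "gpt-3.5-turbo") -> str:
        """Summarize a human-written paragraph, then expand the summary back
        into prose, yielding realistic LLM-modified calibration text."""
        summary = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content":
                       "Summarize this paragraph from a scientific paper "
                       "in one or two sentences:\n\n" + paragraph}],
        ).choices[0].message.content
        expanded = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content":
                       "Expand this summary into a full paragraph in the "
                       "style of a scientific paper:\n\n" + summary}],
        ).choices[0].message.content
        return expanded

Anchoring the expansion on a summary of real text keeps the generated writing factual rather than hallucinated, which is the point of the intermediate step.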

Main Findings

Temporal Trends in LLM Usage

The analysis reveals a noticeable uptrend in LLM-modified content beginning roughly five months after the release of ChatGPT. The largest and fastest increase was in Computer Science, where the estimated fraction of LLM-modified content reached 17.5% in abstracts and 15.3% in introductions by February 2024. Electrical Engineering and Systems Science also showed substantial growth, while Mathematics and the Nature portfolio journals exhibited comparatively small increases.

Attributes Associated with Increased LLM Usage

  1. First-Author Preprint Posting Frequency: Papers whose first authors post preprints on arXiv more frequently showed higher levels of LLM-modified content. By February 2024, the estimated fraction was 19.3% for abstracts and 16.9% for introductions among prolific preprint posters, compared to 15.6% and 13.7%, respectively, for less prolific authors. This correlation persists across subcategories within Computer Science, suggesting that publication pressure encourages the adoption of LLM tools.
  2. Paper Similarity: The extent of LLM modification is strongly related to how similar a paper is to its closest peer. Papers closer to their nearest peer (below the median distance in an embedding space) had a higher fraction of LLM-modified content, peaking at 22.2% in abstracts by February 2024. This might indicate that LLM use homogenizes writing styles, or that it is more prevalent in crowded research areas (see the sketch after this list).
  3. Paper Length: Shorter papers consistently exhibited more LLM-modified content than longer ones. By February 2024, an estimated 17.7% of abstract sentences in shorter papers were LLM-modified, versus 13.6% in longer papers, suggesting that authors of concise papers, perhaps under time constraints, rely more on LLM assistance.
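
As a generic sketch of the nearest-peer computation referenced above: assuming paper embeddings have been precomputed (the overview does not specify the embedding model), the distance of each paper to its closest peer can be obtained as follows.

    import numpy as np
    from sklearn.metrics.pairwise import cosine_distances

    def nearest_peer_distance(embeddings: np.ndarray) -> np.ndarray:
        """Cosine distance from each paper to its closest other paper.

        embeddings: (n_papers, dim) array of precomputed paper embeddings.
        Builds an O(n^2) distance matrix, which is fine for a sketch.
        """
        d = cosine_distances(embeddings)
        np.fill_diagonal(d, np.inf)  # exclude each paper's distance to itself
        return d.min(axis=1)

    # Papers below the median nearest-peer distance are the "more crowded" half:
    # dist = nearest_peer_distance(emb)
    # crowded = dist < np.median(dist)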

Implications and Future Outlook

The paper provides granular insights into how and where LLMs are being integrated into scientific workflows. These findings have multiple implications:

  • Research Integrity: The increasing prevalence of LLM-modified content raises questions about authenticity and originality, and carries risks such as the homogenization of scientific writing styles and growing dependence on proprietary LLM tools.
  • Policy Formulation: The evidence supports the need for clear guidelines and policies regarding the ethical use of LLMs in academic writing, as exemplified by the stances taken by ICML and the journal Science.
  • Future Research Directions: Future investigations could extend this work to other LLMs and explore the causal relationship between LLM usage and associated factors such as research productivity, competitive pressures, and quality of scholarly output.

In summary, this paper provides a comprehensive quantitative foundation to understand LLM usage trends in academia, emphasizing the nuanced and varied adoption across different scientific fields. The insights derived offer a critical basis for formulating policies and ethical guidelines, ensuring the robust and equitable integration of LLMs into the scholarly ecosystem.

References (78)
  1. Scott Aaronson. Simons Institute Talk on Watermarking of Large Language Models, 2023. URL https://simons.berkeley.edu/talks/scott-aaronson-ut-austin-openai-2023-08-17.
  2. Natural Language Watermarking: Design, Analysis, and a Proof-of-Concept Implementation. In Information Hiding, 2001.
  3. Identifying Real or Fake Articles: Towards better Language Modeling. In International Joint Conference on Natural Language Processing, 2008.
  4. Real or Fake? Learning to Discriminate Machine from Human Generated Text. ArXiv, abs/1906.03351, 2019.
  5. Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature. ArXiv, abs/2310.05130, 2023.
  6. Daria Beresneva. Computer-Generated Text Detection Using Machine Learning: A Systematic Review. In International Conference on Applications of Natural Language to Data Bases, 2016.
  7. Squibs: What Is a Paraphrase? Computational Linguistics, 39:463–472, 2013.
  8. ConDA: Contrastive Domain Adaptation for AI-generated Text Detection. ArXiv, abs/2309.03992, 2023. URL https://api.semanticscholar.org/CorpusID:261660497.
  9. On the possibilities of ai-generated text detection. arXiv preprint arXiv:2304.04736, 2023.
  10. GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content. ArXiv, abs/2305.07969, 2023. URL https://api.semanticscholar.org/CorpusID:258686680.
  11. Natural Language Watermarking Using Semantic Substitution for Chinese Text. In International Workshop on Digital Watermarking, 2003.
  12. Gemma Conroy. How ChatGPT and other AI tools could disrupt scientific publishing. Nature, October 2023a. URL https://www.nature.com/articles/d41586-023-03144-w.
  13. Gemma Conroy. Scientific sleuths spot dishonest ChatGPT use in papers. Nature, September 2023b. URL https://www.nature.com/articles/d41586-023-02477-w.
  14. Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods. arXiv preprint arXiv:2210.07321, 2022.
  15. Mack Deguerin. AI-generated nonsense is leaking into scientific journals. Popular Science, March 2024. URL https://www.popsci.com/technology/ai-generated-text-scientific-journals/.
  16. Navigating the jagged technological frontier: Field experimental evidence of the effects of AI on knowledge worker productivity and quality, 2023. Harvard Business School Technology & Operations Mgt. Unit Working Paper 24-013.
  17. What’s In My Big Data? In The Twelfth International Conference on Learning Representations, 2023.
  18. Holly Else. Abstracts written by ChatGPT fool scientists. Nature, Jan 2023. URL https://www.nature.com/articles/d41586-023-00056-7.
  19. TweepFake: About detecting deepfake tweets. Plos one, 16(5):e0251415, 2021.
  20. Three Bricks to Consolidate Watermarks for Large Language Models. 2023 IEEE International Workshop on Information Forensics and Security (WIFS), pp.  1–6, 2023.
  21. Tradition and innovation in scientists’ research strategies. American sociological review, 80(5):875–908, 2015.
  22. Comparing scientific abstracts generated by ChatGPT to original abstracts using an artificial intelligence output detector, plagiarism detector, and blinded human reviewers. bioRxiv, pp.  2022–12, 2022.
  23. The pile: An 800gb dataset of diverse text for language modeling. arXiv preprint arXiv:2101.00027, 2020.
  24. GLTR: Statistical Detection and Visualization of Generated Text. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp.  111–116, 2019.
  25. Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey. ArXiv, abs/2310.15264, 2023.
  26. ’Person’== Light-skinned, Western Man, and Sexualization of Women of Color: Stereotypes in Stable Diffusion. arXiv preprint arXiv:2310.19981, 2023.
  27. Melissa Heikkilä. How to spot AI-generated text. MIT Technology Review, Dec 2022. URL https://www.technologyreview.com/2022/12/19/1065596/how-to-spot-ai-generated-text/.
  28. RADAR: Robust AI-Text Detection via Adversarial Learning. ArXiv, abs/2307.03838, 2023a. URL https://api.semanticscholar.org/CorpusID:259501842.
  29. Unbiased Watermark for Large Language Models. ArXiv, abs/2310.10669, 2023b.
  30. ICML. Clarification on large language model policy LLM. https://icml.cc/Conferences/2023/llm-policy, 2023.
  31. Automatic detection of generated text is easiest when humans are fooled. arXiv preprint arXiv:1911.00650, 2019.
  32. Automatic detection of machine generated text: A critical survey. arXiv preprint arXiv:2011.01314, 2020.
  33. Samantha Murphy Kelly. ChatGPT creator pulls AI detection tool due to ‘low rate of accuracy’. CNN Business, Jul 2023. URL https://www.cnn.com/2023/07/25/tech/openai-ai-detection-tool/index.html.
  34. Recalibrating the scope of scholarly publishing: A modest step in a vast decolonization process. Quantitative Science Studies, 3(4):912–930, 12 2022. ISSN 2641-3337. doi: 10.1162/qss_a_00228. URL https://doi.org/10.1162/qss_a_00228.
  35. A watermark for large language models. International Conference on Machine Learning, 2023.
  36. New AI classifier for indicating AI-written text, 2023. URL https://openai.com/blog/new-ai-classifier-for-indicating-ai-written-text.
  37. Robust Distortion-free Watermarks for Language Models. ArXiv, abs/2307.15593, 2023.
  38. Detecting Fake Content with Relative Entropy Scoring. Pan, 2008.
  39. Deepfake Text Detection in the Wild. ArXiv, abs/2305.13242, 2023. URL https://api.semanticscholar.org/CorpusID:258832454.
  40. GPT detectors are biased against non-native English writers. ArXiv, abs/2304.02819, 2023a.
  41. Can large language models provide useful feedback on research papers? A large-scale empirical analysis. arXiv preprint arXiv:2310.01783, 2023b.
  42. Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews. arXiv preprint arXiv:2403.07183, 2024.
  43. Reviewergpt? an exploratory study on using large language models for paper reviewing. arXiv preprint arXiv:2306.00622, 2023.
  44. CoCo: Coherence-Enhanced Machine-Generated Text Detection Under Data Limitation With Contrastive Learning. ArXiv, abs/2212.10341, 2022. URL https://api.semanticscholar.org/CorpusID:254877728.
  45. RoBERTa: A Robustly Optimized BERT Pretraining Approach. ArXiv, abs/1907.11692, 2019.
  46. MacroPolo. The Global AI Talent Tracker, 2024. URL https://macropolo.org/digital-projects/the-global-ai-talent-tracker/.
  47. DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature. ArXiv, abs/2301.11305, 2023a.
  48. DetectGPT: Zero-shot machine-generated text detection using probability curvature. arXiv preprint arXiv:2301.11305, 2023b.
  49. Paulina Okunytė. Google search exposes academics using ChatGPT in research papers. Cybernews, November 2023. URL https://cybernews.com/news/academic-cheating-chatgpt-openai/.
  50. OpenAI. GPT-2: 1.5B release. https://openai.com/research/gpt-2-1-5b-release, 2019. Accessed: 2019-11-05.
  51. Papers and peer reviews with evidence of ChatGPT writing. Retraction Watch, 2024. URL https://retractionwatch.com/papers-and-peer-reviews-with-evidence-of-chatgpt-writing/.
  52. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):5485–5551, 2020.
  53. Can AI-Generated Text be Reliably Detected? ArXiv, abs/2303.11156, 2023.
  54. Whose opinions do language models reflect? In International Conference on Machine Learning, pp.  29971–30004. PMLR, 2023.
  55. Red Teaming Language Model Detectors with Language Models. ArXiv, abs/2305.19713, 2023.
  56. The curse of recursion: Training on generated data makes models forget. arXiv preprint arXiv:2305.17493, 2023.
  57. Release strategies and the social impacts of language models. arXiv preprint arXiv:1908.09203, 2019.
  58. H. Holden Thorp. Chatgpt is fun, but not an author. Science, 379(6630):313–313, 2023. doi: 10.1126/science.adg7879. URL https://www.science.org/doi/abs/10.1126/science.adg7879.
  59. Natural language watermarking: challenges in building a practical system. In Electronic imaging, 2006a.
  60. The hiding virtues of ambiguity: quantifiably resilient watermarking of natural language text through synonym substitutions. In Workshop on Multimedia & Security, 2006b.
  61. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.
  62. Intrinsic Dimension Estimation for Robust Detection of AI-Generated Texts. ArXiv, abs/2306.04723, 2023.
  63. Authorship Attribution for Neural Text Generation. In Conference on Empirical Methods in Natural Language Processing, 2020.
  64. Daan van Rossum. Generative AI Top 150: The World’s Most Used AI Tools. https://www.flexos.work/learn/generative-ai-top-150, February 2024.
  65. Artificial artificial artificial intelligence: Crowd workers widely use large language models for text production tasks. arXiv preprint arXiv:2306.07899, 2023.
  66. James Vincent. ‘As an AI language model’: the phrase that shows how AI is polluting the web. The Verge, Apr 2023. URL https://www.theverge.com/2023/4/25/23697218/ai-generated-spam-fake-user-reviews-as-an-ai-language-model.
  67. Testing of detection tools for AI-generated text. International Journal for Educational Integrity, 19(1):26, 2023. ISSN 1833-2595. doi: 10.1007/s40979-023-00146-z. URL https://doi.org/10.1007/s40979-023-00146-z.
  68. Max Wolff. Attacking Neural Text Detectors. ArXiv, abs/2002.11768, 2020.
  69. DiPmark: A Stealthy, Efficient and Resilient Watermark for Large Language Models. ArXiv, abs/2310.07710, 2023.
  70. DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text. ArXiv, abs/2305.17359, 2023a.
  71. A Survey on Detection of LLMs-Generated Content. ArXiv, abs/2310.15654, 2023b.
  72. Robust Multi-bit Natural Language Watermarking through Invariant Features. In Annual Meeting of the Association for Computational Linguistics, 2023.
  73. GPT Paternity Test: GPT Generated Text Detection with GPT Genetic Inheritance. ArXiv, abs/2305.12519, 2023. URL https://api.semanticscholar.org/CorpusID:258833423.
  74. Defending Against Neural Fake News. ArXiv, abs/1905.12616, 2019.
  75. Assaying on the Robustness of Zero-Shot Machine-Generated Text Detectors. ArXiv, abs/2312.12918, 2023.
  76. Protecting Language Generation Models via Invisible Watermarking. In Proceedings of the 40th International Conference on Machine Learning, pp.  42187–42199, 2023.
  77. Provable Robust Watermarking for AI-Generated Text. In International Conference on Learning Representations (ICLR), 2024a.
  78. Permute-and-Flip: An optimally robust and watermarkable decoder for LLMs. arXiv preprint arXiv:2402.05864, 2024b.
Authors (14)
  1. Weixin Liang (33 papers)
  2. Yaohui Zhang (6 papers)
  3. Zhengxuan Wu (37 papers)
  4. Haley Lepp (5 papers)
  5. Wenlong Ji (12 papers)
  6. Xuandong Zhao (47 papers)
  7. Hancheng Cao (20 papers)
  8. Sheng Liu (122 papers)
  9. Siyu He (19 papers)
  10. Zhi Huang (10 papers)
  11. Diyi Yang (151 papers)
  12. Christopher Potts (113 papers)
  13. James Y. Zou (7 papers)
  14. Christopher D Manning (2 papers)
Citations (37)