The AI Review Lottery: Widespread AI-Assisted Peer Reviews Boost Paper Scores and Acceptance Rates
Abstract: Journals and conferences worry that peer reviews assisted by artificial intelligence (AI), in particular by large language models (LLMs), may undermine the validity and fairness of the peer-review system, a cornerstone of modern science. In this work, we address this concern with a quasi-experimental study of the prevalence and impact of AI-assisted peer reviews at the 2024 International Conference on Learning Representations (ICLR), a large and prestigious machine-learning conference. Our contributions are threefold. First, using the GPTZero LLM detector, we obtain a lower bound on the prevalence of AI-assisted reviews at ICLR 2024, estimating that at least $15.8\%$ of reviews were written with AI assistance. Second, we estimate the impact of AI-assisted reviews on submission scores. Considering pairs consisting of an AI-assisted review and a human review of the same paper that assign different scores, we find that in $53.4\%$ of pairs the AI-assisted review scores higher than the human review ($p = 0.002$; relative difference in the probability of scoring higher: $+14.4\%$ in favor of AI-assisted reviews). Third, we assess the impact of receiving an AI-assisted peer review on submission acceptance. In a matched study, submissions near the acceptance threshold that received an AI-assisted peer review were $4.9$ percentage points ($p = 0.024$) more likely to be accepted than submissions that did not. Overall, we show that AI-assisted reviews have real consequences for the peer-review process, and we discuss the future implications of current trends.
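The paired comparison of review scores can be made concrete with a small sketch. This is not the authors' implementation: the input format, the function names (`discordant_pairs`, `sign_test_p`, `relative_difference`), the choice of an exact two-sided sign test, and the definition of the relative difference as a ratio of probabilities minus one are all assumptions for illustration, and the toy data are invented. The test also treats every pair as independent, which ignores that pairs drawn from the same paper share reviews, so the p-value here is only illustrative.

```python
from itertools import product
from math import comb


def discordant_pairs(reviews):
    """Count within-paper (AI-assisted, human) review pairs with different scores.

    `reviews` is a hypothetical list of (paper_id, score, is_ai) tuples, where
    `is_ai` would come from a detector label. Returns (ai_higher, human_higher);
    tied pairs are skipped.
    """
    by_paper = {}
    for paper_id, score, is_ai in reviews:
        ai_scores, human_scores = by_paper.setdefault(paper_id, ([], []))
        (ai_scores if is_ai else human_scores).append(score)

    ai_higher = human_higher = 0
    for ai_scores, human_scores in by_paper.values():
        # Pair every AI-assisted review with every human review of the same paper.
        for a, h in product(ai_scores, human_scores):
            if a > h:
                ai_higher += 1
            elif h > a:
                human_higher += 1
    return ai_higher, human_higher


def sign_test_p(k, n):
    """Two-sided exact binomial (sign) test of H0: P(AI review scores higher) = 0.5."""
    m = max(k, n - k)
    # Upper tail P(X >= m) under Binomial(n, 0.5); doubled because H0 is symmetric.
    tail = sum(comb(n, i) for i in range(m, n + 1)) / 2**n
    return min(1.0, 2 * tail)


def relative_difference(ai_higher, human_higher):
    """Assumed definition: P(AI higher) / P(human higher) - 1."""
    return ai_higher / human_higher - 1


# Invented toy data, not ICLR 2024 reviews.
reviews = [
    ("p1", 8, True), ("p1", 6, False), ("p1", 8, False),
    ("p2", 5, True), ("p2", 6, False),
    ("p3", 6, True), ("p3", 3, False),
]
k, k_human = discordant_pairs(reviews)
n = k + k_human
print(k / n, sign_test_p(k, n), relative_difference(k, k_human))
```

Ties are dropped before testing, mirroring the restriction to pairs whose scores differ; a study-grade analysis would additionally need to account for the dependence between pairs that share a paper or a review.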