The AI Review Lottery: Widespread AI-Assisted Peer Reviews Boost Paper Scores and Acceptance Rates (2405.02150v1)

Published 3 May 2024 in cs.CY

Abstract: Journals and conferences worry that peer reviews assisted by AI, in particular, LLMs, may negatively influence the validity and fairness of the peer-review system, a cornerstone of modern science. In this work, we address this concern with a quasi-experimental study of the prevalence and impact of AI-assisted peer reviews in the context of the 2024 International Conference on Learning Representations (ICLR), a large and prestigious machine-learning conference. Our contributions are threefold. Firstly, we obtain a lower bound for the prevalence of AI-assisted reviews at ICLR 2024 using the GPTZero LLM detector, estimating that at least $15.8\%$ of reviews were written with AI assistance. Secondly, we estimate the impact of AI-assisted reviews on submission scores. Considering pairs of reviews with different scores assigned to the same paper, we find that in $53.4\%$ of pairs the AI-assisted review scores higher than the human review ($p = 0.002$; relative difference in probability of scoring higher: $+14.4\%$ in favor of AI-assisted reviews). Thirdly, we assess the impact of receiving an AI-assisted peer review on submission acceptance. In a matched study, submissions near the acceptance threshold that received an AI-assisted peer review were $4.9$ percentage points ($p = 0.024$) more likely to be accepted than submissions that did not. Overall, we show that AI-assisted reviews are consequential to the peer-review process and offer a discussion on future implications of current trends.

AI in Peer Review: Impacts and Implications

Overview of the Study

The paper investigates the prevalence and impact of AI-assisted peer reviews within the peer-review system of the International Conference on Learning Representations (ICLR), a key venue for machine learning research. By employing a three-stage analysis, the researchers present findings on:

  • The extent to which AI tools, specifically LLMs, are utilized in writing peer reviews.
  • The influence of AI-assisted reviews on the scoring of submissions.
  • The effects of these reviews on the overall acceptance rates of submissions, particularly those on the borderline of acceptance.

Prevalence of AI-Assisted Reviews

Using the GPTZero LLM detector, the paper estimates that at least 15.8% of reviews at ICLR 2024 were written with the aid of an LLM, a lower bound on the true prevalence. Nearly half of the submissions reviewed during the conference had at least one review penned with this technological assistance. This substantial usage highlights how deeply AI tools have already been integrated into academic peer review, bringing both potential benefits and challenges.
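As a rough illustration of the lower-bound logic (this is a minimal sketch, not the paper's pipeline: the detector scores, threshold, and example numbers below are made up), one can flag reviews whose detector score clears a high-precision threshold and report the flagged fraction:

```python
def prevalence_lower_bound(detector_scores, threshold=0.9):
    """Fraction of reviews whose AI-detector score clears a high-precision threshold.

    Because some AI-assisted reviews will score below the threshold
    (false negatives), the flagged fraction is only a lower bound on
    the true prevalence of AI assistance.
    """
    flagged = sum(score >= threshold for score in detector_scores)
    return flagged / len(detector_scores)

# Made-up example scores; the paper obtains per-review scores from GPTZero.
scores = [0.05, 0.97, 0.31, 0.92, 0.12, 0.88]
print(prevalence_lower_bound(scores))  # about 0.33 on this toy data
```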

Impact on Submission Scores

Turning to the effect of AI assistance on review scores, the findings are striking:

  • Higher Scoring: Among pairs of reviews of the same paper with different scores, the AI-assisted review scored higher than the human-written one in 53.4% of cases (p = 0.002), a relative difference of +14.4% in the probability of scoring higher. The trend is systematic: AI-assisted reviews rate papers more favorably (see the sketch below).
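One way to illustrate such a paired comparison is a simple sign test: for every (AI-assisted, human) pair of reviews of the same paper with different scores, count how often the AI-assisted review scores higher and test that share against 50%. The sketch below is a simplification of the paper's analysis and assumes a hypothetical `reviews` mapping from paper IDs to `(score, is_ai_assisted)` tuples:

```python
from itertools import product
from scipy.stats import binomtest

def paired_score_comparison(reviews):
    """Share of review pairs in which the AI-assisted review scores higher,
    plus a two-sided sign-test p-value against the 50% null."""
    ai_higher, total = 0, 0
    for paper_reviews in reviews.values():
        ai_scores = [s for s, assisted in paper_reviews if assisted]
        human_scores = [s for s, assisted in paper_reviews if not assisted]
        for a, h in product(ai_scores, human_scores):
            if a != h:  # only pairs with different scores are informative
                total += 1
                ai_higher += a > h
    share = ai_higher / total
    p_value = binomtest(ai_higher, total, p=0.5).pvalue
    return share, p_value
```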

Acceptance Rates Influenced by AI Reviews

The results around acceptance rates are particularly intriguing. The matched study found that:

  • Increased Acceptance Odds: Papers that received an AI-assisted review saw a 3.1 percentage point increase in acceptance rates on average, rising to 4.9 percentage points (p = 0.024) for submissions near the acceptance threshold.
  • Significant for Borderline Submissions: The effect was most pronounced for borderline submissions, indicating that AI assistance may tip the scales in favor of acceptance for papers that might otherwise not make the cut (a simplified matching sketch follows this list).
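A highly simplified version of such a matched comparison is sketched below; the `submissions` records, the borderline score band, and the greedy nearest-neighbour matching on average score are illustrative assumptions rather than the paper's exact procedure:

```python
import numpy as np

def matched_acceptance_effect(submissions, low=5.0, high=6.5):
    """Difference in acceptance rates (percentage points) between borderline
    submissions with and without an AI-assisted review, after matching.

    `submissions`: list of dicts with keys 'avg_score', 'has_ai_review', 'accepted'.
    """
    border = [s for s in submissions if low <= s["avg_score"] <= high]
    treated = [s for s in border if s["has_ai_review"]]
    control = [s for s in border if not s["has_ai_review"]]

    used, diffs = set(), []
    for t in treated:
        # Greedy nearest-neighbour match on average score, without replacement.
        candidates = [(abs(c["avg_score"] - t["avg_score"]), i)
                      for i, c in enumerate(control) if i not in used]
        if not candidates:
            break
        _, j = min(candidates)
        used.add(j)
        diffs.append(int(t["accepted"]) - int(control[j]["accepted"]))
    return 100 * float(np.mean(diffs))
```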

Theoretical and Practical Implications

Trust and Fairness in Peer Review

The infusion of AI into peer review raises urgent questions about trust and fairness. Concerns persist that relying on AI could undermine the integrity of reviews if the technology's biases or limitations are not adequately understood and controlled.

Future of AI in Academic Settings

Looking forward, how AI is harnessed within peer review must be carefully managed. One positive use case could be aiding reviewers with language or grammar improvements, potentially leveling the playing field for non-native English speakers. However, ensuring transparency about AI’s role and limiting its influence on critical evaluative judgments are essential.

Policy and Guidelines

Given these results, academic conferences and journals might need to establish clearer guidelines and policies on AI use in peer reviews to retain credibility and ensure procedural fairness. This may include stipulations about declaring AI assistance in reviews or restrictions on the scope of AI's role.

Speculations on Future Developments

As AI tools continue to improve, their allure will grow, possibly leading to more widespread adoption in academic peer review and beyond. The further development of LLMs could make them indispensable tools for reducing reviewer fatigue but might also bring new challenges in maintaining the quality and independence of reviews. Therefore, continuous assessment and adaptation of policies governing AI use in academic settings will be crucial.

In conclusion, the paper provides a foundational understanding of AI’s present role in peer review and sets the stage for crucial discussions on shaping its future impact responsibly. This understanding is essential for balancing the benefits of AI in reducing reviewer load and enhancing review quality against the need for fairness, transparency, and human oversight in the peer-review process.

Authors (5)
  1. Giuseppe Russo Latona (1 paper)
  2. Manoel Horta Ribeiro (44 papers)
  3. Tim R. Davidson (7 papers)
  4. Veniamin Veselovsky (17 papers)
  5. Robert West (154 papers)