
A Randomized Controlled Trial on Anonymizing Reviewers to Each Other in Peer Review Discussions (2403.01015v1)

Published 1 Mar 2024 in cs.CY and cs.DL

Abstract: Peer review often involves reviewers submitting their independent reviews, followed by a discussion among reviewers of each paper. A question among policymakers is whether the reviewers of a paper should be anonymous to each other during the discussion. We shed light on this by conducting a randomized controlled trial at the UAI 2022 conference. We randomly split the reviewers and papers into two conditions--one with anonymous discussions and the other with non-anonymous discussions, and conduct an anonymous survey of all reviewers, to address the following questions: 1. Do reviewers discuss more in one of the conditions? Marginally more in anonymous (n = 2281, p = 0.051). 2. Does seniority have more influence on final decisions when non-anonymous? Yes, the decisions are closer to senior reviewers' scores in the non-anonymous condition than in anonymous (n = 484, p = 0.04). 3. Are reviewers more polite in one of the conditions? No significant difference in politeness of reviewers' text-based responses (n = 1125, p = 0.72). 4. Do reviewers' self-reported experiences differ across the two conditions? No significant difference for each of the five questions asked (n = 132 and p > 0.3). 5. Do reviewers prefer one condition over the other? Yes, there is a weak preference for anonymous discussions (n = 159 and Cohen's d= 0.25). 6. What do reviewers consider important to make policy on anonymity among reviewers? Reviewers' feeling of safety in expressing their opinions was rated most important, while polite communication among reviewers was rated least important (n = 159). 7. Have reviewers experienced dishonest behavior due to non-anonymity in discussions? Yes, roughly 7% of respondents answered affirmatively (n = 167). Overall, this experiment reveals evidence supporting an anonymous discussion setup in the peer-review process, in terms of the evaluation criteria considered.

An Empirical Examination of Anonymity in Peer Review Discussions

Introduction

The peer review system, while not without its flaws, remains a cornerstone of academic integrity and quality assurance in scientific research. An often-debated aspect of this system is whether reviewers should remain anonymous to each other during discussions. This study, based on a randomized controlled trial conducted at the Conference on Uncertainty in Artificial Intelligence (UAI) 2022, explores the implications of reviewer anonymity for discussion engagement, decision-making influence, politeness, and participant preferences.

Experiment Design

The UAI 2022 conference served as a live testing ground for the experiment, with submissions and reviewers randomly assigned to anonymous or non-anonymous discussion conditions. This setup allowed for an empirical comparison across several dimensions, including discussion participation rates, influence of reviewer seniority on final decisions, and perceptions of discussion politeness. A supplementary survey provided additional insights into reviewers' preferences and perceptions regarding anonymity.

Discussion Engagement

One of the primary findings of the paper is a marginally higher rate of discussion posts in the anonymous condition, although the difference (p = 0.051) falls just short of conventional statistical significance. Interestingly, posting behavior did not differ significantly when broken down by reviewer seniority, contrasting with the hypothesis that junior reviewers might be more vocal when their identities are concealed.
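A comparison of this kind can be sketched with a nonparametric rank test on per-reviewer post counts. The sketch below is purely illustrative: the counts are invented, not the UAI 2022 data, and the paper's exact test statistic may differ.

```python
# Illustrative sketch: compare per-reviewer discussion-post counts between
# the anonymous and non-anonymous conditions with a one-sided
# Mann-Whitney U test. All numbers are hypothetical.
from scipy.stats import mannwhitneyu

anonymous_posts = [3, 1, 4, 2, 5, 0, 3, 2, 4, 1]
non_anonymous_posts = [1, 0, 2, 1, 3, 0, 2, 1, 1, 0]

# alternative="greater" tests whether the anonymous condition tends to
# produce more posts than the non-anonymous one.
stat, p_value = mannwhitneyu(anonymous_posts, non_anonymous_posts,
                             alternative="greater")
print(f"U = {stat}, p = {p_value:.3f}")
```

With real data one would also account for papers (reviewers are nested within papers), which a simple two-sample test ignores.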

Influence of Reviewer Seniority

A key concern within peer review discussions is the potential for dominant influences, particularly from senior reviewers. The study found that final decisions in the non-anonymous condition were more likely to align with the initial scores given by senior reviewers, an observation supported by a p-value of 0.04. This finding suggests that the visibility of reviewer identities can indeed skew decision-making toward more senior participants.
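One simple way to operationalize "alignment with senior reviewers" is to compare, per paper, how far the final decision score sits from the senior reviewer's initial score versus the junior reviewers' scores. The toy data and measure below are hypothetical; the paper's exact statistic may differ.

```python
# Hypothetical alignment measure: for each paper, the absolute gap between
# the final score and (a) the senior reviewer's initial score, versus
# (b) the mean of the junior reviewers' initial scores. Toy data only.
papers = [
    # (senior_score, mean_junior_score, final_score)
    (7, 5, 7),
    (4, 6, 4),
    (6, 3, 5),
]

senior_gaps = [abs(final - senior) for senior, _, final in papers]
junior_gaps = [abs(final - junior) for _, junior, final in papers]

mean_senior_gap = sum(senior_gaps) / len(papers)
mean_junior_gap = sum(junior_gaps) / len(papers)
print(mean_senior_gap, mean_junior_gap)
```

A smaller mean gap to senior scores than to junior scores (as in this toy data) would indicate decisions tracking the senior reviewer, which is the pattern the study reports for the non-anonymous condition.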

Discussion Politeness

Contrary to the assumption that anonymity might lead to less polite discourse, the paper found no significant difference in the politeness levels of discussion posts across both conditions. This outcome challenges the notion that removing anonymity necessarily improves the civility of peer review discussions.

Reviewer Preferences and Experiences

Survey responses indicated a weak preference for anonymous discussions among reviewers, with no significant differences in self-reported experiences related to comfort, understanding, or perceived responsibility in discussions. Notably, 7% of respondents reported witnessing dishonest behavior in previous non-anonymous review settings, highlighting a potential risk associated with visible reviewer identities.
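The "weak preference" is quantified in the abstract as Cohen's d = 0.25. As a reference point, the standard two-sample pooled-standard-deviation formulation of Cohen's d can be sketched as below; the ratings are invented for illustration, and the paper's exact computation (e.g. a one-sample variant) may differ.

```python
# Two-sample Cohen's d with pooled standard deviation. The preference
# ratings below are hypothetical, not survey data from the paper.
import statistics


def cohens_d(a, b):
    """Standardized mean difference between two samples (pooled SD)."""
    na, nb = len(a), len(b)
    sa, sb = statistics.stdev(a), statistics.stdev(b)
    pooled_sd = (((na - 1) * sa**2 + (nb - 1) * sb**2) / (na + nb - 2)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


# Toy 1-5 preference ratings from two groups of respondents.
group_a = [4, 3, 5, 4, 3, 4]
group_b = [3, 3, 4, 3, 2, 4]
d = cohens_d(group_a, group_b)
print(f"Cohen's d = {d:.2f}")
```

By common rules of thumb, d around 0.2 is a "small" effect, which is consistent with the paper describing the preference for anonymity as weak.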

Implications and Future Directions

This paper provides valuable empirical evidence on the impacts of reviewer anonymity in peer review discussions. The findings on seniority influence and the lack of difference in politeness levels challenge some commonly held beliefs about the benefits of non-anonymous reviews. However, the observed preference for anonymity, albeit slight, suggests that the academic community may lean towards more private discussion environments.

Moving forward, it will be crucial for conference organizers and journal editors to consider these findings when designing or refining their review processes. Further research could explore additional factors, such as the quality of review content and long-term effects on publication quality, to build a more comprehensive understanding of optimal peer review practices.

Authors (7)
  1. Charvi Rastogi
  2. Xiangchen Song
  3. Zhijing Jin
  4. Ivan Stelmakh
  5. Kun Zhang
  6. Nihar B. Shah
  7. Hal Daumé III