A Randomized Controlled Trial on Anonymizing Reviewers to Each Other in Peer Review Discussions (2403.01015v1)
Abstract: Peer review often involves reviewers submitting their independent reviews, followed by a discussion among the reviewers of each paper. A question among policymakers is whether the reviewers of a paper should be anonymous to each other during the discussion. We shed light on this by conducting a randomized controlled trial at the UAI 2022 conference. We randomly split the reviewers and papers into two conditions (one with anonymous discussions and the other with non-anonymous discussions) and conducted an anonymous survey of all reviewers to address the following questions: 1. Do reviewers discuss more in one of the conditions? Marginally more in anonymous (n = 2281, p = 0.051). 2. Does seniority have more influence on final decisions when non-anonymous? Yes, the decisions are closer to senior reviewers' scores in the non-anonymous condition than in the anonymous one (n = 484, p = 0.04). 3. Are reviewers more polite in one of the conditions? No significant difference in politeness of reviewers' text-based responses (n = 1125, p = 0.72). 4. Do reviewers' self-reported experiences differ across the two conditions? No significant difference for each of the five questions asked (n = 132, p > 0.3). 5. Do reviewers prefer one condition over the other? Yes, there is a weak preference for anonymous discussions (n = 159, Cohen's d = 0.25). 6. What do reviewers consider important to make policy on anonymity among reviewers? Reviewers' feeling of safety in expressing their opinions was rated most important, while polite communication among reviewers was rated least important (n = 159). 7. Have reviewers experienced dishonest behavior due to non-anonymity in discussions? Yes, roughly 7% of respondents answered affirmatively (n = 167). Overall, this experiment reveals evidence supporting an anonymous discussion setup in the peer-review process, in terms of the evaluation criteria considered.
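The effect size reported for question 5 (Cohen's d = 0.25) measures the standardized difference between two group means. A minimal sketch of how such a value is computed is below; the input ratings are entirely hypothetical stand-ins, not the study's survey data:

```python
from math import sqrt

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    # Sample variances (Bessel-corrected).
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    pooled_sd = sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Illustrative only: made-up preference ratings for the two conditions.
anon_pref = [4, 5, 3, 4, 5, 4]
nonanon_pref = [3, 4, 3, 3, 4, 3]
print(cohens_d(anon_pref, nonanon_pref))
```

By convention, d around 0.2 is considered a small effect, which is consistent with the abstract's characterization of the preference for anonymous discussions as "weak."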
Authors: Charvi Rastogi, Xiangchen Song, Zhijing Jin, Ivan Stelmakh, Kun Zhang, Nihar B. Shah, Hal Daumé III