A Large Scale Randomized Controlled Trial on Herding in Peer-Review Discussions (2011.15083v1)

Published 30 Nov 2020 in cs.HC, cs.LG, and stat.AP

Abstract: Peer review is the backbone of academia and humans constitute a cornerstone of this process, being responsible for reviewing papers and making the final acceptance/rejection decisions. Given that human decision making is known to be susceptible to various cognitive biases, it is important to understand which (if any) biases are present in the peer-review process and design the pipeline such that the impact of these biases is minimized. In this work, we focus on the dynamics of between-reviewers discussions and investigate the presence of herding behaviour therein. In that, we aim to understand whether reviewers and more senior decision makers get disproportionately influenced by the first argument presented in the discussion when (in case of reviewers) they form an independent opinion about the paper before discussing it with others. Specifically, in conjunction with the review process of ICML 2020 -- a large, top tier machine learning conference -- we design and execute a randomized controlled trial with the goal of testing for the conditional causal effect of the discussion initiator's opinion on the outcome of a paper.


Overview of "A Large Scale Randomized Controlled Trial on Herding in Peer-Review Discussions"

The paper investigates whether herding behavior occurs in peer-review discussions, using an experiment run in conjunction with the ICML 2020 review process. The study is grounded in the hypothesis that social influence may cause reviewers to align their evaluations with the first argument presented in a discussion. Using a randomized controlled trial, the authors test whether the opinion of the discussion initiator causally affects the final acceptance decision for a paper.

Methodological Design

The authors employ a classical A/B testing approach to examine the effect of discussion initiation on peer review outcomes. They carefully construct experimental conditions to limit confounding factors, such as differences in reviewer participation rates between conditions. Through randomized assignments, papers are divided into two groups: one where discussions are initiated by reviewers with the most positive scores (Group A), and another initiated by those with the most negative scores (Group B). The constraints are structured to ensure that each reviewer is engaged in initiating or contributing to the discussion of only a limited number of papers, thus mitigating potential overload and bias effects.
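The two-group design described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' actual assignment procedure: the function names `assign_groups` and `pick_initiator`, the even split, and the score dictionary are all assumptions made for the example, and the per-reviewer load constraints mentioned above are not modeled.

```python
import random


def assign_groups(paper_ids, seed=0):
    """Randomly split papers into two equally sized groups, "A" and "B".

    Hypothetical helper: the real experiment also balanced reviewer load,
    which this sketch does not attempt.
    """
    rng = random.Random(seed)
    shuffled = list(paper_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {pid: ("A" if i < half else "B") for i, pid in enumerate(shuffled)}


def pick_initiator(scores, group):
    """Choose who is asked to start the discussion of one paper.

    scores maps reviewer id -> initial score (assumed numeric).
    Group A: the most positive reviewer; Group B: the most negative one.
    Ties are broken by dictionary order, an arbitrary choice for this sketch.
    """
    chooser = max if group == "A" else min
    return chooser(scores, key=scores.get)


if __name__ == "__main__":
    groups = assign_groups(range(10), seed=1)
    toy_scores = {"r1": 2, "r2": 5, "r3": 4}  # made-up scores for illustration
    for pid, group in sorted(groups.items()):
        print(pid, group, pick_initiator(toy_scores, group))
```

Because assignment is random, any systematic difference in outcomes between the groups can be attributed to who initiated the discussion rather than to properties of the papers.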

Experimental Implementation at ICML 2020

The study was conducted within the peer-review framework of ICML 2020 and involved 1,544 papers identified as borderline on the basis of preliminary reviewer scores. The experiment builds on the standard review stages: initial reviews and author rebuttals, followed by discussions aimed at reaching consensus or presenting dissenting views to area chairs. The experimental procedure includes strategically timed requests to initiate the discussion, followed by efforts to bring in reviewers holding the opposing initial opinion so that both sides are represented.

Results and Analysis

The findings reveal no statistically significant difference in acceptance rates between the two experimental groups, and thus offer no evidence of herding in peer-review discussions. Although the intervention changed who initiated the discussion, the anticipated effect on final decisions was not observed. After discussion, reviewers' scores tended to converge toward the initial mean score, irrespective of the initiating reviewer's opinion, which suggests a natural tendency towards consensus.
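A comparison of acceptance rates between two randomized groups can be tested with a permutation test, sketched below. This is one common approach under stated assumptions, not necessarily the exact test used in the paper: the function names are hypothetical, outcomes are encoded as 1 (accepted) or 0 (rejected), and the input lists in the usage example are made-up toy data rather than results from the study.

```python
import random


def acceptance_rate_diff(group_a, group_b):
    """Difference in acceptance rates (A minus B) for 0/1 outcome lists."""
    return sum(group_a) / len(group_a) - sum(group_b) / len(group_b)


def permutation_test(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in acceptance rates.

    Under the null hypothesis the group labels are exchangeable, so we
    reshuffle the pooled outcomes and count how often the shuffled
    difference is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = acceptance_rate_diff(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = acceptance_rate_diff(pooled[:n_a], pooled[n_a:])
        # Small tolerance guards against floating-point round-off.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the paper.
    toy_a = [1, 0, 1, 1, 0, 0, 1, 0]
    toy_b = [0, 1, 1, 0, 0, 1, 0, 1]
    print(permutation_test(toy_a, toy_b, n_perm=2000))
```

A large p-value from such a test means the data are compatible with no effect of the initiator's opinion; it does not by itself prove that the effect is exactly zero.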

Implications and Future Directions

The lack of evidence for herding in peer-review discussions has two main implications. First, it challenges the assumption that peer-review discussions are heavily influenced by the initial stance of the initiator, suggesting that reviewers operate with a degree of independence and rationality, possibly due to the analytical nature of the task. Second, it underscores the robustness of the discussion process in reaching a consensus that reflects independent expert evaluations rather than social conformity.

Moving forward, further exploration into peer-review dynamics could examine other dimensions of influence, such as the impact of discussion content or the weight of reviews' textual arguments on decision outcomes. Additionally, considering other cognitive biases and their potential manifestations in peer-reviewing could expand the understanding of decision-making processes in academic evaluations.

The results contribute to the discourse on peer review by offering insight into the fairness and efficacy of current practices in academic decision-making. As peer review remains integral to scientific validation, continued research should aim to refine these processes so that they are both equitable and robust against potential biases.


Authors (5)

Hal Daumé III (76 papers)
Ivan Stelmakh (16 papers)
Charvi Rastogi (18 papers)
Nihar B. Shah (73 papers)
Aarti Singh (98 papers)

Citations (24)
