
PeerArg: Argumentative Peer Review with LLMs (2409.16813v1)

Published 25 Sep 2024 in cs.AI

Abstract: Peer review is an essential process to determine the quality of papers submitted to scientific conferences or journals. However, it is subjective and prone to biases. Several studies have been conducted to apply techniques from NLP to support peer review, but they are based on black-box techniques and their outputs are difficult to interpret and trust. In this paper, we propose a novel pipeline to support and understand the reviewing and decision-making processes of peer review: the PeerArg system combining LLMs with methods from knowledge representation. PeerArg takes as input a set of reviews for a paper and outputs a prediction of the paper's acceptance. We evaluate the performance of the PeerArg pipeline on three different datasets, in comparison with a novel end-to-end LLM that uses few-shot learning to predict paper acceptance given reviews. The results indicate that the end-to-end LLM is capable of predicting paper acceptance from reviews, but a variant of the PeerArg pipeline outperforms this LLM.

Argumentative Peer Review with LLMs: An Analysis of PeerArg

In the research paper titled "PeerArg: Argumentative Peer Review with LLMs," the authors propose an innovative method for enhancing the peer review process using a hybrid system that combines LLMs with knowledge representation techniques, specifically computational argumentation. The authors highlight the inherent biases and subjective nature of traditional peer review processes and aim to address these shortcomings with a novel system designed to improve transparency and reliability in review aggregation.

Methodology

The paper introduces PeerArg, a pipeline that processes peer reviews to predict the acceptance of scientific papers. PeerArg integrates LLMs and methods from bipolar argumentation frameworks (BAFs) to model the reviewers' arguments and decisions systematically. The proposed approach begins by extracting arguments from reviews, structuring them into a framework, and utilizing symbolic AI for comprehensive review aggregation.
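The argumentation stage can be pictured as building a small bipolar argumentation framework from arguments extracted from the reviews, with each argument carrying a base score and attack/support links. The following is a minimal sketch under assumed names (it is not the authors' implementation, and the node texts and scores are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class Argument:
    """A node in a bipolar argumentation framework (BAF)."""
    text: str
    base_score: float = 0.5  # e.g. a sentiment score in [0, 1]
    attackers: list["Argument"] = field(default_factory=list)
    supporters: list["Argument"] = field(default_factory=list)

# Hypothetical example: a paper-level claim, attacked and supported
# by arguments extracted from two reviews.
paper = Argument("The paper should be accepted", base_score=0.5)
paper.supporters.append(Argument("The evaluation is thorough", 0.7))
paper.attackers.append(Argument("The novelty is limited", 0.6))
```

Once such a framework is built, gradual semantics (discussed below) can propagate the base scores into final argument strengths, from which an acceptance decision is read off.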

The researchers compare PeerArg with an end-to-end LLM model that uses few-shot learning to predict a paper's acceptance status. This model adopts a quantized 4-bit pretrained Mistral-7B-v0.1 LLM, structuring input as textual reviews followed by acceptance or rejection decisions. By evaluating both systems on datasets such as Peer-Review-Analyze (PRA), PeerRead, and Multi-disciplinary Open Peer Review Dataset (MOPRD), the authors present empirical evidence favoring PeerArg's performance.
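The few-shot setup amounts to formatting (reviews, decision) pairs as in-context examples and appending the target paper's reviews with the decision left blank, so the model's continuation yields the prediction. A minimal prompt-construction sketch, where the exact template and labels are assumptions rather than the authors' prompt:

```python
def build_few_shot_prompt(examples, target_reviews):
    """Format (reviews, decision) pairs as in-context examples, then
    append the target paper's reviews with the decision left blank.

    `examples`: list of (reviews_text, "accept"/"reject") tuples --
    a hypothetical format, not the paper's exact template.
    """
    parts = []
    for reviews_text, decision in examples:
        parts.append(f"Reviews:\n{reviews_text}\nDecision: {decision}")
    parts.append(f"Reviews:\n{target_reviews}\nDecision:")
    return "\n\n".join(parts)
```

The completed prompt would then be passed to the quantized Mistral-7B model, whose next tokens are read as the accept/reject prediction.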

Empirical Results

The results indicate that PeerArg outperforms the end-to-end LLM model across multiple datasets. Notably, PeerArg offers enhanced transparency due to its symbolic reasoning layer, which permits a more interpretable review aggregation process than the black-box LLM. The system performs best under specific configurations: sentiment scores as base ratings, DF-QuAD or MLP-based semantics for evaluating argument strength, and aggregation via majority-voting or all-accept strategies.
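To make the configuration concrete: DF-QuAD is a gradual semantics that computes each argument's strength from its base score and the aggregated strengths of its attackers and supporters. Below is a sketch of the standard DF-QuAD combination together with the two aggregation strategies mentioned above; the MLP-based semantics and the pipeline's exact wiring are not shown, and the tie-breaking in the majority vote is an assumption:

```python
from math import prod

def aggregate(strengths):
    """DF-QuAD aggregation (probabilistic sum): 1 - prod(1 - v_i); 0 if empty."""
    return 1.0 - prod(1.0 - v for v in strengths)

def df_quad_strength(base, attacker_strengths, supporter_strengths):
    """Combine a base score with aggregated attack and support strengths."""
    va = aggregate(attacker_strengths)
    vs = aggregate(supporter_strengths)
    if va >= vs:
        return base - base * (va - vs)        # attacks dominate: move toward 0
    return base + (1.0 - base) * (vs - va)    # supports dominate: move toward 1

def majority_vote(decisions):
    """Accept iff at least half of the per-review decisions are accepts."""
    return sum(decisions) * 2 >= len(decisions)

def all_accept(decisions):
    """Accept only if every per-review decision is an accept."""
    return all(decisions)
```

For instance, an argument with base score 0.5 that is fully attacked and unsupported drops to strength 0, while one that is fully supported and unattacked rises to 1; per-review decisions derived from such strengths are then combined by one of the two aggregation strategies.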

Theoretical and Practical Implications

This research underscores the potential of integrating computational argumentation with machine learning to refine processes that have traditionally relied on human judgment. PeerArg's development suggests a path toward less biased and more understandable decision-making tools in academic peer review, with implications for other domains where opinion aggregation is critical. Its symbolic components introduce a degree of interpretability that could be essential wherever trust and justification are paramount.

Future work might assess PeerArg's impact in academia, examining whether its integration could streamline publication processes and improve their efficiency. Another avenue is adapting the system to domains with distinct review criteria, testing its versatility and scalability.

Conclusion

The contribution of PeerArg lies in its integration of LLMs and argumentation frameworks, setting a benchmark for peer review support systems. While the quantitative results favor PeerArg, the broader implication is its potential to make peer review more transparent, interpretable, and less biased. This research positions computational argumentation not only as a tool for symbolic reasoning but as a means of augmenting traditional methodologies with AI, illustrating the fruitful intersection of argumentation theory and neural LLMs.

Authors (3)
  1. Purin Sukpanichnant (2 papers)
  2. Anna Rapberger (9 papers)
  3. Francesca Toni (96 papers)