
MARG: Multi-Agent Review Generation for Scientific Papers (2401.04259v1)

Published 8 Jan 2024 in cs.CL

Abstract: We study the ability of LLMs to generate feedback for scientific papers and develop MARG, a feedback generation approach using multiple LLM instances that engage in internal discussion. By distributing paper text across agents, MARG can consume the full text of papers beyond the input length limitations of the base LLM, and by specializing agents and incorporating sub-tasks tailored to different comment types (experiments, clarity, impact) it improves the helpfulness and specificity of feedback. In a user study, baseline methods using GPT-4 were rated as producing generic or very generic comments more than half the time, and only 1.7 comments per paper were rated as good overall in the best baseline. Our system substantially improves the ability of GPT-4 to generate specific and helpful feedback, reducing the rate of generic comments from 60% to 29% and generating 3.7 good comments per paper (a 2.2x improvement).

Overview

The multi-agent review generation method, MARG-S, addresses a key limitation of LLMs such as GPT-4: the bounded length of their input context. The approach delegates the task of generating peer-review feedback on scientific papers across multiple instances of an LLM. By distributing the text among several "agents," each handling a fragment and communicating with the others, MARG-S can process papers whose full text exceeds the base model's input limit. It further improves the specificity and helpfulness of feedback by specializing agents to focus on particular aspects of critique, such as experiments, clarity, and impact.
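The distribution step described above can be illustrated with a minimal sketch. The function name, the greedy packing strategy, and the word-count budget are all assumptions for illustration; a real system would measure length with the model's tokenizer rather than whitespace word counts.

```python
# Hypothetical sketch: split a paper's full text into per-agent chunks so
# that each worker agent's prompt stays within the base LLM's input limit.
# Word count is a crude stand-in for a proper token count.

def chunk_paper(paragraphs, max_words_per_agent=3000):
    """Greedily pack paragraphs into chunks, one chunk per worker agent."""
    chunks, current, count = [], [], 0
    for para in paragraphs:
        n = len(para.split())
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and count + n > max_words_per_agent:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Packing whole paragraphs (rather than cutting mid-sentence) keeps each agent's fragment coherent, at the cost of slightly uneven chunk sizes.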

System Design

MARG-S's architecture consists of a designated leader agent that orchestrates the process, multiple worker agents, each holding a section of the scientific paper, and specialized expert agents that focus on different review aspects. Coordination relies on a communication protocol that lets agents exchange messages and gather insights spanning the paper's entirety. The method also includes a crucial refinement stage in which initial feedback is polished to improve clarity and ensure comments are contextually relevant before they are presented to the user.
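The leader/worker/expert pattern described above can be sketched as follows. This is not the authors' code: the agent class, role prompts, message formats, and the `query_llm` stub are all hypothetical, standing in for real LLM API calls and the paper's actual communication protocol.

```python
# Illustrative sketch (assumptions, not the MARG-S implementation) of a
# leader agent coordinating per-chunk worker agents and aspect-specific
# expert agents, followed by a refinement pass.

def query_llm(system_prompt, message):
    # Placeholder: a real system would call the underlying LLM here.
    # This deterministic stub just lets the control flow run end to end.
    return f"({system_prompt[:24]}...) reply to: {message[:24]}..."

class Agent:
    """One LLM instance with a role prompt and a running message history."""
    def __init__(self, role_prompt):
        self.role_prompt = role_prompt
        self.history = []

    def receive(self, message):
        self.history.append(message)
        reply = query_llm(self.role_prompt, "\n".join(self.history))
        self.history.append(reply)
        return reply

def run_review(paper_chunks, aspects=("experiments", "clarity", "impact")):
    workers = [Agent(f"You hold this paper excerpt:\n{c}") for c in paper_chunks]
    experts = {a: Agent(f"You critique the paper's {a}.") for a in aspects}
    leader = Agent("You coordinate agents to write a peer review.")

    comments = []
    for aspect, expert in experts.items():
        # The leader broadcasts the expert's question to every worker and
        # relays their excerpt-grounded answers back to the expert.
        question = f"What should a reviewer note about the paper's {aspect}?"
        answers = [w.receive(question) for w in workers]
        comments.append(expert.receive("\n".join(answers)))

    # Refinement stage: polish the draft comments before showing the user.
    return leader.receive("Refine these comments:\n" + "\n".join(comments))
```

Routing every exchange through a question-and-answer round lets each comment draw on evidence from all chunks, which is how the system surfaces feedback that spans the whole paper despite per-agent context limits.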

User Study Evaluation

In a user study, the multi-agent approach showed a marked improvement in the quality of generated comments compared with the baseline methods. Participants reported that MARG-S offered specific, accurate, and actionable suggestions. However, while MARG-S surpassed the other methods in producing "good" comments, there remains room for broad improvement: across all methods, including MARG-S, a notable proportion of comments were rated "bad" or "highly inaccurate."

Potential and Challenges

The introduction of MARG-S into the domain of scientific review generation is a promising step forward. It showcases an advanced application of LLMs and offers a potential model for future AI-driven peer-review systems. The increased cost of running such multi-agent systems, however, is a significant consideration for practical deployment. Future iterations of MARG-S would benefit from optimization for cost and efficiency, the incorporation of related literature for more informed reviews, and improvements in managing agent communication so that even larger inputs can be handled without overwhelming the system's capacity. With further refinement, systems like MARG-S could substantially aid scientific communities in the review process, offering more comprehensive, insightful feedback to authors and potentially reshaping the peer-review landscape.

Authors (4)
  1. Mike D'Arcy
  2. Tom Hope
  3. Larry Birnbaum
  4. Doug Downey
Citations (13)