Unknown downstream impacts on scientific decision-making from LLM-shaped review criteria

Characterize the downstream impacts on scientific decision-making—including paper acceptance, research incentives, and field trajectories—of systematic differences in evaluation criteria between large language model–generated and human-written peer reviews at venues such as ICLR.

Background

The study finds that LLM-generated reviews prioritize different criteria than human reviews do (e.g., emphasizing reproducibility and scalability over clarity and relevance) and assign higher scores on average. The authors argue that such shifts could influence which research is validated and incentivized.

They explicitly state that the downstream impacts of these changes are not yet known, framing a concrete, institution-specific open problem about how altered review criteria may reshape scientific outcomes.

References

"These results demonstrate that the criteria under human review and LLM review are significantly different, which will have as-yet-unknown downstream impacts on the decisions made about what scientific work is valid and incentivized."

How LLMs Distort Our Written Language  (2603.18161 - Abdulhai et al., 18 Mar 2026) in Section 4, Subsection "LLMs Distort Decisions Affecting Scientific Institutions"