Zero-shot Generative Large Language Models for Systematic Review Screening Automation (2401.06320v2)

Published 12 Jan 2024 in cs.IR and cs.CL

Abstract: Systematic reviews are crucial for evidence-based medicine as they comprehensively analyse published research findings on specific questions. Conducting such reviews is often resource- and time-intensive, especially in the screening phase, where abstracts of publications are assessed for inclusion in a review. This study investigates the effectiveness of using zero-shot large language models (LLMs) for automatic screening. We evaluate the effectiveness of eight different LLMs and investigate a calibration technique that uses a predefined recall threshold to determine whether a publication should be included in a systematic review. Our comprehensive evaluation using five standard test collections shows that instruction fine-tuning plays an important role in screening, that calibration renders LLMs practical for achieving a targeted recall, and that combining both with an ensemble of zero-shot models saves significant screening time compared to state-of-the-art approaches.
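The abstract does not spell out how the calibration works, so the following is only a minimal sketch of one way a recall-targeted decision threshold could be chosen. It assumes each publication receives a numeric inclusion score from a model and that a small labelled calibration set is available. The function names `calibrate_threshold` and `screen` are hypothetical and are not taken from the paper.

```python
import math


def calibrate_threshold(scores, labels, target_recall):
    """Pick the highest score threshold that still reaches target_recall.

    scores: inclusion scores for a labelled calibration set (assumed input).
    labels: truthy for publications that belong in the review.
    target_recall: fraction of relevant publications that must be kept.
    """
    # Scores of the relevant publications, highest first.
    positives = sorted((s for s, y in zip(scores, labels) if y), reverse=True)
    if not positives:
        raise ValueError("need at least one relevant publication to calibrate")
    # How many relevant publications must score at or above the threshold.
    needed = max(1, math.ceil(target_recall * len(positives)))
    return positives[needed - 1]


def screen(scores, threshold):
    """Mark a publication as included when its score meets the threshold."""
    return [s >= threshold for s in scores]


if __name__ == "__main__":
    cal_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
    cal_labels = [1, 0, 1, 0, 1]
    theta = calibrate_threshold(cal_scores, cal_labels, target_recall=0.95)
    print(theta, screen(cal_scores, theta))
```

A higher recall target pushes the threshold down, so more publications pass screening and less manual work is saved. That trade-off is what a predefined recall threshold makes explicit.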



Authors (6)

Shuai Wang (466 papers)
Harrisen Scells (22 papers)
Shengyao Zhuang (42 papers)
Martin Potthast (64 papers)
Bevan Koopman (37 papers)
Guido Zuccon (73 papers)

