Introduction
The proliferation of accessible online information, particularly in science and health domains, has made efficient fact-checking systems for scientific claims a necessity. Unlike typical setups that assume evidence documents are identified in advance, or that operate over a small, fixed collection of documents, the paper under review explores automated claim verification in a realistic open-domain setting. This evaluation takes a broader and more practical view of the verification process by testing performance against substantially larger knowledge bases.
Experiment Design
The paper keeps the pipeline for evidence sentence selection and verdict prediction fixed while varying the knowledge source and the document retrieval method. PubMed, Wikipedia, and Google serve as large-scale repositories of information, and their effectiveness for evidence retrieval is compared under two distinct methods: BM25 and semantic search using embeddings from BioSimCSE. The researchers ground their evaluation in the final verdict prediction scores, drawing on four datasets of biomedical and health claims whose veracity labels were assigned by domain experts.
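As a rough illustration of the two retrieval routes being compared, the sketch below scores a toy corpus against a claim with both a sparse BM25 index and dense sentence embeddings. The library choices (rank_bm25, sentence-transformers) and the specific BioSimCSE checkpoint name are assumptions for the example, not details taken from the paper.

```python
# Illustrative sketch: sparse (BM25) vs. dense (embedding) retrieval over a toy corpus.
# Library choices and the model checkpoint name are assumptions, not from the paper.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

corpus = [
    "Vitamin C supplementation does not prevent the common cold in the general population.",
    "Statins reduce the risk of cardiovascular events in high-risk patients.",
    "Regular exercise is associated with lower all-cause mortality.",
]
claim = "Taking vitamin C stops you from catching colds."

# --- Sparse retrieval: BM25 over whitespace-tokenized documents ---
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
bm25_scores = bm25.get_scores(claim.lower().split())

# --- Dense retrieval: cosine similarity between sentence embeddings ---
# Hypothetical checkpoint name; any BioSimCSE-style sentence encoder would do here.
encoder = SentenceTransformer("kamalkraj/BioSimCSE-BioLinkBERT-BASE")
doc_emb = encoder.encode(corpus, convert_to_tensor=True)
claim_emb = encoder.encode(claim, convert_to_tensor=True)
dense_scores = util.cos_sim(claim_emb, doc_emb)[0]

# Rank documents under each method; downstream, the top-k sentences would feed
# the fixed evidence-selection and verdict-prediction stages.
print("BM25 ranking:", sorted(range(len(corpus)), key=lambda i: -bm25_scores[i]))
print("Dense ranking:", sorted(range(len(corpus)), key=lambda i: -float(dense_scores[i])))
```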
Results and Analysis
In the experiments, Wikipedia and semantic search typically achieve higher recall, showing that they can surface relevant evidence across a wide range of documents. BM25, in contrast, delivers higher precision, finding close matches without being over-inclusive, though at the expense of broader coverage. The findings are nuanced across claim types and sources: Wikipedia works better for popular health claims, while PubMed supports specialized medical questions more effectively. When "the whole web" is queried through Google, performance looks impressive, but details within the results, such as higher scores on datasets whose claims were drawn directly from PubMed, point to possible data leakage and to the limitations of snippet-based evidence.
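To make the precision/recall trade-off concrete, a small helper along these lines scores one claim's ranked retrieval list against a set of gold evidence documents. The metric definitions are standard; the function and document ids are purely illustrative, not code from the paper.

```python
# Illustrative retrieval metrics: precision@k and recall@k for a single claim.
def precision_recall_at_k(retrieved_ids, gold_ids, k):
    """retrieved_ids: ranked list of document ids; gold_ids: set of relevant ids."""
    top_k = retrieved_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in gold_ids)
    precision = hits / k
    recall = hits / len(gold_ids) if gold_ids else 0.0
    return precision, recall

# A high-recall retriever surfaces most gold documents somewhere in the top k,
# while a high-precision retriever keeps the top k free of irrelevant documents.
print(precision_recall_at_k(["d3", "d7", "d1", "d9"], {"d1", "d2"}, k=3))  # (0.333..., 0.5)
```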
Conclusion and Future Work
The research demonstrates both the potential and the challenges of open-domain scientific claim verification systems. The findings suggest that PubMed and Wikipedia can each serve as competent knowledge sources, with differences emerging from the nature of the claim. Dense retrieval is generally more effective than sparse retrieval, though each excels in particular scenarios. The analysis invites future work on modeling disagreement, assessing evidence quality, and integrating retrieval-augmented generation with LLMs to better support fact-checking. While the paper paves the way for robust AI-driven fact-checking, the authors also point out the limits on real-world applicability and the ethical considerations involved in handling sensitive health and medical misinformation.