Emergent Mind

Comparing Knowledge Sources for Open-Domain Scientific Claim Verification

Published Feb 5, 2024 in cs.CL, cs.AI, cs.IR and


The increasing rate at which scientific knowledge is discovered and health claims are shared online has highlighted the importance of developing efficient fact-checking systems for scientific claims. The usual setting for this task in the literature assumes that the documents containing the evidence for claims are already provided and annotated, or contained in a limited corpus. This renders the systems unrealistic for real-world settings, where knowledge sources with potentially millions of documents must be queried to find relevant evidence. In this paper, we perform an array of experiments to test the performance of open-domain claim verification systems. We test the final verdict prediction of systems on four datasets of biomedical and health claims in different settings. With the pipeline's evidence selection and verdict prediction parts held constant, document retrieval is performed over three common knowledge sources (PubMed, Wikipedia, Google) using two different information retrieval techniques. We show that PubMed works better with specialized biomedical claims, while Wikipedia is more suited for everyday health concerns. Likewise, BM25 excels in retrieval precision, while semantic search excels in recall of relevant evidence. We discuss the results, outline frequent retrieval patterns and challenges, and provide promising future directions.
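The abstract describes a three-stage pipeline in which only document retrieval varies while evidence selection and verdict prediction stay fixed. The sketch below is a minimal illustration of that structure, assuming hypothetical callable interfaces (`Retriever`, `EvidenceSelector`, `VerdictPredictor`), FEVER-style labels, and toy word-overlap components. None of these names or heuristics come from the paper, whose selector and predictor are trained models.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Set

# Hypothetical interfaces for the three pipeline stages (not from the paper).
Retriever = Callable[[str], List[str]]
EvidenceSelector = Callable[[str, List[str]], List[str]]
VerdictPredictor = Callable[[str, List[str]], str]


def tokens(text: str) -> Set[str]:
    """Lowercased word set; a crude stand-in for real tokenization."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class ClaimVerificationPipeline:
    retrieve: Retriever                # swapped per experiment (source x method)
    select_evidence: EvidenceSelector  # held constant across experiments
    predict_verdict: VerdictPredictor  # held constant across experiments

    def verify(self, claim: str) -> str:
        docs = self.retrieve(claim)
        evidence = self.select_evidence(claim, docs)
        return self.predict_verdict(claim, evidence)


def make_overlap_retriever(corpus: List[str], k: int = 2) -> Retriever:
    """Toy retriever: rank documents by word overlap, drop zero-overlap ones."""
    def retrieve(claim: str) -> List[str]:
        ranked = sorted(corpus, key=lambda d: len(tokens(d) & tokens(claim)), reverse=True)
        return [d for d in ranked[:k] if tokens(d) & tokens(claim)]
    return retrieve


def overlap_selector(claim: str, docs: List[str]) -> List[str]:
    """Toy selector: keep sentences sharing at least two words with the claim."""
    sentences = [s.strip() for d in docs for s in d.split(".") if s.strip()]
    return [s for s in sentences if len(tokens(s) & tokens(claim)) >= 2]


def toy_verdict(claim: str, evidence: List[str]) -> str:
    """Toy predictor standing in for a trained model; labels are assumed."""
    if not evidence:
        return "NOT ENOUGH INFO"
    return "REFUTED" if any("not" in tokens(s) for s in evidence) else "SUPPORTED"


corpus = [
    "Vitamin C does not prevent the common cold. Trials show no effect.",
    "Regular exercise lowers blood pressure in adults.",
]
pipeline = ClaimVerificationPipeline(make_overlap_retriever(corpus), overlap_selector, toy_verdict)
print(pipeline.verify("Vitamin C prevents the common cold"))  # REFUTED
print(pipeline.verify("Coffee causes cancer"))                # NOT ENOUGH INFO
```

Swapping `make_overlap_retriever` for a BM25 or embedding-based retriever over a different corpus changes only the `retrieve` field, which mirrors the experimental design of holding the other two stages constant.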


  • The paper evaluates automated verification of scientific claims in an open-domain setting, where evidence must be retrieved from large knowledge sources.

  • Three knowledge sources (PubMed, Wikipedia, and Google) are compared for evidence retrieval, using BM25 and semantic search.

  • The study finds that Wikipedia and semantic search achieve better recall, while BM25 achieves higher precision; performance also depends on the type of claim and the knowledge source.

  • The authors highlight future work on modeling disagreement, assessing evidence quality, and integrating advanced retrieval strategies with LLMs, and note the real-world limitations of such systems.


The growing volume of accessible online information, particularly in science and health, has made efficient fact-checking systems for scientific claims a pressing need. Typical setups assume that the documents containing the evidence are already identified, or that retrieval runs over a small, closed collection of documents. This study instead evaluates automated claim verification in a realistic open-domain setting, testing performance against far larger knowledge sources with potentially millions of documents.

Experiment Design

The study keeps the evidence-sentence selection and verdict-prediction stages of the pipeline fixed, while varying the knowledge source and the document retrieval method. PubMed, Wikipedia, and Google serve as large-scale knowledge sources, and evidence is retrieved from each with two methods: BM25 (sparse, lexical matching) and semantic search using BioSimCSE embeddings (dense retrieval). Evaluation is based on final verdict-prediction performance on four datasets of biomedical and health claims, each with veracity labels assigned by domain experts.
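To make the two retrieval methods concrete, the following is a minimal, self-contained sketch of Okapi BM25 scoring and of cosine-similarity ranking over precomputed embeddings. It assumes common default parameters (`k1=1.5`, `b=0.75`) and a Lucene-style smoothed IDF, none of which are taken from the paper. The three-dimensional vectors are hand-picked placeholders standing in for BioSimCSE sentence embeddings; a real setup would encode claims and documents with that model.

```python
import math
import re
from collections import Counter
from typing import List, Sequence


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Okapi BM25 over an in-memory corpus (k1, b are common defaults, not the paper's)."""

    def __init__(self, docs: Sequence[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.doc_tokens = [tokenize(d) for d in docs]
        self.tfs = [Counter(t) for t in self.doc_tokens]
        self.avgdl = sum(len(t) for t in self.doc_tokens) / len(self.doc_tokens)
        df = Counter(term for t in self.doc_tokens for term in set(t))
        n = len(docs)
        # Smoothed (Lucene-style) IDF keeps weights positive for common terms.
        self.idf = {term: math.log(1 + (n - f + 0.5) / (f + 0.5)) for term, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tfs[i], len(self.doc_tokens[i])
        s = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # unmatched query terms contribute nothing (lexical gap)
            f = tf[term]
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            s += self.idf[term] * f * (self.k1 + 1) / norm
        return s

    def rank(self, query: str) -> List[int]:
        return sorted(range(len(self.tfs)), key=lambda i: self.score(query, i), reverse=True)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def dense_rank(query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]) -> List[int]:
    return sorted(range(len(doc_vecs)), key=lambda i: cosine(query_vec, doc_vecs[i]), reverse=True)


docs = [
    "Aspirin reduces the risk of myocardial infarction.",
    "Aspirin and gastrointestinal bleeding.",
    "Regular exercise lowers blood pressure.",
]
bm25 = BM25(docs)
print(bm25.rank("aspirin heart attack"))  # [1, 0, 2]

# Placeholder vectors, hand-picked to mimic embeddings; NOT BioSimCSE output.
query_vec = [0.95, 0.05, 0.1]
doc_vecs = [[0.9, 0.1, 0.0], [0.5, 0.0, 0.8], [0.0, 1.0, 0.1]]
print(dense_rank(query_vec, doc_vecs))  # [0, 1, 2]
```

Here BM25 ranks the shorter aspirin document first because only the term "aspirin" matches and BM25 normalizes for document length; it has no way to match "heart attack" to "myocardial infarction". An embedding model can in principle capture that synonymy, which the placeholder vectors simulate by construction.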

Results and Analysis

In the experiments, Wikipedia and semantic search typically achieve higher recall, retrieving relevant evidence across a wide range of documents. BM25, in contrast, achieves higher precision: its exact term matching avoids over-inclusive results, at the cost of narrower coverage. Results also depend on the type of claim and the source: Wikipedia works better for popular health claims, while PubMed better supports specialized medical claims. Querying "the whole web" through Google yields strong scores, but with caveats: scores are higher on datasets whose claims were drawn directly from PubMed, which points to possible data leakage, and snippet-based evidence has its own limitations.
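The precision/recall trade-off can be illustrated with set-based precision and recall over the top-k retrieved documents. The document IDs and rankings below are invented for illustration (they are not results from the paper), and the helper `precision_recall_at_k` is hypothetical; precision here divides by the number of documents actually returned, which is one of several conventions.

```python
from typing import List, Set, Tuple


def precision_recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> Tuple[float, float]:
    """Set-based precision and recall over the first k retrieved documents."""
    top = retrieved[:k]
    hits = sum(1 for doc_id in top if doc_id in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical gold evidence documents for one claim.
relevant = {"d1", "d2", "d3", "d4"}

# A conservative, lexical-style run: few results, mostly relevant.
sparse_run = ["d1", "d2", "x1"]
# A broader, embedding-style run: many results, covering all relevant docs.
dense_run = ["d1", "x2", "d3", "x3", "d4", "x4", "d2", "x5"]

print(precision_recall_at_k(sparse_run, relevant, k=10))  # (0.6666666666666666, 0.5)
print(precision_recall_at_k(dense_run, relevant, k=10))   # (0.5, 1.0)
```

Which run serves the pipeline better plausibly depends on the downstream stages: a fixed evidence selector can filter noisy documents out of a high-recall run, but it cannot recover evidence that retrieval never returned.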

Conclusion and Future Work

The research demonstrates both the potential and the challenges of open-domain scientific claim verification. Both PubMed and Wikipedia can serve as competent knowledge sources, with their relative strengths depending on the nature of the claim. Dense retrieval is generally more effective than sparse retrieval, though each excels in particular scenarios. The authors point to future work on modeling disagreement, assessing evidence quality, and integrating retrieval-augmented generation with LLMs to better support fact-checking. They also acknowledge the limits of real-world applicability and the ethical considerations involved in addressing sensitive health and medical misinformation.
