FEVEROUS: Fact Extraction and VERification Over Unstructured and Structured information (2106.05707v3)

Published 10 Jun 2021 in cs.CL

Abstract: Fact verification has attracted a lot of attention in the machine learning and natural language processing communities, as it is one of the key methods for detecting misinformation. Existing large-scale benchmarks for this task have focused mostly on textual sources, i.e. unstructured information, and thus ignored the wealth of information available in structured formats, such as tables. In this paper we introduce a novel dataset and benchmark, Fact Extraction and VERification Over Unstructured and Structured information (FEVEROUS), which consists of 87,026 verified claims. Each claim is annotated with evidence in the form of sentences and/or cells from tables in Wikipedia, as well as a label indicating whether this evidence supports, refutes, or does not provide enough information to reach a verdict. Furthermore, we detail our efforts to track and minimize the biases that are present in the dataset and could be exploited by models, e.g. being able to predict the label without using evidence. Finally, we develop a baseline for verifying claims against text and tables which predicts both the correct evidence and verdict for 18% of the claims.

An Academic Overview of the FEVEROUS Dataset

This essay provides an expert review of the paper introducing FEVEROUS: Fact Extraction and Verification Over Unstructured and Structured Information. In the context of increasing misinformation, automated fact verification represents a burgeoning area of interest within the machine learning and NLP communities. This paper contributes to the field by addressing a notable gap: the consideration of both unstructured and structured data in fact verification tasks.

Dataset Development and Structure

FEVEROUS is a new dataset of 87,026 verified claims, each annotated with evidence from Wikipedia in the form of sentences, table cells, or both. This departs from earlier datasets such as FEVER, which is predominantly text-centric, and TabFact, which focuses exclusively on table data under contrived settings. By combining both evidence modalities, FEVEROUS provides a more comprehensive resource for developing and assessing fact-checking models.

Each claim in FEVEROUS is labeled according to its alignment with the evidence: supported, refuted, or not enough information (NEI) to make a determination. The annotations are manually crafted and verified, ensuring a high level of accuracy and reliability. The dataset's complexity is evidenced by the need for annotators to navigate entire Wikipedia pages, indicating a real-world application challenge that previous datasets do not capture as effectively.
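The annotation scheme described above can be pictured as a small record type. The following is a minimal sketch only: the class names, field names, and the locator strings are illustrative assumptions and do not reproduce the dataset's actual file format.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Label(Enum):
    # The three verdicts named in the overview.
    SUPPORTED = "supported"
    REFUTED = "refuted"
    NEI = "not enough information"


@dataclass(frozen=True)
class EvidenceItem:
    page: str     # Wikipedia page title
    kind: str     # "sentence" or "cell"
    locator: str  # hypothetical position identifier within the page


@dataclass
class AnnotatedClaim:
    claim: str
    label: Label
    evidence: List[EvidenceItem] = field(default_factory=list)

    def modalities(self) -> set:
        """Return the evidence types used, e.g. {"sentence", "cell"}."""
        return {item.kind for item in self.evidence}


# A made-up claim whose evidence mixes text and table cells.
example = AnnotatedClaim(
    claim="A hypothetical claim about a football club's league record.",
    label=Label.SUPPORTED,
    evidence=[
        EvidenceItem(page="Example FC", kind="sentence", locator="sentence 3"),
        EvidenceItem(page="Example FC", kind="cell", locator="table 0, row 2, column 1"),
    ],
)
print(sorted(example.modalities()))  # ['cell', 'sentence']
```

Keeping the evidence type on each item makes it straightforward to separate claims that need only text, only tables, or both.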

Baseline and Results

The paper also introduces a baseline for FEVEROUS. It combines entity matching with TF-IDF-based retrieval to extract pertinent sentences and tables, then uses a RoBERTa classifier trained on multiple NLI datasets to predict evidence relevance and classify the claim's veracity. The baseline predicts both the correct evidence and the correct verdict for only 18% of claims, which shows how difficult verification over two evidence modalities remains.
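To make the 18% figure concrete, it helps to see how a joint criterion over evidence and verdict might be computed. The sketch below assumes a strict, FEVER-style rule: a prediction counts only if the label matches and the predicted evidence contains at least one complete gold evidence set. The function names and the string identifiers for evidence are hypothetical, and the official scorer may differ in its details.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# One example: (predicted label, predicted evidence ids,
#               gold label, alternative gold evidence sets)
Example = Tuple[str, Set[str], str, List[FrozenSet[str]]]


def is_strictly_correct(
    pred_label: str,
    pred_evidence: Set[str],
    gold_label: str,
    gold_evidence_sets: List[FrozenSet[str]],
) -> bool:
    """Assumed criterion: right label AND one gold set fully retrieved."""
    if pred_label != gold_label:
        return False
    return any(gold <= pred_evidence for gold in gold_evidence_sets)


def strict_score(examples: Iterable[Example]) -> float:
    """Fraction of examples that satisfy the strict joint criterion."""
    examples = list(examples)
    if not examples:
        return 0.0
    hits = sum(is_strictly_correct(*ex) for ex in examples)
    return hits / len(examples)


# Made-up predictions; "s" ids stand for sentences, "c" ids for cells.
toy_examples: List[Example] = [
    ("supported", {"s1", "c4"}, "supported", [frozenset({"s1", "c4"})]),  # correct
    ("supported", {"s1"}, "supported", [frozenset({"s1", "c4"})]),        # evidence incomplete
    ("refuted", {"s2"}, "supported", [frozenset({"s2"})]),                # wrong label
    ("refuted", {"s7", "s8", "c1"}, "refuted",
     [frozenset({"c9"}), frozenset({"s7"})]),                             # correct via second set
]
print(strict_score(toy_examples))  # 0.5
```

Under a rule of this kind, a correct verdict reached from incomplete evidence earns no credit, which is one reason joint scores sit well below label accuracy.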

The retriever is reported to achieve appreciable recall at both the document and the passage level, which matters because FEVEROUS evidence can sit in sentences, table cells, or both. Overall, the baseline's performance highlights the difficulty of such dual-evidence tasks and supports the dataset's role as a benchmark for future research.
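The TF-IDF part of the retrieval step, together with a recall measurement, can be illustrated with a small self-contained sketch. This is not the authors' implementation: it omits entity matching entirely, uses a simple smoothed IDF and cosine similarity chosen for illustration, and runs on made-up pages.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(docs: Dict[str, str]) -> Dict[str, float]:
    """Smoothed inverse document frequency over the page collection."""
    n = len(docs)
    df: Counter = Counter()
    for text in docs.values():
        df.update(set(tokenize(text)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(text: str, idf: Dict[str, float]) -> Dict[str, float]:
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: Dict[str, str], k: int) -> List[str]:
    """Return the titles of the k pages most similar to the query."""
    idf = build_idf(docs)
    q = tfidf_vector(query, idf)
    scored = [(cosine(q, tfidf_vector(text, idf)), title) for title, text in docs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by title
    return [title for _, title in scored[:k]]


def recall_at_k(retrieved: List[str], gold: List[str]) -> float:
    """Share of gold pages that appear among the retrieved pages."""
    gold_set = set(gold)
    return len(set(retrieved) & gold_set) / len(gold_set) if gold_set else 0.0


# Made-up pages standing in for Wikipedia articles.
pages = {
    "Example FC": "Example FC is a football club founded in 1901 that plays in the national league",
    "Sample River": "The Sample River flows through three countries before reaching the sea",
    "Test City": "Test City is the capital and hosts a football stadium",
}
top = retrieve("Example FC was founded in 1901", pages, k=2)
print(top, recall_at_k(top, ["Example FC"]))
```

Recall at k measures only whether the gold pages were found, so raising k trades precision for coverage and passes more candidate sentences and tables to the later selection stages.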

Implications and Future Directions

The creation of FEVEROUS has notable implications for fact verification. It bridges a critical gap left by prior datasets by introducing structured data, such as tables, into the fact-checking paradigm. This inclusion increases the ecological validity of the task by better simulating real-world information contexts where data often exists in tabular forms.

For theoretical development, FEVEROUS opens new avenues in researching how structured information can be integrated into NLP systems. The dataset challenges systems to unify disparate data types within the same verification task, calling for advancements in multi-modal processing and hybrid architectures.

Practically, the insights gained from FEVEROUS can enhance applications in areas like journalism and content moderation, where automated verification could play a pivotal role in managing the influx of unverified claims.

Conclusion

FEVEROUS is an important resource for advancing fact verification research and applications. By requiring the combined use of unstructured and structured information, it raises the bar for datasets in the field. Its complexity encourages the development of more sophisticated models and motivates further research into integrating diverse data types in automated systems. As the field evolves, FEVEROUS is well placed to serve as a benchmark for evaluating and advancing fact extraction and verification methods.

Authors (8)

Rami Aly (10 papers)
Zhijiang Guo (55 papers)
Michael Schlichtkrull (17 papers)
James Thorne (48 papers)
Andreas Vlachos (70 papers)
Christos Christodoulopoulos (15 papers)
Oana Cocarascu (14 papers)
Arpit Mittal (15 papers)

GitHub

GitHub - Raldir/FEVEROUS: Repository for Fact Extraction and VERification Over Unstructured and Structured information (FEVEROUS), accepted to NeurIPS 2021 Dataset and Benchmarks and used for the FEVER Workshop Shared Task at EMNLP2021. (72 stars)