
WiCE: Real-World Entailment for Claims in Wikipedia (2303.01432v2)

Published 2 Mar 2023 in cs.CL

Abstract: Textual entailment models are increasingly applied in settings like fact-checking, presupposition verification in question answering, or summary evaluation. However, these represent a significant domain shift from existing entailment datasets, and models underperform as a result. We propose WiCE, a new fine-grained textual entailment dataset built on natural claim and evidence pairs extracted from Wikipedia. In addition to standard claim-level entailment, WiCE provides entailment judgments over sub-sentence units of the claim, and a minimal subset of evidence sentences that support each subclaim. To support this, we propose an automatic claim decomposition strategy using GPT-3.5 which we show is also effective at improving entailment models' performance on multiple datasets at test time. Finally, we show that real claims in our dataset involve challenging verification and retrieval problems that existing models fail to address.

Authors (4)
  1. Ryo Kamoi (14 papers)
  2. Tanya Goyal (24 papers)
  3. Juan Diego Rodriguez (12 papers)
  4. Greg Durrett (117 papers)
Citations (71)

Summary

The field of NLP often requires models to verify the truthfulness of statements against provided evidence, with applications ranging from fact-checking to document summarization. A new dataset, WiCE (Wikipedia Citation Entailment), tackles these challenges by offering a more realistic and fine-grained textual entailment setup.

This dataset is rooted in Wikipedia, where claims within articles are automatically identified and linked with the articles they cite as evidence. WiCE not only assesses whether a claim is supported, partially supported, or unsupported by the evidence but also provides detailed annotations for sub-sentence units within the claims, showing exactly which parts are supported by the evidence and which are not.
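The annotation structure described above can be sketched as a simple data record. This is an illustrative schema only; the field names and label strings are assumptions, not the dataset's actual format.

```python
# Hypothetical sketch of a WiCE-style example; field and label names are
# illustrative, not the dataset's real schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class SubClaim:
    text: str                # one sub-sentence unit of the claim
    label: str               # "supported" | "partially_supported" | "not_supported"
    evidence_ids: List[int]  # indices of the minimal supporting evidence sentences

@dataclass
class WiceExample:
    claim: str               # sentence extracted from a Wikipedia article
    evidence: List[str]      # sentences from the cited source document
    claim_label: str         # overall entailment judgment for the full claim
    subclaims: List[SubClaim] = field(default_factory=list)

ex = WiceExample(
    claim="The film premiered in 2010 and won two awards.",
    evidence=["The film premiered in 2010.", "It was nominated for one award."],
    claim_label="partially_supported",
    subclaims=[
        SubClaim("The film premiered in 2010.", "supported", [0]),
        SubClaim("The film won two awards.", "not_supported", []),
    ],
)
```

Representing sub-claims with their own labels and minimal evidence sets is what lets the dataset score partial support rather than a single binary judgment.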

One notable innovation introduced alongside WiCE is an automatic claim decomposition strategy known as Claim-Split. Utilizing GPT-3.5, it breaks complex claims into more manageable subclaims, making the annotation process more efficient and possibly improving the performance of entailment models, as subclaims can be easier to evaluate than longer, more intricate statements.
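A Claim-Split style decomposition can be driven by a few-shot prompt. The sketch below shows the general shape only; the demonstration text and formatting are assumptions, not the paper's actual prompt.

```python
# Minimal sketch of a Claim-Split-style few-shot prompt. The demonstration
# and wording are hypothetical; the paper's exact prompt is not reproduced.
def build_claim_split_prompt(claim: str) -> str:
    demo = (
        "Segment the following sentence into individual facts:\n"
        "Sentence: Apollo 11 launched in 1969 and carried three astronauts.\n"
        "Facts:\n"
        "- Apollo 11 launched in 1969.\n"
        "- Apollo 11 carried three astronauts.\n\n"
    )
    return demo + f"Sentence: {claim}\nFacts:\n-"

prompt = build_claim_split_prompt(
    "The bridge opened in 1937 and remains the longest in the state."
)
# The prompt would then be sent to GPT-3.5 and the bulleted completion
# parsed into subclaims (one per "- " line).
```

Each returned bullet becomes a subclaim that can be annotated or verified independently, which is what makes the downstream entailment judgments easier.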

WiCE is shown to pose new challenges for current entailment models that generally deal with shorter texts. Existing models, when assessed on real-world claims from the dataset, underperform due to the complex nature of evidence verification and retrieval issues that these models are not yet equipped to handle.

The importance of context and retrieval is underscored in the data analysis. Models trained to predict entailment using chunks of the evidence, combined with context, achieve better performance than those relying solely on individual sentences. However, these systems still fall short of human-level performance.
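The chunk-with-context idea can be sketched as sliding a window over the evidence sentences and scoring each window against the claim. The scorer below is a word-overlap stub standing in for a real NLI model; the function names and parameters are assumptions, not the paper's system.

```python
# Sketch of chunk-level entailment. `score_entailment` is a stub: a real
# system would run an NLI model over (chunk, claim) pairs.
def chunk_sentences(sentences, chunk_size=4, stride=2):
    """Slide a window over evidence sentences so each chunk keeps local context."""
    chunks = []
    for start in range(0, max(len(sentences) - chunk_size + 1, 1), stride):
        chunks.append(sentences[start:start + chunk_size])
    return chunks

def score_entailment(claim, chunk):
    # Stub scorer: fraction of claim words appearing anywhere in the chunk.
    overlap = set(claim.lower().split()) & set(" ".join(chunk).lower().split())
    return len(overlap) / max(len(claim.split()), 1)

def verify(claim, evidence_sentences):
    # Take the best-supported chunk as the claim's support score.
    chunks = chunk_sentences(evidence_sentences)
    return max(score_entailment(claim, c) for c in chunks)
```

Scoring multi-sentence chunks rather than isolated sentences mirrors the finding that surrounding context helps the entailment model, since supporting information is often spread across adjacent sentences.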

In summary, WiCE represents a step forward in the realistic assessment of models' capability to determine the factual correctness of real-world claims. Its supporting tools, like Claim-Split and fine-grained annotations, provide ways to both enhance the dataset and potentially improve model performance, emphasizing the importance of context, retrieval, and the granularity of evidence in the continuous evolution of automated fact verification systems.
