AmbiFC: Fact-Checking Ambiguous Claims with Evidence (2104.00640v4)
Abstract: Automated fact-checking systems verify claims against evidence to predict their veracity. In real-world scenarios, the retrieved evidence may not unambiguously support or refute the claim and may yield conflicting but valid interpretations. Existing fact-checking datasets assume that the models developed with them predict a single veracity label for each claim, thus discouraging the handling of such ambiguity. To address this issue, we present AmbiFC, a fact-checking dataset with 10k claims derived from real-world information needs. It contains fine-grained evidence annotations of 50k passages from 5k Wikipedia pages. We analyze the disagreements arising from ambiguity when comparing claims against evidence in AmbiFC, observing a strong correlation between annotator disagreement and linguistic phenomena such as underspecification and probabilistic reasoning. We develop models for predicting veracity that handle this ambiguity via soft labels, and find that a pipeline which learns the label distribution for sentence-level evidence selection and veracity prediction yields the best performance. We compare models trained on different subsets of AmbiFC and show that models trained on the ambiguous instances perform better when faced with the identified linguistic phenomena.
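Below is a minimal sketch (not the authors' implementation) of the soft-label veracity modelling described in the abstract: per-instance annotator votes are converted into a label distribution, and a classifier is trained to match that distribution with a soft cross-entropy loss instead of a single majority label. The three-way label set, the feature dimensionality, and the linear classifier standing in for the actual claim/evidence encoder are illustrative assumptions.

```python
# Sketch of soft-label training: fit the annotator label distribution,
# not a single hard label. All names below are illustrative assumptions.
from collections import Counter

import torch
import torch.nn as nn
import torch.nn.functional as F

LABELS = ["supported", "refuted", "neutral"]  # assumed 3-way label set


def soft_label(votes: list[str]) -> torch.Tensor:
    """Turn raw annotator votes into a probability distribution over LABELS."""
    counts = Counter(votes)
    return torch.tensor([counts[label] / len(votes) for label in LABELS])


class VeracityClassifier(nn.Module):
    """Stand-in for the real encoder: a linear head over precomputed features."""

    def __init__(self, feat_dim: int = 768, num_labels: int = len(LABELS)):
        super().__init__()
        self.head = nn.Linear(feat_dim, num_labels)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.head(feats)  # unnormalized logits


def soft_cross_entropy(logits: torch.Tensor, target_dist: torch.Tensor) -> torch.Tensor:
    """Cross-entropy against a soft target distribution (KL divergence up to a constant)."""
    return -(target_dist * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()


if __name__ == "__main__":
    model = VeracityClassifier()
    optim = torch.optim.AdamW(model.parameters(), lr=1e-4)

    # Toy batch: random features and annotator votes for two claim/passage pairs.
    feats = torch.randn(2, 768)
    targets = torch.stack([
        soft_label(["supported", "supported", "neutral"]),  # mild disagreement
        soft_label(["refuted", "neutral", "neutral"]),       # ambiguous evidence
    ])

    logits = model(feats)
    loss = soft_cross_entropy(logits, targets)
    loss.backward()
    optim.step()
    print(f"soft-label loss: {loss.item():.4f}")
```

Training against the full distribution rather than the majority label lets the model express calibrated uncertainty on ambiguous claim/evidence pairs, which is the behaviour the paper evaluates.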
Authors: Max Glockner, Ieva Staliūnaitė, James Thorne, Gisela Vallejo, Andreas Vlachos, Iryna Gurevych