Credible, Unreliable or Leaked?: Evidence Verification for Enhanced Automated Fact-checking (2404.18971v1)

Published 29 Apr 2024 in cs.CL, cs.CY, cs.IR, and cs.SI

Abstract: Automated fact-checking (AFC) is attracting growing attention from researchers aiming to help fact-checkers combat the spread of misinformation online. While many existing AFC methods incorporate external information from the Web to help examine the veracity of claims, they often overlook the importance of verifying the source and quality of the collected "evidence". One overlooked challenge is the reliance on "leaked evidence": information gathered directly from fact-checking websites and used to train AFC systems, which results in an unrealistic setting for early misinformation detection. Similarly, including information from unreliable sources can undermine the effectiveness of AFC systems. To address these challenges, we present a comprehensive approach to evidence verification and filtering. We create the "CREDible, Unreliable or LEaked" (CREDULE) dataset, which consists of 91,632 articles classified as Credible, Unreliable, or Fact-checked (Leaked). Additionally, we introduce the EVidence VERification Network (EVVER-Net), trained on CREDULE to detect leaked and unreliable evidence in both short and long texts. EVVER-Net can be used to filter evidence collected from the Web, thus enhancing the robustness of end-to-end AFC systems. We experiment with various LLMs and show that EVVER-Net reaches up to 91.5% accuracy on short texts and 94.4% on long texts when domain credibility scores are used alongside the text. Finally, we assess the evidence provided by widely used fact-checking datasets, including LIAR-PLUS, MOCHEG, FACTIFY, NewsCLIPpings+ and VERITE, some of which exhibit concerning rates of leaked and unreliable evidence.
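
The filtering step described in the abstract can be illustrated with a minimal sketch. This is not EVVER-Net, which is a trained neural classifier: the `label_evidence` function below is a hypothetical stand-in that labels evidence by domain lookup only, and the domain lists, the `Evidence` type, and all example URLs are assumptions made purely for illustration. It only shows where a Credible / Unreliable / Leaked classifier might sit between evidence retrieval and claim verification.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical domain lists for illustration only; not taken from CREDULE.
FACT_CHECK_DOMAINS = {"factcheck.example"}
UNRELIABLE_DOMAINS = {"rumors.example"}


@dataclass
class Evidence:
    url: str
    text: str


def label_evidence(item: Evidence) -> str:
    """Stand-in for a trained classifier: label an item by its domain alone."""
    domain = urlparse(item.url).netloc.lower()
    if domain.startswith("www."):
        domain = domain[4:]
    if domain in FACT_CHECK_DOMAINS:
        return "leaked"       # text that originates from a fact-checking site
    if domain in UNRELIABLE_DOMAINS:
        return "unreliable"
    # Simplification: unknown domains are treated as credible in this toy sketch.
    return "credible"


def filter_evidence(items: list[Evidence]) -> list[Evidence]:
    """Keep only items labelled credible before passing them to a verifier."""
    return [item for item in items if label_evidence(item) == "credible"]


collected = [
    Evidence("https://www.factcheck.example/claim-123", "We rate this claim false."),
    Evidence("https://rumors.example/shocking", "You will not believe this."),
    Evidence("https://news.example/report", "Officials confirmed the figures."),
]
kept = filter_evidence(collected)
```

In a realistic pipeline the lookup would be replaced by a model that reads the evidence text, optionally together with a domain credibility score, as the abstract describes.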
