The Automated Verification of Textual Claims (AVeriTeC) Shared Task (2410.23850v1)
Abstract: The Automated Verification of Textual Claims (AVeriTeC) shared task asks participants to retrieve evidence and predict veracity for real-world claims checked by fact-checkers. Evidence can be found either via a search engine or via a knowledge store provided by the organisers. Submissions are evaluated using the AVeriTeC score, which considers a claim to be accurately verified if and only if the verdict is correct and the retrieved evidence meets a certain quality threshold. The shared task received 21 submissions, 18 of which surpassed our baseline. The winning team was TUDA_MAI, with an AVeriTeC score of 63%. In this paper we describe the shared task, present the full results, and highlight key takeaways.
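The scoring rule described in the abstract can be illustrated with a minimal sketch. It assumes each claim already has a numeric evidence-quality score. How that score is computed is not covered here. The names `ScoredClaim` and `averitec_style_score`, the verdict strings, and the threshold value used in the example are hypothetical and chosen only for illustration; this is not the organisers' evaluation code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScoredClaim:
    """One system output paired with its gold label (hypothetical structure)."""
    predicted_verdict: str
    gold_verdict: str
    evidence_score: float  # assumed: higher means better retrieved evidence


def averitec_style_score(claims: Sequence[ScoredClaim], threshold: float) -> float:
    """Fraction of claims whose verdict is correct AND whose evidence
    score reaches the threshold. Either condition failing counts as a miss."""
    if not claims:
        return 0.0
    hits = sum(
        1
        for c in claims
        if c.predicted_verdict == c.gold_verdict and c.evidence_score >= threshold
    )
    return hits / len(claims)


# Illustrative data only; the threshold here is an assumed value.
example_claims = [
    ScoredClaim("Supported", "Supported", 0.8),  # correct verdict, good evidence
    ScoredClaim("Refuted", "Supported", 0.9),    # wrong verdict, good evidence
    ScoredClaim("Refuted", "Refuted", 0.1),      # correct verdict, weak evidence
]
example_score = averitec_style_score(example_claims, threshold=0.25)
print(f"{example_score:.3f}")
```

Under this rule a correct verdict backed by weak evidence scores the same as a wrong verdict, so the score is never higher than plain verdict accuracy.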