Claim Check-Worthiness Detection: How Well do LLMs Grasp Annotation Guidelines?
Abstract: The growing threat of disinformation calls for automating parts of the fact-checking pipeline. Identifying text segments that require fact-checking is known as claim detection (CD) and claim check-worthiness detection (CW); the latter incorporates complex, domain-specific criteria of worthiness and is often framed as a ranking task. Zero- and few-shot LLM prompting is an attractive option for both tasks, as it bypasses the need for labeled datasets and allows verbalized claim and worthiness criteria to be used directly in prompts. We evaluate LLMs' predictive accuracy and calibration on five CD/CW datasets from diverse domains, each using a different worthiness criterion. We investigate two key aspects: (1) how best to distill factuality and worthiness criteria into a prompt and (2) how much context to provide for each claim. To this end, we vary the level of prompt verbosity and the amount of contextual information given to the model. Our results show that the optimal prompt verbosity is domain-dependent, that adding context does not improve performance, and that confidence scores can be used directly to produce reliable check-worthiness rankings.
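The ranking setup mentioned at the end of the abstract can be sketched as follows. Assuming access to the probability the model assigns to a positive ("check-worthy") answer, e.g. via a softmax over the Yes/No answer tokens, sentences are simply ordered by that confidence score. The prompt template, the helper function, and the scores below are illustrative assumptions, not the paper's exact setup.

```python
# Hypothetical sketch: rank sentences by the LLM's confidence that each
# one is check-worthy. `p_yes` stands in for the probability the model
# assigns to a "Yes" answer for the zero-shot prompt below.

ZERO_SHOT_PROMPT = (
    "Does the following sentence contain a factual claim that is worth "
    "fact-checking? Answer Yes or No.\n\nSentence: {sentence}\nAnswer:"
)

def rank_by_confidence(scored_sentences):
    """Order (sentence, p_yes) pairs from most to least check-worthy."""
    return [s for s, _ in sorted(scored_sentences,
                                 key=lambda pair: pair[1],
                                 reverse=True)]

# Illustrative scores; in practice p_yes would come from the model.
scored = [
    ("The sky was a beautiful blue today.", 0.08),
    ("Unemployment fell to 3.5% last quarter.", 0.91),
    ("Vaccines cause more deaths than the disease.", 0.97),
]
ranking = rank_by_confidence(scored)
```

Because the scores are used only for ordering, no extra calibration step is needed to produce a ranking, which is consistent with the abstract's finding that confidence scores can be used directly.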