The Perils & Promises of Fact-checking with Large Language Models (2310.13549v2)
Abstract: Automated fact-checking, the use of machine learning to verify claims, has become vital as misinformation spreads faster than human fact-checkers can keep up. LLMs such as GPT-4 are increasingly trusted to write academic papers, lawsuits, and news articles and to verify information, which makes it important both that they can discern truth from falsehood and that their outputs can themselves be verified. Understanding the capacities and limitations of LLMs in fact-checking tasks is therefore essential for the health of our information ecosystem. Here, we evaluate the use of LLM agents in fact-checking by having them phrase queries, retrieve contextual data, and make decisions. Importantly, in our framework, agents explain their reasoning and cite the relevant sources from the retrieved context. Our results show that LLMs perform markedly better when equipped with contextual information. GPT-4 outperforms GPT-3, but accuracy varies with query language and claim veracity. While LLMs show promise in fact-checking, their inconsistent accuracy calls for caution. Our investigation motivates further research into when these agents succeed and when they fail.
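The agent loop the abstract describes (phrase a query, retrieve context, decide, then explain and cite) can be sketched roughly as follows. This is a minimal illustration only: the function names, the `Verdict` structure, and the stubbed retrieval corpus are assumptions for the sketch, not the authors' implementation, and a real agent would call a search API and an LLM where the stubs sit.

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    label: str          # e.g. "true", "false", or "unverifiable"
    explanation: str    # natural-language reasoning for the decision
    sources: list       # URLs cited from the retrieved context

def phrase_query(claim: str) -> str:
    # Step 1: rewrite the claim as a search query (trivial rewrite here;
    # in the paper this step is done by the LLM agent).
    return claim.rstrip(".") + " fact check"

def retrieve(query: str) -> list:
    # Step 2: fetch contextual evidence. A real agent would query a search
    # engine; this stub returns a canned snippet with a hypothetical URL.
    return [
        {"url": "https://example.org/evidence",
         "text": "Official records show the claim is supported."},
    ]

def decide(claim: str, evidence: list) -> Verdict:
    # Step 3: decide, explain, and cite. A real agent would prompt an LLM
    # with the claim plus evidence; this stub uses a keyword heuristic.
    supporting = [e for e in evidence if "supported" in e["text"]]
    if supporting:
        return Verdict("true",
                       "Retrieved evidence supports the claim.",
                       [e["url"] for e in supporting])
    return Verdict("unverifiable", "No relevant evidence retrieved.", [])

def fact_check(claim: str) -> Verdict:
    # Full pipeline: query phrasing -> retrieval -> cited decision.
    query = phrase_query(claim)
    evidence = retrieve(query)
    return decide(claim, evidence)
```

The point of the structure is that the verdict is never returned bare: every decision carries an explanation and the sources it rests on, which is what makes the agent's output checkable.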
Authors: Dorian Quelle, Alexandre Bovet