Can Large Language Models Detect Misinformation in Scientific News Reporting? (2402.14268v1)

Published 22 Feb 2024 in cs.CL, cs.AI, and cs.SI

Abstract: Scientific facts are often spun in the popular press with the intent to influence public opinion and action, as was evidenced during the COVID-19 pandemic. Automatic detection of misinformation in the scientific domain is challenging because of the distinct styles of writing in these two media types, and it is still in its infancy. Most research on the validity of scientific reporting treats the problem as a claim verification challenge, which requires significant expert human effort to generate appropriate claims. Our solution bypasses this step and addresses a more realistic scenario in which such explicit, labeled claims may not be available. The central research question of this paper is whether large language models (LLMs) can be used to detect misinformation in scientific reporting. To this end, we first present a new labeled dataset, SciNews, containing 2.4k scientific news stories drawn from trustworthy and untrustworthy sources, paired with related abstracts from the CORD-19 database. The dataset includes both human-written and LLM-generated news articles, reflecting the growing use of LLMs to produce popular press articles. We then identify dimensions of scientific validity in science news articles and explore how these can be integrated into the automated detection of scientific misinformation. We propose several baseline architectures that use LLMs to automatically detect false representations of scientific findings in the popular press. For each architecture, we apply several prompt engineering strategies, including zero-shot, few-shot, and chain-of-thought prompting, and we evaluate them on GPT-3.5, GPT-4, Llama2-7B, and Llama2-13B.
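The baselines described in the abstract pair each news article with the abstract of the source paper and query an LLM under different prompting strategies. The sketch below is a minimal illustration of that idea in Python, not the authors' exact setup: the prompt wording, the RELIABLE/UNRELIABLE label set, and the model choice are all assumptions, and the OpenAI chat API stands in for whichever backend (GPT-3.5, GPT-4, or a Llama2 variant) is being evaluated.

```python
# Minimal zero-shot / chain-of-thought detection sketch.
# Assumes OPENAI_API_KEY is set; prompt wording and labels are illustrative.
from openai import OpenAI

client = OpenAI()

def classify_news(abstract: str, article: str, use_cot: bool = False) -> str:
    """Ask an LLM whether a news article faithfully reports a paper's findings."""
    instructions = (
        "You are given a scientific paper abstract and a popular-press news "
        "article reporting on it. Decide whether the article is RELIABLE "
        "(faithful to the abstract) or UNRELIABLE (misrepresents the findings). "
    )
    if use_cot:
        # Chain-of-thought variant: elicit step-by-step reasoning before the label.
        instructions += (
            "Reason step by step about how the article's claims compare to the "
            "abstract, then end with 'Label: RELIABLE' or 'Label: UNRELIABLE'."
        )
    else:
        # Zero-shot variant: ask for the label directly.
        instructions += "Answer with exactly one word: RELIABLE or UNRELIABLE."

    response = client.chat.completions.create(
        model="gpt-4",   # the paper also tests GPT-3.5 and Llama2-7B/13B
        temperature=0,   # deterministic output for classification
        messages=[
            {"role": "system", "content": instructions},
            {"role": "user",
             "content": f"Abstract:\n{abstract}\n\nNews article:\n{article}"},
        ],
    )
    return response.choices[0].message.content.strip()
```

A few-shot variant would follow the same pattern, prepending a handful of labeled (abstract, article, verdict) examples as extra user/assistant message pairs before the target pair.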

Authors (8)
  1. Yupeng Cao (15 papers)
  2. Aishwarya Muralidharan Nair (1 paper)
  3. Elyon Eyimife (1 paper)
  4. Nastaran Jamalipour Soofi (1 paper)
  5. K. P. Subbalakshmi (15 papers)
  6. John R. Wullert II (2 papers)
  7. Chumki Basu (2 papers)
  8. David Shallcross (2 papers)
Citations (1)