Comparing GPT-4 and Open-Source Language Models in Misinformation Mitigation (2401.06920v1)
Abstract: Recent LLMs have been shown to be effective for misinformation detection. However, the choice of LLMs for experiments varies widely, leading to uncertain conclusions. In particular, GPT-4 is known to be strong in this domain, but it is closed-source, potentially expensive, and can show instability between different versions. Meanwhile, alternative LLMs have given mixed results. In this work, we show that Zephyr-7B presents a consistently viable alternative, overcoming key limitations of commonly used approaches like Llama-2 and GPT-3.5. This provides the research community with a solid open-source option and shows that open-source models are gradually catching up on this task. We then highlight how GPT-3.5 exhibits unstable performance, meaning that this very widely used model could provide misleading results in misinformation detection. Finally, we validate new tools, including approaches to structured output and the latest version of GPT-4 (Turbo), showing that they do not compromise performance, thus unlocking them for future research and potentially enabling more complex pipelines for misinformation mitigation.
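The abstract's point about structured output is that forcing a model to emit a machine-parseable verdict (rather than free text) can be validated without hurting detection performance. As a minimal sketch of the validation side, the snippet below parses and checks an LLM's JSON verdict; the label set and field names here are hypothetical illustrations, not the paper's actual schema:

```python
import json

# Hypothetical label set for claim veracity; the paper's datasets
# (e.g., LIAR) use their own, finer-grained label schemes.
ALLOWED_LABELS = {"true", "false", "mixed", "unverifiable"}

def parse_verdict(raw: str) -> dict:
    """Parse and validate a structured JSON verdict from an LLM.

    Expects an object like {"label": "...", "rationale": "..."}.
    Raises ValueError on malformed output so a calling pipeline can
    retry the request or fall back to free-text parsing.
    """
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"not valid JSON: {e}") from e
    label = str(obj.get("label", "")).strip().lower()
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    return {"label": label, "rationale": str(obj.get("rationale", ""))}

# A well-formed model response parses cleanly:
verdict = parse_verdict(
    '{"label": "False", "rationale": "Contradicted by official records."}'
)
print(verdict["label"])  # -> false
```

Rejecting out-of-schema responses at this boundary is what makes structured output safe to compose into larger mitigation pipelines: downstream components only ever see validated labels.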
- Qwen Technical Report. arXiv preprint arXiv:2309.16609.
- Open LLM Leaderboard. https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard.
- Caramancion, K. M. 2023. Harnessing the Power of ChatGPT to Decimate Mis/Disinformation: Using ChatGPT for Fake News Detection. In 2023 IEEE World AI IoT Congress (AIIoT), 0042–0046.
- Can LLM-Generated Misinformation Be Detected? arXiv:2309.13788.
- Combating Misinformation in the Age of LLMs: Opportunities and Challenges. arXiv preprint arXiv:2311.05656.
- How is ChatGPT’s behavior changing over time? arXiv:2307.09009.
- Can Large Language Models Understand Content and Propagation for Misinformation Detection: An Empirical Study. arXiv:2311.12699.
- DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models. arXiv:2309.03883.
- The case for 4-bit precision: k-bit Inference Scaling Laws. arXiv:2212.09720.
- Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning. arXiv:2305.13971.
- Leveraging ChatGPT for Efficient Fact-Checking.
- Bad Actor, Good Advisor: Exploring the Role of Large Language Models in Fake News Detection. arXiv:2309.12247.
- Mistral 7B. arXiv:2310.06825.
- Overview of the CLEF-2022 CheckThat! Lab: Task 3 on Fake News Detection. In Faggioli, G.; Ferro, N.; Hanbury, A.; and Potthast, M., eds., Proceedings of the Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum, Bologna, Italy, September 5th to 8th, 2022, volume 3180 of CEUR Workshop Proceedings, 404–421. CEUR-WS.org.
- Machine Learning Explanations to Prevent Overtrust in Fake News Detection. In Proceedings of the International AAAI Conference on Web and Social Media, volume 15, 421–431.
- Towards Reliable Misinformation Mitigation: Generalization, Uncertainty, and GPT-4. arXiv:2305.14928.
- The Perils & Promises of Fact-checking with Large Language Models. arXiv:2310.13549.
- New explainability method for BERT-based model in fake news detection. Scientific Reports, 11(1).
- iCompass at CheckThat! 2022: combining deep language models for fake news detection. Working Notes of CLEF.
- Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv:2307.09288.
- ur-iw-hnt at CheckThat!-2022: Cross-lingual Text Summarization for Fake News Detection. In Faggioli, G.; Ferro, N.; Hanbury, A.; and Potthast, M., eds., Proceedings of the Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum, Bologna, Italy, September 5th to 8th, 2022, volume 3180 of CEUR Workshop Proceedings, 740–748. CEUR-WS.org.
- Zephyr: Direct Distillation of LM Alignment. arXiv:2310.16944.
- Wang, W. Y. 2017. “Liar, Liar Pants on Fire”: A New Benchmark Dataset for Fake News Detection. In Barzilay, R.; and Kan, M.-Y., eds., Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 422–426. Vancouver, Canada: Association for Computational Linguistics.
- Open, Closed, or Small Language Models for Text Classification? arXiv:2308.10092.
- Towards LLM-based Fact Verification on News Claims with a Hierarchical Step-by-Step Prompting Method. arXiv:2310.00305.
- Synthetic Lies: Understanding AI-Generated Misinformation and Evaluating Algorithmic and Human Solutions. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, CHI ’23. ACM.