Evaluation of an LLM in Identifying Logical Fallacies: A Call for Rigor When Adopting LLMs in HCI Research (2404.05213v1)
Abstract: There is increasing interest in the adoption of LLMs in HCI research. However, LLMs are often regarded as a panacea because of their powerful capabilities, with insufficient attention to whether they are suitable for their intended tasks. We contend that LLMs should be adopted in a critical manner following rigorous evaluation. Accordingly, we present the evaluation of an LLM in identifying logical fallacies, which will form part of a digital misinformation intervention. Comparing its output against a labeled dataset, we found that GPT-4 achieves an accuracy of 0.79, and an accuracy of 0.90 for our intended use case, which excludes invalid or unidentified instances. This gives us the confidence to proceed with applying the LLM while keeping in mind the areas where it still falls short. The paper describes our evaluation approach, results, and reflections on the use of the LLM for our intended task.
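As a minimal sketch (not the authors' actual pipeline), the snippet below illustrates how the two reported accuracy figures could be computed from a labeled dataset: overall accuracy across all instances, and accuracy for the intended use case after excluding instances the model marks as invalid or unidentified. The function name, label strings, and example data are assumptions for illustration only.

```python
# Sketch of the two accuracy figures described in the abstract:
# overall accuracy, and use-case accuracy that excludes instances
# the model labels as invalid or unidentified (assumed placeholder labels).

EXCLUDED = {"invalid", "unidentified"}

def accuracies(gold_labels, model_labels):
    """Return (overall_accuracy, use_case_accuracy)."""
    assert len(gold_labels) == len(model_labels)
    pairs = list(zip(gold_labels, model_labels))

    # Overall accuracy over every labeled instance.
    overall = sum(g == m for g, m in pairs) / len(pairs)

    # Intended use case: drop instances the model could not classify.
    kept = [(g, m) for g, m in pairs if m not in EXCLUDED]
    use_case = sum(g == m for g, m in kept) / len(kept) if kept else 0.0
    return overall, use_case

# Illustrative (made-up) gold labels and model predictions.
gold = ["ad hominem", "strawman", "false dilemma", "ad hominem"]
pred = ["ad hominem", "unidentified", "false dilemma", "strawman"]
print(accuracies(gold, pred))  # -> (0.5, 0.666...)
```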