Fakes of Varying Shades: How Warning Affects Human Perception and Engagement Regarding LLM Hallucinations (2404.03745v3)
Abstract: The widespread adoption and transformative effects of LLMs have sparked concerns about their capacity to produce inaccurate and fictitious content, referred to as 'hallucinations'. Given the potential risks associated with hallucinations, humans should be able to identify them. This research examines human perception of LLM hallucinations by systematically varying the degree of hallucination (genuine, minor hallucination, major hallucination) and its interaction with a warning of potential inaccuracies (absent vs. present). Participants (N=419) recruited from Prolific rated the perceived accuracy of, and engaged with (e.g., like, dislike, share), content presented in a Q/A format. Participants ranked content as truthful in the order of genuine, minor hallucination, and major hallucination, and their engagement behaviors mirrored this pattern. More importantly, we observed that warnings improved the detection of hallucinations without significantly affecting the perceived truthfulness of genuine content. We conclude by offering insights for future tools to aid human detection of hallucinations. All survey materials, demographic questions, and post-session questions are available at: https://github.com/MahjabinNahar/fakes-of-varying-shades-survey-materials
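The abstract describes a 3 (hallucination degree) x 2 (warning) factorial design with perceived-accuracy ratings as a key outcome. Below is a minimal sketch of how such a design could be analyzed with a two-way ANOVA. Everything here is an assumption for illustration: the abstract does not specify the statistical procedure, the rating scale, the cell sizes, a between-subjects split, or the simulated effect pattern; column names (`degree`, `warning`, `rating`) are hypothetical.

```python
# Hypothetical sketch of a 3x2 factorial analysis (hallucination degree x warning).
# Simulated data only; the actual study's design details and analysis are not
# stated in the abstract and may differ.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)
n_per_cell = 70  # ~419 participants spread over 6 cells (hypothetical split)

rows = []
for degree, base in [("genuine", 5.5), ("minor", 4.0), ("major", 2.5)]:
    for warning, shift in [("absent", 0.0), ("present", -0.5)]:
        # Simulated 1-7 perceived-accuracy ratings; here the warning lowers
        # ratings for hallucinated content only (hypothetical effect pattern
        # mirroring the abstract's finding).
        effect = shift if degree != "genuine" else 0.0
        ratings = np.clip(rng.normal(base + effect, 1.0, n_per_cell), 1, 7)
        rows += [{"degree": degree, "warning": warning, "rating": r} for r in ratings]

df = pd.DataFrame(rows)
model = ols("rating ~ C(degree) * C(warning)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # main effects and the degree x warning interaction
```

Under this sketch, the abstract's pattern would show up as a main effect of degree (genuine rated most accurate, major hallucination least), plus a degree-by-warning interaction: the warning shifts ratings for hallucinated content while leaving genuine content largely unaffected.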
Authors:
- Mahjabin Nahar
- Haeseung Seo
- Eun-Ju Lee
- Aiping Xiong
- Dongwon Lee