
The Decoy Dilemma in Online Medical Information Evaluation: A Comparative Study of Credibility Assessments by LLM and Human Judges (2411.15396v1)

Published 23 Nov 2024 in cs.IR, cs.AI, and cs.HC

Abstract: Can AI be cognitively biased in automated information judgment tasks? Despite recent progress in measuring and mitigating social and algorithmic biases in AI and LLMs, it is not clear to what extent LLMs behave "rationally", or whether they are also vulnerable to human cognitive bias triggers. To address this open problem, our study, consisting of a crowdsourcing user experiment and an LLM-enabled simulation experiment, compared the credibility assessments of LLM and human judges under potential decoy effects in an information retrieval (IR) setting, and empirically examined the extent to which LLMs are cognitively biased in COVID-19 medical (mis)information assessment tasks, using traditional human assessors as a baseline. The results, collected from a between-subjects user experiment and an LLM-enabled replication experiment, demonstrate that 1) larger and more recent LLMs tend to show a higher level of consistency and accuracy in distinguishing credible information from misinformation, but are more likely to give higher ratings to misinformation in the presence of a more salient, decoy misinformation result; and 2) while the decoy effect occurred in both human and LLM assessments, it was more prevalent across conditions and topics in LLM judgments than in human credibility ratings. In contrast to the generally assumed "rationality" of AI tools, our study empirically confirms the cognitive bias risks embedded in LLM agents, evaluates the decoy impact on LLMs against human credibility assessments, and thereby highlights the complexity and importance of debiasing AI agents and developing psychology-informed AI audit techniques and policies for automated judgment tasks and beyond.

Summary

  • The paper demonstrates that advanced LLMs, despite overall strong performance, are vulnerable to decoy bias, especially in multi-query contexts.
  • It employs a dual experimental design comparing human and AI assessments to quantify the influence of the decoy effect on credibility judgments.
  • The study highlights topic-specific decoy vulnerabilities and the need for robust debiasing strategies for safe deployment in healthcare.

Overview of "The Decoy Dilemma in Online Medical Information Evaluation: A Comparative Study of Credibility Assessments by LLM and Human Judges"

The paper "The Decoy Dilemma in Online Medical Information Evaluation: A Comparative Study of Credibility Assessments by LLM and Human Judges" probes into a crucial facet of AI judgment: the susceptibility of LLMs to cognitive biases, specifically the decoy effect, when assessing the credibility of online medical information. As LLMs become integral to information processing tasks traditionally handled by humans, this research addresses a pertinent question: Are LLMs purely rational, or do they mirror human-like biases?

Key Findings

The paper employs a dual experimental design involving both human and AI participants to gauge the impact of decoy effects on credibility assessments of COVID-19 medical information. The decoy effect is the cognitive bias in which adding an asymmetrically dominated option, one clearly inferior to a particular target, shifts preferences toward that target among the original choices. Contrary to the assumption that AI systems operate without bias, the findings indicate that LLMs, particularly newer and more sophisticated models, can exhibit this bias when exposed to decoy triggers.
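
To make the design concrete, the following is a minimal sketch of a single decoy trial, assuming a 1-5 credibility scale; the result texts and the rate_credibility helper are illustrative placeholders, not the paper's actual stimuli or prompts.

```python
# Minimal sketch of one decoy trial for credibility rating. The result
# texts, the 1-5 scale, and rate_credibility() are illustrative
# stand-ins, not the paper's actual stimuli or prompts.

TARGET = "COVID-19 can be cured by drinking hot water every hour."    # misinformation target
COMPETITOR = "A sourced overview of approved COVID-19 treatments."    # credible result
DECOY = "COVID maybe curable with hot water? (unsourced forum post)"  # inferior variant of TARGET

def rate_credibility(results: list[str]) -> dict[str, float]:
    """Hypothetical judge: would ask an LLM (or a human assessor) to rate
    each result on a 1-5 credibility scale. Stubbed here."""
    return {r: 3.0 for r in results}

control_ratings = rate_credibility([TARGET, COMPETITOR])          # no decoy
decoy_ratings = rate_credibility([TARGET, COMPETITOR, DECOY])     # decoy present

# Decoy effect: the misinformation target receives a *higher* rating when
# a clearly inferior variant of it appears in the same result list.
shift = decoy_ratings[TARGET] - control_ratings[TARGET]
print(f"Rating shift for the misinformation target: {shift:+.2f}")
```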

  1. Credibility Judgment Consistency:
    • Larger and more recent LLMs, like GPT-4o and Claude-3-Sonnet, generally outperform older models in distinguishing credible information from misinformation. These models, however, display increased susceptibility to decoy effects compared to human judges, particularly in settings involving multi-query interactions.
  2. Session Context Influence:
    • In contexts where previous queries influence the current judgment (multi-query sessions), decoy effects are more pronounced in LLM assessments, indicating that carried-over conversational context can amplify cognitive biases in AI systems (see the sketch after this list).
  3. Topic-Specific Vulnerabilities:
    • The presence and strength of decoy effects vary by subject matter, implying that certain topics may inherently trigger more pronounced biases in LLMs.

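To illustrate the session-context finding above, here is a minimal sketch contrasting single-query and multi-query assessment; the chat-message format, the prompts, and the send helper are assumptions made for illustration, not the paper's actual protocol.

```python
# Sketch contrasting single-query and multi-query assessment sessions.
# The chat-message format follows the common role/content convention;
# the prompts and the send() helper are illustrative assumptions.

def send(messages: list[dict]) -> str:
    """Hypothetical call to a chat LLM judge; stubbed to a fixed rating."""
    return "3"

def single_query_rating(result: str) -> str:
    # Fresh session per result: no carried-over context to bias the judge.
    prompt = f"Rate the credibility of this search result (1-5):\n{result}"
    return send([{"role": "user", "content": prompt}])

def multi_query_rating(history: list[dict], result: str) -> str:
    # Prior queries (possibly containing a decoy) remain in the context
    # window; this is the setting where the paper found the decoy effect
    # to be more pronounced for LLMs.
    prompt = f"Rate the credibility of this search result (1-5):\n{result}"
    return send(history + [{"role": "user", "content": prompt}])

decoy_history = [
    {"role": "user", "content": "Rate this result (1-5): <inferior decoy text>"},
    {"role": "assistant", "content": "1"},
]
print(single_query_rating("<target misinformation text>"))
print(multi_query_rating(decoy_history, "<target misinformation text>"))
```
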
Implications and Future Research

The empirical confirmation of cognitive biases in AI systems, as illustrated by the decoy effects observed in LLM judgments, raises significant concerns about the deployment of these models in high-stakes decision environments, such as healthcare. It emphasizes the necessity for rigorous AI auditing and debiasing strategies. Developing methods to detect and mitigate such biases is critical for ensuring that AI can assist in human decision-making without perpetuating misinformation.
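
As one concrete direction for such auditing, a simple decoy audit could compare a judge's ratings on matched control and decoy trials; the sketch below, with fabricated placeholder data, uses a paired t statistic to flag a systematic rating shift.

```python
# Paired decoy audit: run the same judge on matched control/decoy trials
# and test whether ratings shift systematically. The ratings below are
# fabricated placeholders purely for illustration.
from math import sqrt
from statistics import mean, stdev

control = [2.0, 1.5, 2.5, 2.0, 1.0, 2.5]  # target ratings without a decoy
decoy   = [3.0, 2.5, 3.0, 2.0, 2.0, 3.5]  # same targets with a decoy present

diffs = [d - c for d, c in zip(decoy, control)]
mean_shift = mean(diffs)
# Paired t statistic: the mean shift scaled by its standard error.
t = mean_shift / (stdev(diffs) / sqrt(len(diffs)))
print(f"mean decoy-induced shift = {mean_shift:+.2f}, t = {t:.2f}")
# A consistently positive shift with a large |t| would flag decoy
# susceptibility before deployment in a high-stakes setting.
```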

Future research should explore:

  • Broader applications of decoy effect evaluations across various domains to understand the generalizability of these findings.
  • Development of sophisticated models that incorporate debiasing mechanisms at both the pre-processing (training data refinement) and processing (real-time bias detection and correction) stages; a minimal processing-stage sketch follows this list.
  • Exploration of other cognitive biases and their potential influence on AI judgment tasks, extending beyond the decoy effect to include biases like confirmation bias and anchoring.
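
As an example of a processing-stage mitigation, one option is to judge each result in isolation so that decoy items elsewhere in a ranked list cannot anchor the rating; the wrapper below is a minimal sketch of that idea under that assumption, not a method from the paper.

```python
# Minimal processing-stage debiasing wrapper (an assumption made for
# illustration, not a method from the paper): judge each result in
# isolation so decoy items elsewhere in the list cannot anchor ratings.

def rate_one(result: str) -> float:
    """Hypothetical single-result credibility judge; stubbed here."""
    return 3.0

def debiased_ratings(results: list[str]) -> dict[str, float]:
    # Isolated presentation removes between-result comparison, which is
    # precisely where the decoy effect operates.
    return {r: rate_one(r) for r in results}

print(debiased_ratings(["result A", "result B", "inferior decoy of B"]))
```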

In conclusion, this research underscores the complexity of AI-driven evaluation and decision-support systems and the pressing need for frameworks that proactively address cognitive biases, thereby enhancing the reliability and trustworthiness of AI in information-intensive domains.
