An Exploratory Study on Threshold Priming in LLM-Based Batch Relevance Assessment
In Information Retrieval (IR), the study of human cognitive biases has been pivotal in improving the design and evaluation of search systems. Extending this concern to AI, the paper "AI Can Be Cognitively Biased: An Exploratory Study on Threshold Priming in LLM-Based Batch Relevance Assessment" investigates whether LLMs exhibit human-like cognitive biases, specifically the threshold priming effect, during batch relevance assessments. The authors, Nuo Chen et al., systematically examine the presence and impact of this bias across several LLMs: GPT-3.5, GPT-4, LLaMa2-13B, and LLaMa2-70B.
Methodology and Experiment Design
The authors conducted experiments using 10 topics from the TREC 2019 Deep Learning passage track collection, ensuring a diverse range of domains and relevance levels. Relevance judgments were tested under varying conditions: the relevance level of the preceding documents, the lengths of the document batches, and the LLM used. The setup involved creating high-threshold (HT) and low-threshold (LT) prologues and then measuring each LLM's relevance assessments of identical epilogue documents. This design isolates the threshold priming effect: because the epilogue documents are the same in both conditions, any difference in their scores can be attributed to the HT versus LT context.
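To make the batch setup concrete, here is a minimal Python sketch of how such a prompt might be assembled. This is an illustration under assumptions: the function name, the 0-3 rating scale, and the prompt wording are hypothetical and are not the authors' actual prompts.

```python
# Hypothetical sketch of assembling one batch prompt: a prologue of
# pre-characterized passages followed by the shared epilogue passages.
# The rating scale and wording are illustrative, not the paper's exact prompt.

def build_batch_prompt(topic: str,
                       prologue_passages: list[str],
                       epilogue_passages: list[str]) -> str:
    """Concatenate prologue and epilogue passages into a single batch
    to be judged in one LLM call."""
    passages = prologue_passages + epilogue_passages
    lines = [
        f"Query: {topic}",
        "Rate the relevance of each passage to the query on a 0-3 scale.",
    ]
    for i, passage in enumerate(passages, start=1):
        lines.append(f"Passage {i}: {passage}")
    return "\n".join(lines)

# HT condition: the prologue contains highly relevant passages;
# LT condition: the prologue contains marginally relevant ones.
# The epilogue passages are identical in both conditions, so any score
# difference on them is attributable to the preceding context.
ht_prompt = build_batch_prompt(
    "flu symptoms",
    prologue_passages=["<highly relevant 1>", "<highly relevant 2>"],
    epilogue_passages=["<shared passage 1>", "<shared passage 2>"],
)
```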
Key Findings
The empirical results demonstrated that LLMs are indeed susceptible to threshold priming:
- Influence of Prologue Length: When both the prologue and epilogue were short (PL = 4, EL = 4), all tested models exhibited significant threshold priming: higher relevance scores assigned to earlier (prologue) documents led to lower relevance assessments of the subsequent documents. (A minimal way to quantify this comparison is sketched after this list.)
- Model-specific Observations: GPT-3.5 and GPT-4 showed pronounced threshold priming effects across most conditions. In contrast, LLaMa2-70B exhibited threshold priming primarily in configurations with shorter prologues, and showed an inversion of the effect at longer prologue lengths (PL = 8).
- Topic Sensitivity: The extent of threshold priming varied across topics, indicating that certain queries are more prone to biased judgment by LLMs. In particular, topics with smaller differences between the LT and HT conditions hinted that other cognitive biases, such as the anchoring effect, may also influence the models' judgments.
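One simple way to quantify the priming effect is to compare the scores a model assigns to the identical epilogue passages under the HT and LT conditions. The sketch below uses a paired t-test as one reasonable choice of significance test; the paper's own statistical analysis may differ, and the sample scores are invented for illustration.

```python
# Minimal sketch: quantify threshold priming by comparing scores assigned to
# the *same* epilogue passages under HT vs. LT prologues. The paired t-test is
# one reasonable choice of test, not necessarily the one used in the paper.
from statistics import mean
from scipy.stats import ttest_rel

def priming_effect(ht_scores: list[float], lt_scores: list[float]):
    """ht_scores[i] and lt_scores[i] are the model's judgments of the same
    epilogue passage under high- and low-threshold prologues."""
    # A negative difference indicates priming: highly relevant prologue
    # documents depress the scores of the documents that follow.
    diff = mean(ht_scores) - mean(lt_scores)
    _, p_value = ttest_rel(ht_scores, lt_scores)
    return diff, p_value

# Invented example scores for an EL = 4 epilogue:
ht = [1.0, 1.0, 2.0, 1.0]   # epilogue scores after an HT prologue
lt = [2.0, 2.0, 2.0, 2.0]   # epilogue scores after an LT prologue
print(priming_effect(ht, lt))
```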
Implications and Future Directions
The implications of these findings are both practical and theoretical. Practically, they show that, despite their advanced capabilities, LLMs are not immune to cognitive biases. Awareness of such biases means that IR system designers and evaluators should incorporate countermeasures during model training and evaluation. Moreover, biases in AI could compound when these models interact with human users, potentially amplifying errors and misjudgments.
Theoretically, the paper underscores the need for a deeper investigation into the "bounded rationality" of LLMs. Borrowing this notion from economics and cognitive science, the AI research community can draw parallels with human cognitive limitations to better understand and mitigate such biases in machine learning models. Future research should focus on:
- Broader Dataset Evaluation: Testing a wider range of topics and scenarios to confirm the generalizability of these findings.
- Prompt Engineering: Exploring prompt modifications and structured queries to minimize the impact of cognitive biases (a hypothetical example follows this list).
- Bias Mitigation Techniques: Developing and integrating anti-bias protocols in training regimes of LLMs.
- Interdisciplinary Approaches: Leveraging psychological and behavioral insights to inform AI development, ensuring better-aligned AI-augmented decision systems.
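As a concrete illustration of the prompt-engineering direction above, one might prepend an instruction telling the model to judge each passage independently of earlier scores in the batch. This is a hypothetical mitigation, not one evaluated in the paper, and its effectiveness would need to be tested empirically.

```python
# Hypothetical prompt-engineering mitigation: instruct the model to judge each
# passage on its own merits, independent of earlier scores in the batch.
# This direction was not validated by the paper.

DEBIAS_INSTRUCTION = (
    "Judge each passage strictly on its own merits against the query. "
    "Do not let the relevance of previously rated passages raise or lower "
    "your standard for the passages that follow."
)

def debiased_prompt(batch_prompt: str) -> str:
    """Prepend an independence instruction to an existing batch prompt."""
    return DEBIAS_INSTRUCTION + "\n\n" + batch_prompt
```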
Conclusion
This exploratory paper initiates a crucial dialogue on the presence of human-like cognitive biases in LLMs, specifically within the context of threshold priming. As LLMs increasingly influence decision-making processes, understanding and addressing such biases become imperative. By revealing these biases, the research paves the way for developing more robust, unbiased AI systems that can fairly and effectively augment human judgments in diverse applications.