Cognitive Bias in Decision-Making with LLMs (2403.00811v3)
Abstract: LLMs offer significant potential as tools to support an expanding range of decision-making tasks. Because they are trained on human-created data, LLMs have been shown to inherit societal biases against protected groups, as well as to exhibit bias that functionally resembles human cognitive bias. Such human-like bias can impede fair and explainable decisions made with LLM assistance. Our work introduces BiasBuster, a framework designed to uncover, evaluate, and mitigate cognitive bias in LLMs, particularly in high-stakes decision-making tasks. Inspired by prior research in psychology and cognitive science, we develop a dataset of 13,465 prompts to evaluate LLM decisions across different types of cognitive bias (e.g., prompt-induced, sequential, and inherent). We test various bias mitigation strategies and propose a novel self-help method in which LLMs debias the human-like cognitive bias in their own prompts. Our analysis provides a comprehensive picture of the presence and effects of cognitive bias across commercial and open-source models. We demonstrate that our self-help debiasing effectively mitigates model answers that display patterns akin to human cognitive bias, without requiring manually crafted examples for each bias.
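The self-help debiasing the abstract describes can be pictured as a two-step prompting loop: the model first rewrites a potentially bias-inducing prompt into a neutral form, then answers the rewritten prompt. The sketch below illustrates that idea only; the function names (`selfhelp_debias`, the `llm` wrapper) and the rewrite instruction are illustrative assumptions, not the paper's exact implementation.

```python
from typing import Callable

# Illustrative rewrite instruction; the paper's actual debiasing prompt may differ.
REWRITE_INSTRUCTION = (
    "Rewrite the following question so that it no longer contains wording "
    "that could trigger cognitive biases (e.g., anchoring values, framing, "
    "or an implied preferred answer), while keeping all task-relevant "
    "information:\n\n"
)

def selfhelp_debias(prompt: str, llm: Callable[[str], str]) -> str:
    """Ask the model to debias its own prompt, then answer the result."""
    neutral_prompt = llm(REWRITE_INSTRUCTION + prompt)  # step 1: model rewrites its own prompt
    return llm(neutral_prompt)                          # step 2: model answers the neutral version

# Usage with any text-in/text-out model wrapper (hypothetical `my_model`):
# answer = selfhelp_debias(
#     "Most experts say option A is best. Which option would you choose?",
#     llm=my_model,
# )
```

Note that this two-step scheme needs no hand-crafted examples per bias type, which is the property the abstract highlights for self-help debiasing.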
Authors: Jessica Echterhoff, Yao Liu, Abeer Alessa, Julian McAuley, Zexue He