Instructed to Bias: Instruction-Tuned Language Models Exhibit Emergent Cognitive Bias (2308.00225v2)

Published 1 Aug 2023 in cs.AI, cs.CY, and cs.LG

Abstract: Recent studies show that instruction tuning (IT) and reinforcement learning from human feedback (RLHF) improve the abilities of large language models (LMs) dramatically. While these tuning methods can help align models with human objectives and generate high-quality text, not much is known about their potential adverse effects. In this work, we investigate the effect of IT and RLHF on decision making and reasoning in LMs, focusing on three cognitive biases - the decoy effect, the certainty effect, and the belief bias - all of which are known to influence human decision-making and reasoning. Our findings highlight the presence of these biases in various models from the GPT-3, Mistral, and T5 families. Notably, we find a stronger presence of biases in models that have undergone instruction tuning, such as Flan-T5, Mistral-Instruct, GPT-3.5, and GPT-4. Our work constitutes a step toward comprehending cognitive biases in instruction-tuned LMs, which is crucial for the development of more reliable and unbiased language models.
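
The abstract describes probing LMs for classic decision-making biases such as the decoy effect. As a concrete illustration of what such a probe could look like, here is a minimal sketch that scores a matched control/treatment prompt pair with an instruction-tuned model; the checkpoint (`google/flan-t5-base`), prompt wording, option values, and log-probability scoring are all illustrative assumptions, not the paper's actual experimental protocol.

```python
# Minimal sketch of a decoy-effect probe, assuming a seq2seq
# instruction-tuned model such as Flan-T5 (named in the abstract).
# Checkpoint, prompts, and scoring are illustrative assumptions,
# not the paper's actual experimental protocol.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-base"  # any instruction-tuned seq2seq checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
model.eval()

def answer_logprob(prompt: str, answer: str) -> float:
    """Total log-probability the model assigns to `answer` given `prompt`."""
    inputs = tokenizer(prompt, return_tensors="pt")
    labels = tokenizer(answer, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(**inputs, labels=labels)
    # `out.loss` is the mean negative log-likelihood per label token,
    # so multiply by the label length to recover the total log-prob.
    return -out.loss.item() * labels.shape[1]

# Control: two options trading off price against battery life.
control = ("Choose one laptop. Option A: $900, 8h battery. "
           "Option B: $600, 5h battery. Answer with A or B:")
# Treatment: adds a decoy C that is asymmetrically dominated by A
# (more expensive AND shorter battery life); a decoy effect would
# shift relative preference toward A.
treatment = ("Choose one laptop. Option A: $900, 8h battery. "
             "Option B: $600, 5h battery. Option C: $950, 7h battery. "
             "Answer with A, B, or C:")

for name, prompt in [("control", control), ("treatment", treatment)]:
    pref_a = answer_logprob(prompt, "A") - answer_logprob(prompt, "B")
    print(f"{name}: logp(A) - logp(B) = {pref_a:.3f}")
```

Under this setup, a decoy effect would appear as logp(A) - logp(B) rising from the control to the treatment condition, i.e., the dominated option C nudging the model toward A; the paper's finding is that such biases are more pronounced in models that have undergone instruction tuning.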

Authors (4)
  1. Itay Itzhak (4 papers)
  2. Gabriel Stanovsky (61 papers)
  3. Nir Rosenfeld (28 papers)
  4. Yonatan Belinkov (111 papers)
Citations (13)
