Do Large Language Models Learn Human-Like Strategic Preferences? (2404.08710v2)
Abstract: In this paper, we evaluate whether LLMs make human-like preference judgements in strategic scenarios by comparing their behavior against known empirical results. Solar and Mistral exhibit stable, value-based preferences consistent with human behavior, including a human-like preference for cooperation in both the prisoner's dilemma (including the stake-size effect) and the traveler's dilemma (including the penalty-size effect). We establish a relationship between model size, value-based preference, and superficiality, and we observe that the models that proved least brittle rely on sliding-window attention, suggesting a potential link. Finally, we contribute a novel method for constructing preference relations from arbitrary LLMs, along with support for a hypothesis regarding human behavior in the traveler's dilemma.
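
The abstract refers to the traveler's dilemma and to constructing preference relations from LLM judgements without spelling out either. The Python sketch below is a minimal illustration of both: the payoff function follows the standard traveler's dilemma rule (Basu 1994), while `build_preference_relation`, the `prefer` callback, the option set, and the penalty value of 5 are hypothetical placeholders for illustration, not the paper's actual method or parameters.

```python
from itertools import combinations

def traveler_payoffs(claim_a: int, claim_b: int, penalty: int) -> tuple[int, int]:
    """Standard traveler's dilemma rule: both players receive the lower
    of the two claims; the lower claimant additionally gains `penalty`
    and the higher claimant loses it."""
    low = min(claim_a, claim_b)
    if claim_a == claim_b:
        return low, low
    if claim_a < claim_b:
        return low + penalty, low - penalty
    return low - penalty, low + penalty

def build_preference_relation(options, prefer):
    """Construct a binary preference relation from pairwise choices.
    `prefer(a, b)` stands in for a forced two-alternative query to an
    LLM; it is a hypothetical placeholder, not the paper's procedure."""
    relation = set()
    for a, b in combinations(options, 2):
        if prefer(a, b):
            relation.add((a, b))  # read as: a is preferred to b
        else:
            relation.add((b, a))
    return relation

# Example with an assumed penalty of 5: claims of 95 and 100.
print(traveler_payoffs(95, 100, 5))  # -> (100, 90)

# Toy stand-in for an LLM query: prefer the claim with the higher total
# payoff against every claim in a small option set (illustrative only).
claims = [80, 90, 100]

def toy_prefer(a: int, b: int) -> bool:
    score = lambda x: sum(traveler_payoffs(x, c, 5)[0] for c in claims)
    return score(a) >= score(b)

print(build_preference_relation(claims, toy_prefer))
```

A real construction would replace `toy_prefer` with model queries; note that nothing in this pairwise procedure guarantees the resulting relation is transitive.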