
Exploring the Sensitivity of LLMs' Decision-Making Capabilities: Insights from Prompt Variation and Hyperparameters (2312.17476v1)

Published 29 Dec 2023 in cs.CL

Abstract: The advancement of LLMs has led to their widespread use across a broad spectrum of tasks, including decision making. Prior studies have compared the decision-making abilities of LLMs with those of humans from a psychological perspective. However, these studies have not always properly accounted for the sensitivity of LLMs' behavior to hyperparameters and variations in the prompt. In this study, we examine LLMs' performance on the Horizon decision-making task studied by Binz and Schulz (2023), analyzing how LLMs respond to variations in prompts and hyperparameters. By experimenting with three OpenAI LLMs of differing capabilities, we observe that decision-making abilities fluctuate with the input prompt and temperature setting. Contrary to previous findings, LLMs display a human-like exploration-exploitation tradeoff after simple adjustments to the prompt.
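The Horizon task referenced in the abstract (Wilson et al., 2014, reference 23) presents two bandit arms: after a few forced-choice pulls that reveal partial information about each arm, the subject makes either a short or long sequence of free choices, and the length of that horizon modulates how much exploration is rational. The sketch below is a minimal simulation of that setup, not the paper's exact protocol; the reward-noise level, the number of forced pulls, and the function names are illustrative assumptions. It also shows how a softmax temperature, analogous to the LLM sampling temperature the abstract discusses, governs the exploration-exploitation tradeoff.

```python
import math
import random

def softmax_choice(values, temperature):
    """Pick an arm index by softmax over estimated values.

    Higher temperature flattens the distribution (more exploration);
    temperature near 0 approaches greedy exploitation.
    """
    scaled = [v / temperature for v in values]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = random.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

def run_horizon_trial(true_means, horizon, temperature,
                      forced_pulls=4, noise_sd=8.0):
    """One game: forced pulls seed reward estimates, then `horizon` free choices.

    `forced_pulls` and `noise_sd` are illustrative values, not the
    original task's exact parameters. Returns mean reward per free choice.
    """
    counts = [0, 0]
    sums = [0.0, 0.0]
    # Forced pulls alternate between the two arms so both have estimates.
    for t in range(forced_pulls):
        arm = t % 2
        reward = random.gauss(true_means[arm], noise_sd)
        counts[arm] += 1
        sums[arm] += reward
    total_reward = 0.0
    for _ in range(horizon):
        estimates = [sums[a] / counts[a] for a in (0, 1)]
        arm = softmax_choice(estimates, temperature)
        reward = random.gauss(true_means[arm], noise_sd)
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
    return total_reward / horizon
```

Comparing `horizon=1` against `horizon=6` runs at several temperatures reproduces the qualitative pattern the task is designed to probe: with a longer horizon, extra exploration early on can pay off, whereas with a single free choice only exploitation is rewarded.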

References (24)
  1. Cognitive network science reveals bias in GPT-3, ChatGPT, and GPT-4 mirroring math anxiety in high-school students. arXiv preprint arXiv:2305.18320.
  2. Marcel Binz and Eric Schulz. 2023. Using cognitive psychology to understand GPT-3. Proceedings of the National Academy of Sciences, 120(6):e2218523120.
  3. Personality testing of GPT-3: Limited temporal reliability, but highlighted social desirability of GPT-3's personality instruments results. arXiv preprint arXiv:2306.04308.
  4. Samuel R Bowman. 2023. Eight things to know about large language models. arXiv preprint arXiv:2304.00612.
  5. Tom Brown et al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901.
  6. Does ChatGPT resemble humans in language use? arXiv preprint arXiv:2303.08014.
  7. Edward Cartwright. 2018. Behavioral Economics. Routledge.
  8. Social companionship with artificial intelligence: Recent trends and future avenues. Technological Forecasting and Social Change, 193:122634.
  9. Karl Cobbe et al. 2021. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168.
  10. Richard Futrell et al. 2019. Neural language models as psycholinguistic subjects: Representations of syntactic state. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 32–42, Minneapolis, Minnesota. Association for Computational Linguistics.
  11. John J Horton. 2023. Large language models as simulated economic agents: What can we learn from homo silicus? arXiv preprint arXiv:2301.07543.
  12. Shima Imani et al. 2023. MathPrompter: Mathematical reasoning using large language models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track), pages 37–42, Toronto, Canada. Association for Computational Linguistics.
  13. Tal Linzen, Emmanuel Dupoux, and Yoav Goldberg. 2016. Assessing the ability of LSTMs to learn syntax-sensitive dependencies. Transactions of the Association for Computational Linguistics, 4:521–535.
  14. Who is GPT-3? An exploration of personality, values and demographics. arXiv preprint arXiv:2209.14338.
  15. Iqbal Munir et al. 2023. Artificial intelligence ChatGPT in medicine: Can it be the friend you are looking for? Journal of Bangladesh Medical Association of North America (BMANA) BMANA Journal, pages 1–4.
  16. LLM is like a box of chocolates: The non-determinism of ChatGPT in code generation.
  17. Steve Phelps and Yvan I Russell. 2023. Investigating emergent goal-like behaviour in large language models using experimental economics. arXiv preprint arXiv:2305.07970.
  18. Impact of pretraining term frequencies on few-shot numerical reasoning. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 840–854, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
  19. Aarohi Srivastava et al. 2022. Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. arXiv preprint arXiv:2206.04615.
  20. Richard S Sutton and Andrew G Barto. 2018. Reinforcement Learning: An Introduction. MIT Press.
  21. Xuezhi Wang et al. 2022. Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171.
  22. Jason Wei et al. 2022. Chain of thought prompting elicits reasoning in large language models. arXiv preprint arXiv:2201.11903.
  23. Robert C Wilson et al. 2014. Humans use directed and random exploration to solve the explore–exploit dilemma. Journal of Experimental Psychology: General, 143(6):2074.
  24. Auto-GPT for online decision making: Benchmarks and additional opinions. arXiv preprint arXiv:2306.02224.
Authors (3)
  1. Manikanta Loya (2 papers)
  2. Divya Anand Sinha (2 papers)
  3. Richard Futrell (29 papers)
Citations (21)