On the Decision-Making Abilities in Role-Playing using Large Language Models (2402.18807v1)

Published 29 Feb 2024 in cs.CL and cs.AI

Abstract: LLMs are increasingly utilized for role-playing tasks, especially for impersonating domain-specific experts, primarily through role-playing prompts. When interacting in real-world scenarios, the decision-making abilities of a role significantly shape its behavioral patterns. In this paper, we focus on evaluating the decision-making abilities of LLMs after role-playing, thereby validating the efficacy of role-playing. Our goal is to provide metrics and guidance for enhancing the decision-making abilities of LLMs in role-playing tasks. Specifically, we first use LLMs to generate virtual role descriptions corresponding to the 16 personality types of the Myers-Briggs Type Indicator (MBTI), representing a segmentation of the population. We then design specific quantitative operations to evaluate the decision-making abilities of role-playing LLMs along four aspects: adaptability, exploration & exploitation trade-off, reasoning, and safety. Finally, we analyze the association between decision-making performance and the corresponding MBTI types through GPT-4. Extensive experiments demonstrate stable differences in the four aspects of decision-making abilities across distinct roles, signifying a robust correlation between decision-making abilities and the roles emulated by LLMs. These results underscore that LLMs can effectively impersonate varied roles while embodying their genuine sociological characteristics.
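One of the four aspects above, the exploration–exploitation trade-off, is commonly quantified with a multi-armed bandit: the agent (here, a role-played LLM choosing actions) accumulates regret against the best fixed arm, and lower regret indicates a better trade-off. The sketch below is a minimal, generic illustration of that style of measurement, not the paper's actual protocol; the function names (`evaluate_explore_exploit`, `epsilon_greedy`) and the epsilon-greedy baseline standing in for an LLM policy are illustrative assumptions.

```python
import random

def evaluate_explore_exploit(choose_arm, arm_means, horizon=1000, seed=0):
    """Run a stochastic Bernoulli bandit and return cumulative regret.

    choose_arm(history) -> arm index; history is a list of (arm, reward)
    pairs seen so far. Lower regret = better exploration-exploitation
    trade-off. (Illustrative harness, not the paper's protocol.)
    """
    rng = random.Random(seed)
    best = max(arm_means)
    history, regret = [], 0.0
    for _ in range(horizon):
        arm = choose_arm(history)
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        history.append((arm, reward))
        regret += best - arm_means[arm]  # expected per-step regret
    return regret

def epsilon_greedy(eps, n_arms, seed=1):
    """Baseline policy standing in for a role-played agent."""
    rng = random.Random(seed)
    def policy(history):
        if not history or rng.random() < eps:
            return rng.randrange(n_arms)          # explore
        totals = [[0.0, 0] for _ in range(n_arms)]
        for arm, r in history:
            totals[arm][0] += r
            totals[arm][1] += 1
        # exploit: highest empirical mean; unseen arms get priority
        return max(range(n_arms),
                   key=lambda a: (totals[a][0] / totals[a][1]
                                  if totals[a][1] else float("inf")))
    return policy

arms = [0.2, 0.5, 0.8]
regret = evaluate_explore_exploit(epsilon_greedy(0.1, len(arms)), arms)
print(round(regret, 1))
```

To score an actual role-played LLM, `choose_arm` would instead prompt the model (under a given role description) with the interaction history and parse its chosen arm, letting regret be compared across the 16 MBTI-derived roles.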

Authors (4)
  1. Chenglei Shen
  2. Guofu Xie
  3. Xiao Zhang
  4. Jun Xu