On the Decision-Making Abilities in Role-Playing using Large Language Models (2402.18807v1)
Abstract: Large language models (LLMs) are increasingly used for role-playing tasks, especially for impersonating domain-specific experts via role-playing prompts. When a role interacts in real-world scenarios, its decision-making abilities significantly shape its behavioral patterns. In this paper, we concentrate on evaluating the decision-making abilities of LLMs after role-playing, thereby validating the efficacy of role-playing. Our goal is to provide metrics and guidance for enhancing the decision-making abilities of LLMs in role-playing tasks. Specifically, we first use LLMs to generate virtual role descriptions corresponding to the 16 personality types of the Myers-Briggs Type Indicator (MBTI), which together represent a segmentation of the population. We then design specific quantitative operations to evaluate the decision-making abilities of role-playing LLMs along four dimensions: adaptability, exploration-exploitation trade-off, reasoning, and safety. Finally, we use GPT-4 to analyze the association between decision-making performance and the corresponding MBTI types. Extensive experiments demonstrate stable differences across distinct roles on all four dimensions, signifying a robust correlation between decision-making abilities and the roles emulated by LLMs. These results underscore that LLMs can effectively impersonate varied roles while embodying their genuine sociological characteristics.
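The abstract outlines a two-step pipeline: role-play prompts built from MBTI-typed virtual role descriptions, followed by quantitative probes of the resulting decision-making behavior. Below is a minimal sketch of that pipeline under stated assumptions: `query_llm` is a hypothetical stand-in for any chat-completion client, and the multi-armed bandit is one plausible probe of the exploration-exploitation dimension, not the paper's actual protocol.

```python
import random

# Hypothetical helper -- replace with a real chat-completion call.
# Returning a random arm index keeps the sketch runnable end to end.
def query_llm(prompt: str) -> str:
    return str(random.randrange(3))

MBTI_TYPES = ["INTJ", "INTP", "ENTJ", "ENTP", "INFJ", "INFP", "ENFJ", "ENFP",
              "ISTJ", "ISFJ", "ESTJ", "ESFJ", "ISTP", "ISFP", "ESTP", "ESFP"]

def role_prompt(mbti: str) -> str:
    # Step 1 of the abstract: a description of a virtual role with the
    # given MBTI type (a simple template here; the paper generates these
    # descriptions with an LLM).
    return (f"You are a person with the {mbti} MBTI personality type. "
            f"Stay in character for every decision you make.")

def bandit_regret(mbti: str, arms=(0.2, 0.5, 0.8), rounds=20) -> float:
    """Illustrative exploration-exploitation probe: the role-played model
    repeatedly picks a slot machine, and we accumulate expected regret
    against the best arm."""
    history, regret, best = [], 0.0, max(arms)
    for _ in range(rounds):
        prompt = (role_prompt(mbti) +
                  f"\nYou face {len(arms)} slot machines. Past plays "
                  f"(machine, reward): {history}. Reply with only the index "
                  f"(0-{len(arms) - 1}) of the machine to play next.")
        reply = query_llm(prompt)
        try:
            arm = int(reply.strip()) % len(arms)
        except ValueError:
            arm = random.randrange(len(arms))  # fall back on unparseable replies
        reward = 1 if random.random() < arms[arm] else 0
        history.append((arm, reward))
        regret += best - arms[arm]
    return regret

# Lower cumulative regret suggests a better trade-off for that role.
scores = {mbti: bandit_regret(mbti) for mbti in MBTI_TYPES}
```

Comparing cumulative regret (and analogous scores for the other three dimensions) across the 16 roles would surface the kind of stable, role-dependent differences the paper reports.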
Authors: Chenglei Shen, Guofu Xie, Xiao Zhang, Jun Xu