Character is Destiny: Can Role-Playing Language Agents Make Persona-Driven Decisions? (2404.12138v2)
Abstract: Can LLMs simulate humans in making important decisions? Recent research has unveiled the potential of using LLMs to develop role-playing language agents (RPLAs), which mainly mimic the knowledge and tone of various characters. However, imitative decision-making requires a more nuanced understanding of personas. In this paper, we benchmark the ability of LLMs in persona-driven decision-making. Specifically, we investigate whether LLMs can predict characters' decisions given the preceding stories in high-quality novels. Leveraging character analyses written by literary experts, we construct a dataset, LIFECHOICE, comprising 1,462 characters' decision points from 388 books. We then conduct comprehensive experiments on LIFECHOICE with various LLMs and RPLA methodologies. The results demonstrate that state-of-the-art LLMs exhibit promising capabilities in this task, yet substantial room for improvement remains. Hence, we further propose the CHARMAP method, which adopts persona-based memory retrieval and significantly advances RPLAs on this task, achieving a 5.03% increase in accuracy.
- Rui Xu
- Xintao Wang
- Jiangjie Chen
- Siyu Yuan
- Xinfeng Yuan
- Jiaqing Liang
- Zulong Chen
- Xiaoqing Dong
- Yanghua Xiao