
CoSER: Coordinating LLM-Based Persona Simulation of Established Roles (2502.09082v2)

Published 13 Feb 2025 in cs.CL and cs.AI

Abstract: Role-playing language agents (RPLAs) have emerged as promising applications of LLMs. However, simulating established characters presents a challenging task for RPLAs, due to the lack of authentic character datasets and nuanced evaluation methods using such data. In this paper, we present CoSER, a collection of a high-quality dataset, open models, and an evaluation protocol towards effective RPLAs of established characters. The CoSER dataset covers 17,966 characters from 771 renowned books. It provides authentic dialogues with real-world intricacies, as well as diverse data types such as conversation setups, character experiences and internal thoughts. Drawing from acting methodology, we introduce given-circumstance acting for training and evaluating role-playing LLMs, where LLMs sequentially portray multiple characters in book scenes. Using our dataset, we develop CoSER 8B and CoSER 70B, i.e., advanced open role-playing LLMs built on LLaMA-3.1 models. Extensive experiments demonstrate the value of the CoSER dataset for RPLA training, evaluation and retrieval. Moreover, CoSER 70B exhibits state-of-the-art performance surpassing or matching GPT-4o on our evaluation and three existing benchmarks, i.e., achieving 75.80% and 93.47% accuracy on the InCharacter and LifeChoice benchmarks respectively.

Coordinating LLM-Based Persona Simulation of Established Roles: An Examination of CoSER

The research paper titled "CoSER: Coordinating LLM-Based Persona Simulation of Established Roles" explores the challenging domain of role-playing language agents (RPLAs), focusing on their capacity to simulate complex, established characters. As LLMs have evolved, they are increasingly employed to mimic personas with anthropomorphic qualities. While such applications are diverse, ranging from digital clones to video game characters, this paper addresses the nuanced task of simulating well-defined literary characters using an authentic, high-quality dataset dubbed CoSER.

Challenges in Role-Playing Language Agents

The paper outlines two primary challenges: the scarcity of high-quality character datasets and the lack of effective evaluation methods. Existing datasets frequently rely on synthesized dialogues, which may not accurately capture the intricate dynamics of character interactions, so authenticity and fidelity to the original texts are often compromised. Evaluation methods, in turn, are typically simplistic, assessing response quality via predefined questions or LLM-based judging, which can suffer from bias and may fail to capture the nuanced aspects of character portrayal.

The CoSER Dataset

To address these challenges, the researchers introduce the CoSER dataset, derived from 771 renowned books. The dataset is distinguished by two aspects. It contains authentic, multi-character dialogues extracted directly from the literary works, maintaining fidelity to the source material. It also encompasses a comprehensive range of data types, including plot summaries, character experiences, internal thoughts, and conversation backgrounds. This diverse data supports various applications such as prompting, retrieval, model training, and evaluation.
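To make the dataset's structure concrete, the sketch below shows one plausible shape for a single CoSER-style conversation sample as Python data classes. The class and field names (`ConversationSample`, `plot_summary`, `scenario`, `character_experiences`, `thought`, etc.) are illustrative assumptions based on the data types described above, not the dataset's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Message:
    """One utterance in an extracted conversation."""
    speaker: str                    # character name as it appears in the book
    utterance: str                  # the spoken line taken from the source text
    thought: Optional[str] = None   # the character's internal thought, when available

@dataclass
class ConversationSample:
    """A single multi-character book scene, as CoSER-style data might be organized (assumed schema)."""
    book: str                                   # source novel
    plot_summary: str                           # summary of the surrounding plot
    scenario: str                               # conversation setup / given circumstances
    characters: List[str]                       # participants in the scene
    character_experiences: dict = field(default_factory=dict)  # per-character background
    messages: List[Message] = field(default_factory=list)      # the dialogue itself
```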

Given-Circumstance Acting (GCA)

The paper proposes a novel training and evaluation approach termed "given-circumstance acting" (GCA). This paradigm requires an actor LLM to sequentially inhabit and portray each character within a given narrative scenario. For training, the CoSER dataset is used to teach LLMs to authentically reproduce a character's utterances and mindset; this yields the CoSER 8B and 70B models, which are based on LLaMA-3.1 and demonstrate superior character portrayal across various benchmarks. Evaluation under GCA involves multi-agent simulation, scored with penalty-based rubrics that measure different aspects of character portrayal, including anthropomorphism, character fidelity, and storyline consistency.
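The following is a minimal sketch of how a GCA-style evaluation loop could be wired up, assuming a generic `llm(prompt) -> str` text interface. The prompt wording, the round-robin speaking order, the rubric dimensions, and the flaw-counting penalty scheme are illustrative assumptions; the paper's actual prompts and judge configuration may differ.

```python
from typing import Callable, Dict, List

LLM = Callable[[str], str]  # any text-in / text-out model interface (assumption)

RUBRIC = ["anthropomorphism", "character fidelity", "storyline consistency"]

def simulate_scene(actor: LLM, scenario: str, characters: List[str],
                   max_turns: int = 10) -> List[str]:
    """Multi-agent simulation: one actor LLM sequentially portrays every character."""
    transcript: List[str] = []
    for turn in range(max_turns):
        speaker = characters[turn % len(characters)]  # round-robin order (illustrative)
        convo = "\n".join(transcript)
        prompt = (
            f"Scenario: {scenario}\n"
            f"Conversation so far:\n{convo}\n"
            f"You are {speaker}. Stay fully in character and reply with one utterance."
        )
        transcript.append(f"{speaker}: {actor(prompt)}")
    return transcript

def judge_scene(judge: LLM, scenario: str, transcript: List[str]) -> Dict[str, float]:
    """Penalty-based scoring: the judge lists flaws per dimension; each flaw deducts points."""
    convo = "\n".join(transcript)
    scores: Dict[str, float] = {}
    for dim in RUBRIC:
        prompt = (
            f"Scenario: {scenario}\nTranscript:\n{convo}\n"
            f"List every flaw regarding {dim}, one per line. Reply NONE if there are none."
        )
        reply = judge(prompt).strip()
        flaws = 0 if reply.upper() == "NONE" else len(reply.splitlines())
        scores[dim] = max(0.0, 100.0 - 10.0 * flaws)  # illustrative: deduct 10 points per flaw
    return scores
```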

Experimental Results

Through extensive experiments, the CoSER models show state-of-the-art performance on role-playing tasks across four benchmarks, with CoSER 70B reaching 75.80% accuracy on InCharacter and 93.47% on LifeChoice. The 70B variant in particular aligns closely with character nuances in multi-turn dialogue simulations, underscoring the dataset's value for training effective RPLAs. Compared with both earlier and contemporary models, CoSER 70B and 8B also achieve markedly higher BLEU and ROUGE-L scores, indicating closer agreement with the original, character-consistent book dialogues.
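For reference, the snippet below shows how BLEU and ROUGE-L can be computed between a generated utterance and the character's original line, using the widely available `sacrebleu` and `rouge-score` packages. This illustrates the metrics themselves; it is not the paper's evaluation script, and the example strings are hypothetical.

```python
import sacrebleu                      # pip install sacrebleu
from rouge_score import rouge_scorer  # pip install rouge-score

# Hypothetical example: model output vs. the character's original line from the book.
generated = "I would rather face the dragon alone than let any of you be harmed."
reference = "I would sooner face the dragon alone than see any of you hurt."

# Sentence-level BLEU (sacreBLEU expects a list of reference strings).
bleu = sacrebleu.sentence_bleu(generated, [reference])
print(f"BLEU: {bleu.score:.2f}")

# ROUGE-L F-measure based on the longest common subsequence.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge_l = scorer.score(reference, generated)["rougeL"].fmeasure
print(f"ROUGE-L: {rouge_l:.3f}")
```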

Implications and Future Developments

The implications of this research extend into both practical applications and theoretical advancements within AI. The creation and utilization of datasets like CoSER represent a meaningful progression toward developing AI that can engage in complex, human-like interactions grounded in character authenticity. Future developments in this area may involve enhancing the interpretative capabilities of LLMs to better understand and simulate characters' internal thoughts and intentions. Additionally, the principles of given-circumstance acting could be leveraged to improve AI performance in fields requiring nuanced human-computer interactions, such as customer service and interactive storytelling.

In summary, the CoSER framework presents an innovative approach for simulating established character roles within LLMs, setting an advanced platform for future AI applications that demand intricate character portrayals. The research underscores the importance of authentic datasets and robust evaluation methodologies in advancing the field of role-playing LLMs.

Authors (12)
  1. Xintao Wang (132 papers)
  2. Heng Wang (136 papers)
  3. Yifei Zhang (167 papers)
  4. Xinfeng Yuan (6 papers)
  5. Rui Xu (198 papers)
  6. Jen-tse Huang (46 papers)
  7. Siyu Yuan (46 papers)
  8. Haoran Guo (12 papers)
  9. Jiangjie Chen (46 papers)
  10. Wei Wang (1793 papers)
  11. Yanghua Xiao (151 papers)
  12. Shuchang Zhou (51 papers)