
AdaRPSet: Adaptive Role-Playing Dataset

Updated 23 January 2026
  • AdaRPSet is a structured dataset containing over 22,000 role-playing trajectories, annotated using an immersive message format for dynamic narratives.
  • It integrates both literature-derived and LLM-synthesized trajectories, enabling robust scene management and adaptive character interactions.
  • The dataset supports advanced training objectives with metrics for environmental grounding and character consistency, enhancing adaptive role-playing research.

AdaRPSet is a large-scale, structured dataset constructed to train and evaluate LLM "Actor Models" for immersive multi-agent role-playing in dynamic narrative environments. It is the principal data component of the AdaMARP framework, which addresses limitations of previous LLM-based role-playing systems: poor environmental grounding, limited scene/cast adaptability, and insufficient support for orchestrating dynamic character interactions. Its trajectories, drawn from both human-authored and LLM-synthesized interactive narratives, are annotated in a unified, immersion-oriented message format that encodes internal thought, explicit action, environmental context, and speech (Xu et al., 16 Jan 2026).

1. Dataset Composition and Statistical Structure

AdaRPSet comprises 22 425 distinct role-playing trajectories, partitioned into two categories:

  • AdaRPSet-Extracted: 12 525 trajectories derived from 81 public-domain and best-seller books, with plot extraction centered on chapter boundaries (~8 k tokens) and assisted by LLM-based scene segmentation and character profiling.
  • AdaRPSet-Synthesis: 9 900 trajectories generated by LLM prompt synthesis, distributed evenly over 20 themes (495 per theme; $p_i = 0.05$ per theme), including but not limited to Adventure, Quest, Rescue, Mystery, Betrayal, Magic, and Apocalypse.

These trajectories are further segmented into 14 343 narrative scenes ($N_{\text{scene}}$), spanning a total of 450 235 utterances (agent turns) and yielding an average of 20.08 turns per trajectory.
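The headline figures above can be cross-checked with a few lines of arithmetic. The snippet below is only a sanity check on the numbers quoted in this section; the variable names are illustrative and not part of any released tooling.

```python
# Figures quoted in the text above.
extracted = 12_525      # AdaRPSet-Extracted trajectories
synthesized = 9_900     # AdaRPSet-Synthesis trajectories
themes = 20
utterances = 450_235

total = extracted + synthesized          # all trajectories
per_theme = synthesized // themes        # trajectories per synthetic theme
theme_share = per_theme / synthesized    # p_i for each theme
avg_turns = utterances / total           # mean turns per trajectory

print(total, per_theme, theme_share, round(avg_turns, 2))
```

The results agree with the reported totals: 22 425 trajectories, 495 per theme, a per-theme share of 0.05, and roughly 20.08 turns per trajectory.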

Control-message injection, used for training orchestration decisions with AdaSMSet, produces a dense annotation of discrete management actions within the AdaRPSet-Synthesis split:

| Manager Action | Total Count | Mean per Traj. |
|---|---|---|
| init_scene | 9 900 | 1.00 |
| pick_speaker | 223 415 | 22.57 |
| switch_scene | 10 101 | 1.02 |
| add_role | 9 862 | 1.00 |
| end | 9 900 | 1.00 |

This distribution encodes continuous scene management, turn-taking, and adaptive character introduction.
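As a rough illustration of how such per-action statistics could be tallied, the sketch below counts manager actions over a list of trajectories. The record layout (a list of dicts with an optional "action" key) is an assumption made for this example, not the dataset's documented schema.

```python
from collections import Counter

def action_stats(trajectories):
    """Return {action: (total_count, mean_per_trajectory)}.

    Assumes each trajectory is a list of message dicts and that manager
    control messages carry an "action" key (hypothetical layout).
    """
    totals = Counter()
    for traj in trajectories:
        totals.update(msg["action"] for msg in traj if "action" in msg)
    n = len(trajectories)
    return {action: (count, count / n) for action, count in totals.items()}

# Two toy trajectories, invented purely for illustration.
toy = [
    [{"action": "init_scene"}, {"action": "pick_speaker"},
     {"speaker": "A", "text": "..."}, {"action": "pick_speaker"},
     {"action": "end"}],
    [{"action": "init_scene"}, {"action": "pick_speaker"},
     {"action": "add_role"}, {"action": "end"}],
]
stats = action_stats(toy)
print(stats["pick_speaker"])  # (total, mean per trajectory)
```

Applied to the reported totals, the same division reproduces the table: for instance, 223 415 pick_speaker actions over 9 900 trajectories gives about 22.57 per trajectory.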

2. Annotation Schema and Message Encoding

Every utterance in AdaRPSet is formatted using a unified "immersive" schema designed to reflect narrative depth and agent situatedness:

  • [Thought]: First-person, internal monologue, inaccessible to other characters.
  • (Action): Agent actions or behaviors visible to others within the scene.
  • <Environment>: Exogenous events, ambient phenomena, or environmental cues with no explicit agent.
  • Speech: Direct character dialogue (unadorned).

Example (in structured LaTeX-style markup):

$$\underbrace{[\text{I can’t believe she asked that}]}_{\text{Thought}} \underbrace{(\text{glances toward the door})}_{\text{Action}} \underbrace{<\text{a distant thunder rumbles}>}_{\text{Environment}} \text{I’m not sure we should stay here.}$$

The combined message format enforces explicit, granular modeling of agent cognition, performative acts, environmental context, and dialogue, with consistent wrappers across all source and synthetic splits.
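To make the wrapper convention concrete, here is a minimal parsing sketch that splits a message into its four components. It assumes wrappers are never nested and that speech itself contains no brackets, parentheses, or angle brackets; the function name and output layout are illustrative, not taken from the dataset's tooling.

```python
import re

# One alternative per wrapper; the named group records which one matched.
SEGMENT = re.compile(
    r"\[(?P<thought>[^\]]*)\]"
    r"|\((?P<action>[^)]*)\)"
    r"|<(?P<environment>[^>]*)>"
)

def parse_message(message):
    """Split an immersive-format message into its four components."""
    parts = {"thought": [], "action": [], "environment": [], "speech": []}
    cursor = 0
    for match in SEGMENT.finditer(message):
        # Unwrapped text between two wrapped segments is speech.
        plain = message[cursor:match.start()].strip()
        if plain:
            parts["speech"].append(plain)
        kind = match.lastgroup
        parts[kind].append(match.group(kind).strip())
        cursor = match.end()
    tail = message[cursor:].strip()
    if tail:
        parts["speech"].append(tail)
    return parts

example = ("[I can't believe she asked that] (glances toward the door) "
           "<a distant thunder rumbles> I'm not sure we should stay here.")
parsed = parse_message(example)
print(parsed)
```

A production parser would need a policy for parenthetical asides inside speech, which this sketch would misread as actions.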

3. Training Objectives and Evaluation Metrics

Actor Models trained on AdaRPSet are optimized using a composite loss function:

$$\mathcal{L} = \mathcal{L}_{CE} + \lambda_{env}\,\mathcal{L}_{env} + \lambda_{cons}\,\mathcal{L}_{cons}$$

  • Cross-Entropy $\mathcal{L}_{CE}$ is calculated over next-token prediction.
  • Environment Grounding $\mathcal{L}_{env}$ directly penalizes incorrect generation of <Environment> tags:

$$\mathcal{L}_{env} = -\sum_{t\in \mathcal{I}_{env}} \log p_\theta(e_t \mid \text{context}_t)$$

where $\mathcal{I}_{env}$ marks environment-token positions.

  • Consistency Regularizer $\mathcal{L}_{cons}$ (optional): Encourages persona fidelity via squared distance between learned trajectory representations and prescribed profile vectors:

$$\mathcal{L}_{cons} = \sum_{\tau\in \text{batch}} \lVert f_{persona}(\tau) - p^*\rVert^2$$
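The following pure-Python sketch evaluates the composite loss for a single trajectory, to show how the three terms combine. Several details are assumptions rather than facts from the source: the cross-entropy is taken as a sum (not a mean) over tokens, the weights lambda_env=0.5 and lambda_cons=0.1 are placeholders, and the persona and profile vectors are passed in as plain lists.

```python
import math

def composite_loss(token_logprobs, env_mask,
                   persona_vec=None, profile_vec=None,
                   lambda_env=0.5, lambda_cons=0.1):
    """L = L_CE + lambda_env * L_env + lambda_cons * L_cons for one trajectory.

    token_logprobs: log p(target token | context) at each position.
    env_mask: True where the position belongs to an <Environment> span.
    The lambda defaults are placeholders, not values from the paper.
    """
    # Next-token cross-entropy over all positions (summed; an assumption).
    l_ce = -sum(token_logprobs)
    # Same negative log-likelihood, restricted to environment positions.
    l_env = -sum(lp for lp, is_env in zip(token_logprobs, env_mask) if is_env)
    # Optional squared distance between persona embedding and profile vector.
    l_cons = 0.0
    if persona_vec is not None and profile_vec is not None:
        l_cons = sum((f - p) ** 2 for f, p in zip(persona_vec, profile_vec))
    return l_ce + lambda_env * l_env + lambda_cons * l_cons

logprobs = [math.log(0.5), math.log(0.25), math.log(0.5)]
mask = [False, True, False]  # only the middle token is an environment token
loss = composite_loss(logprobs, mask, persona_vec=[1.0, 0.0], profile_vec=[0.0, 0.0])
print(loss)
```

Because environment tokens also appear in the cross-entropy term, they are effectively up-weighted by a factor of (1 + lambda_env) relative to other tokens under this reading.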

Trajectory-level evaluation is performed via AdaptiveBench, deploying an LLM judge to score simulated trajectories on five dimensions using 0–10 sub-metrics. For example, Character Consistency (CC) is scored as:

$$\mathrm{CC} = \frac{1}{5}\sum_{d=1}^{5} s_d$$

with $s_d$ assigned to Internal Coherence, Speaking-Style Fidelity, Language Fluency, Profile Fidelity, and Motivation Stability. Environmental Grounding (EG) is similarly:

$$\mathrm{EG} = \frac{1}{2}(s_{\mathrm{Awareness}} + s_{\mathrm{Utilization}})$$

Other axes include Interpersonal Interaction and Narrative Progression. Aggregated behavioral validity can employ:

$$\mathrm{ConsistencyScore} = \frac{1}{|\tau|}\sum_{t=1}^{|\tau|} \mathbf{1}[\text{actor\_behavior\_valid}(m_t)]$$
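The three aggregates above are simple averages, as the sketch below shows. The dictionary keys, the example scores, and the validity predicate are all invented for illustration; in AdaptiveBench the sub-scores come from an LLM judge.

```python
CC_DIMENSIONS = (
    "internal_coherence",
    "speaking_style_fidelity",
    "language_fluency",
    "profile_fidelity",
    "motivation_stability",
)

def character_consistency(scores):
    """Mean of the five 0-10 sub-scores."""
    return sum(scores[d] for d in CC_DIMENSIONS) / len(CC_DIMENSIONS)

def environmental_grounding(awareness, utilization):
    """Mean of the Awareness and Utilization sub-scores."""
    return (awareness + utilization) / 2

def consistency_score(messages, is_valid):
    """Fraction of messages for which the validity predicate holds."""
    if not messages:
        return 0.0
    return sum(1 for m in messages if is_valid(m)) / len(messages)

judge_scores = {
    "internal_coherence": 8,
    "speaking_style_fidelity": 7,
    "language_fluency": 9,
    "profile_fidelity": 6,
    "motivation_stability": 10,
}
cc = character_consistency(judge_scores)
eg = environmental_grounding(awareness=7, utilization=9)
# Hypothetical predicate: a message is "valid" if it is non-empty.
cs = consistency_score(["a", "", "b", "c"], lambda m: bool(m.strip()))
print(cc, eg, cs)
```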

4. Data Collection and Quality Assurance

AdaRPSet-Extracted utilizes LLMs (GPT-5-Chat) to assist plot extraction and segmentation from literature, generating 7-dimensional character profiles for each extracted persona. Chapters are chunked at ~8 k tokens, and scenes are cast as immersive-format dialogues.

AdaRPSet-Synthesis employs 20 distinct thematic prompts, each generating 495 trajectories formulated to include an init_scene action, all four message components ([Thought], (Action), <Environment>, Speech), and at least one switch_scene and one add_role per trajectory.

Quality control measures include:

  • Automated duplicate detection and culling ($\sim 5\%$ duplicate removal) based on main-character names.
  • Manual review per synthetic theme to enforce scenario uniqueness and correct control action insertion.
  • Spot checks for format and procedural compliance; there is no multi-annotator scoring on the raw data.
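The source does not describe the duplicate-detection procedure beyond saying it keys on main-character names. One plausible reading, sketched below under that assumption, keeps the first trajectory seen for each normalized name; the "main_character" field and the normalization rule are hypothetical.

```python
def dedupe_by_main_character(trajectories):
    """Keep the first trajectory for each normalized main-character name.

    Assumes each record has a "main_character" string (hypothetical field).
    Names are lower-cased and whitespace-collapsed before comparison.
    """
    seen = set()
    kept = []
    for traj in trajectories:
        key = " ".join(traj["main_character"].lower().split())
        if key in seen:
            continue  # treated as a duplicate scenario
        seen.add(key)
        kept.append(traj)
    return kept

records = [
    {"id": 1, "main_character": "Isolde Ferrowind"},
    {"id": 2, "main_character": "  isolde  ferrowind "},
    {"id": 3, "main_character": "Taron Corvith"},
]
unique = dedupe_by_main_character(records)
print([r["id"] for r in unique])
```

A name-only key is coarse: it would also discard distinct scenarios that happen to reuse a protagonist name, which may be why the manual per-theme review is listed alongside it.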

5. Usage Recommendations, Integration, and Pitfalls

For optimal model performance and generalization:

  • Combine AdaRPSet-Extracted and AdaRPSet-Synthesis in equal measure to balance stylistic realism and the modeling of dynamic events.
  • Use the unified message-format wrapper for all training samples.
  • Representative training hyperparameters (for 7B/8B models): micro_batch_size=24, global_batch_size=48, AdamW optimizer, LR=1e-6 with 5% warmup from 1e-7, 8 epochs, max sequence length 16k tokens (left-truncated).
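Two of the listed settings, the warmup schedule and left truncation, can be sketched in a few lines. The linear warmup shape, the constant rate after warmup, and the reading of "16k" as 16 000 tokens are assumptions; the source gives only the start value, peak value, and warmup fraction.

```python
def learning_rate(step, total_steps, peak_lr=1e-6, start_lr=1e-7,
                  warmup_frac=0.05):
    """Linear warmup from start_lr to peak_lr over the first 5% of steps.

    Holding the rate constant after warmup is an assumption; the source
    does not say whether a decay schedule follows.
    """
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step >= warmup_steps:
        return peak_lr
    return start_lr + (peak_lr - start_lr) * step / warmup_steps

def left_truncate(token_ids, max_len=16_000):
    """Drop tokens from the left so the most recent context is kept."""
    return token_ids[-max_len:]

print(learning_rate(0, 1000), learning_rate(25, 1000), learning_rate(50, 1000))
print(left_truncate(list(range(10)), max_len=4))
```

Left truncation fits long role-play trajectories because the most recent turns, which sit at the end of the sequence, are the ones the next response depends on most directly.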

Potential pitfalls include:

  • Exclusive training on Extracted data yields strong format compliance but reduces exposure to rapid role/scene adaptation.
  • Exclusive use of Synthesis data limits dialogue style diversity and narrative depth.
  • Overly constraining instruction prompts may suppress actor flexibility; basic prompts tend to produce superior AdaptiveBench scores.

6. Representative Trajectory Example

An exemplary AdaRPSet-Synthesis “Adventure” trajectory (abridged) exhibits the following phenomena:

$$\begin{array}{l}
\text{Scene Manager: init\_scene}\quad \text{Late afternoon onboard the Orphan Gale, steam hisses under brass fittings, clouds glow below.}\\[4pt]
\text{Isolde Ferrowind: }<\text{The compass trembles}> [\text{Winds shift too fast}] \text{ Keep sharp, Taron.}\\[4pt]
\text{Taron Corvith: }(\text{sketches wind currents rapidly}) \text{ The currents twist like braided rivers...}\\[4pt]
\vdots\\[4pt]
\text{Valdrex: }(\text{emerging from below deck}) \text{ Hull’s holding, Captain, but I hear foreign engines.}\\[4pt]
\text{Isolde Ferrowind: }[\text{Competition or ambush}] \text{ Mark bearing forty-two north by west.}\\[4pt]
\text{Scene Manager: add\_role}\quad \text{reason: A distant signal matches the old legend.}\quad \text{new\_role: Lynath Ocirra (mistrustful sky-ward sentry)}\\[4pt]
\vdots\\[4pt]
\text{Scene Manager: switch\_scene}\quad \text{The Orphan Gale docks at the floating Sky Citadel, basalt shards wrapped in violet energy.}\\[4pt]
\vdots\\[4pt]
\text{Lynath Ocirra: }(\text{hovers by the gangway}) \text{ I warned your kind never to breach this corridor.}\\[4pt]
\text{Isolde Ferrowind: }[\text{Her tone recalls old omens}] \text{ We come to learn, not plunder.}\\[4pt]
\text{Scene Manager: end}\quad \text{reason: Alliance formed and first chapter concludes.}
\end{array}$$

This illustrates integrated scene control (init_scene, add_role, switch_scene, end), coordinated agent turns with immersive markup, and capacity for dynamic cast/scene adaptations—key phenomena AdaRPSet encodes for adaptive role-playing competency.
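The structural pattern in this example (open with init_scene, introduce new characters through add_role before they speak, close with end) can be expressed as a small consistency check. The event layout and the rules themselves are assumptions inferred from the example, not a documented validator.

```python
def validate_trajectory(events, initial_cast):
    """Return a list of structural problems (empty if none are found).

    Assumes manager events carry an "action" key, actor turns carry a
    "speaker" key, and add_role events name the newcomer in "new_role".
    """
    problems = []
    if not events or events[0].get("action") != "init_scene":
        problems.append("trajectory should open with init_scene")
    if not events or events[-1].get("action") != "end":
        problems.append("trajectory should close with end")
    cast = set(initial_cast)
    for i, event in enumerate(events):
        if event.get("action") == "add_role":
            cast.add(event["new_role"])
        elif "speaker" in event and event["speaker"] not in cast:
            problems.append(f"event {i}: {event['speaker']} speaks before being added")
    return problems

events = [
    {"action": "init_scene"},
    {"speaker": "Isolde Ferrowind"},
    {"speaker": "Taron Corvith"},
    {"action": "add_role", "new_role": "Lynath Ocirra"},
    {"action": "switch_scene"},
    {"speaker": "Lynath Ocirra"},
    {"action": "end"},
]
issues = validate_trajectory(events, initial_cast=["Isolde Ferrowind", "Taron Corvith"])
print(issues)
```

Removing the add_role event from this list would cause the check to flag Lynath Ocirra's turn, which is the kind of control-action error the manual review step is meant to catch.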

7. Context and Significance in Adaptive Role-Playing Research

AdaRPSet constitutes a foundational asset for training LLMs capable of narrative-immersive, multi-agent interaction with explicit environmental and interpersonal awareness. Its dual-sourced construct—fusing literature-derived and synthetic, thematically controlled content—directly addresses observed deficits in prior frameworks such as static cast/scene assumptions and inadequate environmental modeling. Modeled jointly with AdaSMSet (for orchestration supervision) and AdaptiveBench (for trajectory-level scoring), AdaRPSet underlies demonstrable improvements in character consistency, environmental grounding, and narrative coherence, enabling models (e.g., 8B actors) to surpass several leading commercial LLMs on core adaptive role-play metrics (Xu et al., 16 Jan 2026).

References (1)
