Neeko: Multi-Character Role-Playing Framework
- Neeko is an advanced multi-character role-playing framework that employs dynamic LoRA adapters to simulate multiple distinct personas in open-domain dialogue.
- It utilizes a modular architecture with dedicated LoRA blocks and a Mixture-of-Experts gating network to maintain character consistency and support incremental learning.
- Experimental evaluations demonstrate Neeko's efficiency, with markedly faster training and equal or higher consistency scores than full fine-tuning and prompt-based baselines.
Neeko is an advanced framework for multi-character role-playing in LLMs, engineered to efficiently and faithfully simulate a wide array of distinct personas within open-domain dialogue agents. Unlike conventional approaches—such as prompt engineering, retrieval-augmented generation (RAG), or full-model fine-tuning aimed at a single character—Neeko introduces a dynamic low-rank adapter (LoRA) methodology that enables seamless switching among multiple characters and adaptation to previously unseen roles, providing robust character consistency, style preservation, and modular incremental learning (Yu et al., 21 Feb 2024).
1. Motivation and Problem Context
Multi-character role-playing (MCRP) presents two central challenges for LLM-based agents: maintaining discrete, consistent styles for multiple sequentially simulated roles, and rapidly adapting to new personas without costly retraining. Contemporary methods leveraging prompt injection, in-context learning (ICL), or RAG are constrained by style bleed and inflexible adaptation mechanisms. The Neeko framework addresses these deficiencies with a mechanism for modular character control via dedicated LoRA blocks and a gated controller network, enabling fine-grained, non-overlapping role simulation.
2. Architectural Structure and Phased Workflow
Neeko's system decomposes the agent lifecycle into three distinct phases:
Phase 1: Agent Pre-training
A pretrained conversational LLM (e.g., LLaMA-2 7B) is augmented with M separate, non-overlapping LoRA blocks, each corresponding to one of M predefined characters. Each block, parameterized by low-rank matrices $A_i$ and $B_i$, is trained exclusively on its corresponding dialogue corpus, resulting in a modular role architecture and a global role embedding matrix $E$, which encodes semantic distributions over character profiles.
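A minimal PyTorch sketch of this block allocation follows. The class name `RoleLoRABank`, the embedding dimension, and the example partial rank are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch of Phase 1 block allocation (assumed layout, not the paper's code).
import torch
import torch.nn as nn

class RoleLoRABank(nn.Module):
    """One disjoint LoRA block (A_i, B_i) per predefined role, plus the role embedding matrix E."""

    def __init__(self, d_model: int, num_roles: int, partial_rank: int, embed_dim: int = 64):
        super().__init__()
        # A_i: (partial_rank, d_model), small random init as in standard LoRA.
        self.A = nn.ParameterList(
            [nn.Parameter(torch.randn(partial_rank, d_model) * 0.01) for _ in range(num_roles)]
        )
        # B_i: (d_model, partial_rank), zero init so each Delta W_i starts at zero.
        self.B = nn.ParameterList(
            [nn.Parameter(torch.zeros(d_model, partial_rank)) for _ in range(num_roles)]
        )
        # Global role embedding matrix E: one row per character profile.
        self.E = nn.Parameter(torch.randn(num_roles, embed_dim))

    def delta_w(self, role_idx: int) -> torch.Tensor:
        """Low-rank update Delta W_i = B_i A_i contributed by a single role's block."""
        return self.B[role_idx] @ self.A[role_idx]

# During pre-training only the block matching the current role's corpus is updated;
# the backbone LLM and all other blocks stay frozen. Sizes below are examples only.
bank = RoleLoRABank(d_model=4096, num_roles=9, partial_rank=8)
print(bank.delta_w(0).shape)  # torch.Size([4096, 4096])
```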
Phase 2: Multiple Characters Playing
During inference, the framework retrieves a requested role's embedding and deploys a Mixture-of-Experts style gating network. Contribution weights $\alpha$ are computed by the gate from the retrieved role embedding, activating the relevant LoRA blocks and synthesizing adapted weights for generation, thereby modulating character-specific output behaviors.
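The gating step can be pictured with the following sketch; the dot-product-plus-softmax form of the gate is an assumption made for illustration, not necessarily the exact controller used in the paper.

```python
# Assumed softmax gate over LoRA blocks (illustrative form, not the paper's exact controller).
import torch
import torch.nn.functional as F

def gating_weights(role_embedding: torch.Tensor,
                   E: torch.Tensor,
                   temperature: float = 1.0) -> torch.Tensor:
    """Return contribution weights alpha over the M LoRA blocks.

    role_embedding: (embed_dim,)    embedding of the requested role
    E:              (M, embed_dim)  global role embedding matrix
    returns:        (M,)            non-negative weights summing to 1
    """
    scores = E @ role_embedding                    # similarity of the request to each block
    return F.softmax(scores / temperature, dim=-1)

# A predefined role should concentrate its mass on its own block.
E = torch.randn(9, 64)
alpha = gating_weights(E[3], E)
print(alpha.argmax().item())  # 3 for well-separated role embeddings
```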
Phase 3: Character Incremental Learning
For previously unseen characters, Neeko offers two adaptation routes:
- Fusion Mode: Blends preexisting LoRA blocks according to gating contributions, without expanding parameter space.
- Expansion Mode: Allocates a new LoRA block of partial rank $r_p$, extends the role embedding matrix $E$, and enlarges the gating network while freezing prior character blocks, preventing catastrophic forgetting (sketched below).
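The expansion route can be sketched as follows, assuming the block-list layout used in the earlier sketches; the helper name and the way the gate is enlarged are hypothetical.

```python
# Expansion-mode sketch (hypothetical helper): freeze old blocks, append a new one.
import torch
import torch.nn as nn

def expand_for_new_role(A_blocks: nn.ParameterList,
                        B_blocks: nn.ParameterList,
                        E: torch.Tensor,
                        d_model: int,
                        partial_rank: int) -> torch.Tensor:
    """Add one trainable LoRA block and one role-embedding row; freeze everything prior."""
    # Freeze previously trained character blocks to prevent catastrophic forgetting.
    for p in list(A_blocks) + list(B_blocks):
        p.requires_grad_(False)

    # Fresh block for the unseen character (standard LoRA init: random A, zero B).
    A_blocks.append(nn.Parameter(torch.randn(partial_rank, d_model) * 0.01))
    B_blocks.append(nn.Parameter(torch.zeros(d_model, partial_rank)))

    # Extend the role embedding matrix E by one row; the gating network grows with it.
    new_row = torch.randn(1, E.shape[1])
    return torch.cat([E.detach(), new_row], dim=0)

# Example with toy sizes: two existing roles, one new role arrives.
A = nn.ParameterList([nn.Parameter(torch.randn(4, 32) * 0.01) for _ in range(2)])
B = nn.ParameterList([nn.Parameter(torch.zeros(32, 4)) for _ in range(2)])
E = torch.randn(2, 8)
E = expand_for_new_role(A, B, E, d_model=32, partial_rank=4)
print(len(A), E.shape)  # 3 torch.Size([3, 8])
```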
| Phase | Input | Output |
|---|---|---|
| Pre-training | LLM, role corpora | M LoRA blocks, role embedding matrix $E$ |
| Multi-play | Prompt, context, role embedding | Role-adapted weights, persona-specific output |
| Incremental | Persona config/data | Fused or extended LoRA block set |
3. Dynamic LoRA Formulation and Gating Network Construction
Neeko generalizes the LoRA mechanism for scalable MCRP by partitioning the rank-$r$ adapter into $M$ disjoint blocks, each of partial rank $r_p$ (with $M \cdot r_p = r$). For a given role $i$:
- LoRA block parameters: $A_i \in \mathbb{R}^{r_p \times k}$, $B_i \in \mathbb{R}^{d \times r_p}$ (for a base weight $W_0 \in \mathbb{R}^{d \times k}$)
- Forward computation: $h = W_0 x + \Delta W x = W_0 x + \sum_{i=1}^{M} \alpha_i B_i A_i x$
- Gating controller: computes $\alpha$ from the role embedding, selects the active blocks, and reconstructs $\Delta W = \sum_{i=1}^{M} \alpha_i B_i A_i$
This architecture maintains subspace disjointness, thus mitigating interference between characters, and supports compositional character synthesis when data scarcity prohibits full expansion.
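The block-wise forward pass described above can be made concrete with the following sketch; treating the combined update as a gate-weighted sum of per-block updates is an assumption consistent with the gating description, and all names are illustrative.

```python
# Sketch of the block-wise dynamic LoRA forward pass (illustrative, not the paper's code).
import torch

def dynamic_lora_forward(x: torch.Tensor,
                         W0: torch.Tensor,
                         A_blocks: list[torch.Tensor],   # each A_i: (r_p, d_in)
                         B_blocks: list[torch.Tensor],   # each B_i: (d_out, r_p)
                         alpha: torch.Tensor,            # (M,) gating weights
                         scaling: float = 1.0) -> torch.Tensor:
    """h = W0 x + scaling * sum_i alpha_i * B_i A_i x."""
    h = x @ W0.T
    for a_i, (A_i, B_i) in zip(alpha, zip(A_blocks, B_blocks)):
        h = h + a_i * scaling * (x @ A_i.T @ B_i.T)
    return h

# Tiny smoke test with M = 3 blocks of partial rank 4.
d, M, r_p = 16, 3, 4
x = torch.randn(2, d)
W0 = torch.randn(d, d)
A = [torch.randn(r_p, d) for _ in range(M)]
B = [torch.zeros(d, r_p) for _ in range(M)]      # zero B_i -> output equals the base layer
alpha = torch.softmax(torch.randn(M), dim=-1)
assert torch.allclose(dynamic_lora_forward(x, W0, A, B, alpha), x @ W0.T)
```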
4. Training Methodology and Evaluation Metrics
Neeko is instantiated atop a LLaMA-2 backbone, employing the LoRA-based PEFT protocol. Experiments utilized the Character-LLM-Data corpus (9 roles, 857 single-turn and 450 multi-turn dialogues). Training uses a fixed partial rank and learning rate per LoRA block, a batch size of 16, and 10,000 steps per character. Evaluative prompts follow the format: “I want you to act like {character_name}. The status of you is as follows: Location: … Status: …” plus dialogue history.
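A small helper illustrating how such an evaluative prompt might be assembled; the function name and the example character, location, status, and history values are hypothetical, while the template string follows the format quoted above.

```python
# Hypothetical helper assembling an evaluation prompt in the format quoted above.
def build_eval_prompt(character_name: str, location: str, status: str,
                      dialogue_history: list[str]) -> str:
    header = (
        f"I want you to act like {character_name}. "
        f"The status of you is as follows: Location: {location} Status: {status}"
    )
    return "\n".join([header, *dialogue_history])

print(build_eval_prompt("Hermione Granger", "Hogwarts library", "Reviewing notes before class",
                        ["Harry: Have you seen my Potions essay?"]))
```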
Performance is assessed along three axes: character consistency (behavior, utterance), knowledge consistency (virtual, real, hallucinatory), and dialogue consistency (transfer, relevance, stability), each rated by GPT-3.5 on a 1-7 scale. Statistical significance (p < 0.05) confirms Neeko's improvement over baselines.
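As an assumption about the aggregation (the paper's exact averaging is not reproduced here), the reported average consistency score can be read as a simple mean over the judged dimensions:

```python
# Assumed aggregation: mean of the GPT-3.5 judge's 1-7 ratings across the judged dimensions.
def average_consistency(scores: dict[str, float]) -> float:
    return sum(scores.values()) / len(scores)

example = {  # illustrative ratings, not results from the paper
    "behavior": 6, "utterance": 6,
    "virtual": 5, "real": 5, "hallucinatory": 5,
    "transfer": 6, "relevance": 6, "stability": 5,
}
print(round(average_consistency(example), 2))  # 5.5
```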
| Model | Avg. Consistency Score | Training Time (9 Roles) | Memory (GB) |
|---|---|---|---|
| Neeko (expansion mode) | 5.62 | 2.0 h | 13.6 |
| Character-LLM | 5.62 | 48.6 h | 108 |
| Standard LoRA | 5.54 | 1.7 h | 13.5 |
| GPT-3.5 ICL | 5.59 | — | — |
| GPT-3.5 RAG | 5.48 | — | — |
On the transfer metric for multi-role switching, Neeko scores 5.87, while vanilla LoRA adapters and the RAG/ICL baselines remain below 5.83.
5. Qualitative Analysis, Ablation, and Limitations
Qualitative outputs reveal that Neeko preserves nuanced stylistic elements across character roles, e.g., emulating Dumbledore's rhetorical wisdom and Voldemort's brevity more effectively than standard LoRA, which exhibits genericism and style bleeding. Mid-dialogue persona switches maintain distinct catchphrases and sentence rhythms. Ablation studies show that increasing the partial rank yields only marginal gains in character consistency while roughly doubling memory demand beyond a moderate rank. Accuracy saturates as the role count approaches the evaluated scale; further scaling necessitates proportional adapter rank growth.
Limitations include initialization of the role embedding matrix $E$ from general PLM encoders, which may not capture complex persona representations. Scaling the adapter set to hundreds of roles implies linear parameter growth; dynamic block sharing or compression methods warrant future investigation. Slight inference latency arises from multi-block gating, potentially mitigated by sparse gating.
6. Future Directions and Impact
Neeko advances MCRP by enabling modular, low-cost, and high-fidelity character simulation, opening pathways for dialogue agents capable of managing extensive persona inventories without catastrophic forgetting. It stands as a reference implementation for dynamic LoRA adaptation, gating-driven expert selection, and efficient experiment design in personalized, interactive agent research (Yu et al., 21 Feb 2024). A plausible implication is the broad applicability of block-wise LoRA gating to related domains such as emotion modeling, stylistic transfer, and user-adaptive natural language generation. Future work may focus on optimizing role embeddings, adapter parameter sharing, and efficient controller designs for large-scale deployment.