Neeko: Multi-Character Role-Playing Framework
- Neeko is an advanced multi-character role-playing framework that employs dynamic LoRA adapters to simulate multiple distinct personas in open-domain dialogue.
- It utilizes a modular architecture with dedicated LoRA blocks and a Mixture-of-Experts gating network to maintain character consistency and support incremental learning.
- Experimental evaluations demonstrate Neeko's efficiency, with markedly faster training and equal or higher consistency scores than full fine-tuning and prompt-based baselines.
Neeko is an advanced framework for multi-character role-playing in LLMs, engineered to efficiently and faithfully simulate a wide array of distinct personas within open-domain dialogue agents. Unlike conventional approaches—such as prompt engineering, retrieval-augmented generation (RAG), or full-model fine-tuning aimed at a single character—Neeko introduces a dynamic low-rank adapter (LoRA) methodology that enables seamless switching among multiple characters and adaptation to previously unseen roles, providing robust character consistency, style preservation, and modular incremental learning (Yu et al., 21 Feb 2024).
1. Motivation and Problem Context
Multi-character role-playing (MCRP) presents two central challenges for LLM-based agents: maintaining discrete, consistent styles for multiple sequentially simulated roles, and rapidly adapting to new personas without costly retraining. Contemporary methods leveraging prompt injection, in-context learning (ICL), or RAG are constrained by style bleed and inflexible adaptation mechanisms. The Neeko framework addresses these deficiencies with a mechanism for modular character control via dedicated LoRA blocks and a gated controller network, enabling fine-grained, non-overlapping role simulation.
2. Architectural Structure and Phased Workflow
Neeko's system decomposes the agent lifecycle into three distinct phases:
Phase 1: Agent Pre-training
A pretrained conversational LLM (e.g., LLaMA-2 7B) is augmented with M separate, non-overlapping LoRA blocks, each corresponding to one of M predefined characters. Each block, parameterized by low-rank matrices $A_i$ and $B_i$, is trained exclusively on its corresponding dialogue corpus, resulting in a modular role architecture and a global role embedding matrix $E$, which encodes semantic distributions over character profiles.
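A minimal PyTorch sketch of this block allocation follows. The class name `RoleLoRABank`, the embedding dimension, and the example partial rank are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch of Phase 1 block allocation (assumed layout, not the paper's code).
import torch
import torch.nn as nn

class RoleLoRABank(nn.Module):
    """One disjoint LoRA block (A_i, B_i) per predefined role, plus the role embedding matrix E."""

    def __init__(self, d_model: int, num_roles: int, partial_rank: int, embed_dim: int = 64):
        super().__init__()
        # A_i: (partial_rank, d_model), small random init as in standard LoRA.
        self.A = nn.ParameterList(
            [nn.Parameter(torch.randn(partial_rank, d_model) * 0.01) for _ in range(num_roles)]
        )
        # B_i: (d_model, partial_rank), zero init so each Delta W_i starts at zero.
        self.B = nn.ParameterList(
            [nn.Parameter(torch.zeros(d_model, partial_rank)) for _ in range(num_roles)]
        )
        # Global role embedding matrix E: one row per character profile.
        self.E = nn.Parameter(torch.randn(num_roles, embed_dim))

    def delta_w(self, role_idx: int) -> torch.Tensor:
        """Low-rank update Delta W_i = B_i A_i contributed by a single role's block."""
        return self.B[role_idx] @ self.A[role_idx]

# During pre-training only the block matching the current role's corpus is updated;
# the backbone LLM and all other blocks stay frozen. Sizes below are examples only.
bank = RoleLoRABank(d_model=4096, num_roles=9, partial_rank=8)
print(bank.delta_w(0).shape)  # torch.Size([4096, 4096])
```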
Phase 2: Multiple Characters Playing
During inference, the framework retrieves a requested role's embedding and deploys a Mixture-of-Experts style gating network. Contribution weights $\alpha$ are computed by the gate from the retrieved role embedding, activating the relevant LoRA blocks and synthesizing adapted weights for generation, thereby modulating character-specific output behaviors.
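The gating step can be pictured with the following sketch; the dot-product-plus-softmax form of the gate is an assumption made for illustration, not necessarily the exact controller used in the paper.

```python
# Assumed softmax gate over LoRA blocks (illustrative form, not the paper's exact controller).
import torch
import torch.nn.functional as F

def gating_weights(role_embedding: torch.Tensor,
                   E: torch.Tensor,
                   temperature: float = 1.0) -> torch.Tensor:
    """Return contribution weights alpha over the M LoRA blocks.

    role_embedding: (embed_dim,)    embedding of the requested role
    E:              (M, embed_dim)  global role embedding matrix
    returns:        (M,)            non-negative weights summing to 1
    """
    scores = E @ role_embedding                    # similarity of the request to each block
    return F.softmax(scores / temperature, dim=-1)

# A predefined role should concentrate its mass on its own block.
E = torch.randn(9, 64)
alpha = gating_weights(E[3], E)
print(alpha.argmax().item())  # 3 for well-separated role embeddings
```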
Phase 3: Character Incremental Learning
For previously unseen characters, Neeko offers two adaptation routes:
- Fusion Mode: Blends preexisting LoRA blocks according to gating contributions, without expanding parameter space.
- Expansion Mode: Allocates a new LoRA block of partial rank $r_p$, extends the role embedding matrix $E$, and enlarges the gating network while freezing prior character blocks, preventing catastrophic forgetting (sketched below).
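The expansion route can be sketched as follows, assuming the block-list layout used in the earlier sketches; the helper name and the way the gate is enlarged are hypothetical.

```python
# Expansion-mode sketch (hypothetical helper): freeze old blocks, append a new one.
import torch
import torch.nn as nn

def expand_for_new_role(A_blocks: nn.ParameterList,
                        B_blocks: nn.ParameterList,
                        E: torch.Tensor,
                        d_model: int,
                        partial_rank: int) -> torch.Tensor:
    """Add one trainable LoRA block and one role-embedding row; freeze everything prior."""
    # Freeze previously trained character blocks to prevent catastrophic forgetting.
    for p in list(A_blocks) + list(B_blocks):
        p.requires_grad_(False)

    # Fresh block for the unseen character (standard LoRA init: random A, zero B).
    A_blocks.append(nn.Parameter(torch.randn(partial_rank, d_model) * 0.01))
    B_blocks.append(nn.Parameter(torch.zeros(d_model, partial_rank)))

    # Extend the role embedding matrix E by one row; the gating network grows with it.
    new_row = torch.randn(1, E.shape[1])
    return torch.cat([E.detach(), new_row], dim=0)

# Example with toy sizes: two existing roles, one new role arrives.
A = nn.ParameterList([nn.Parameter(torch.randn(4, 32) * 0.01) for _ in range(2)])
B = nn.ParameterList([nn.Parameter(torch.zeros(32, 4)) for _ in range(2)])
E = torch.randn(2, 8)
E = expand_for_new_role(A, B, E, d_model=32, partial_rank=4)
print(len(A), E.shape)  # 3 torch.Size([3, 8])
```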
| Phase | Input | Output |
|---|---|---|
| Pre-training | LLM, role corpora | M LoRA blocks, role embedding matrix $E$ |
| Multi-play | Prompt, context, role embedding | Role-adapted weights, persona-specific output |
| Incremental | Persona config/data | Fused or extended LoRA block set |
3. Dynamic LoRA Formulation and Gating Network Construction
Neeko generalizes the LoRA mechanism for scalable MCRP by partitioning the rank-$r$ adapter into $M$ disjoint blocks, each of partial rank $r_p$ (with $M \cdot r_p = r$). For a given role $i$:
- LoRA block parameters: $A_i \in \mathbb{R}^{r_p \times k}$, $B_i \in \mathbb{R}^{d \times r_p}$ (for a base weight $W_0 \in \mathbb{R}^{d \times k}$)
- Forward computation: $h = W_0 x + \Delta W x = W_0 x + \sum_{i=1}^{M} \alpha_i B_i A_i x$
- Gating controller: computes $\alpha$ from the role embedding, selects the active blocks, and reconstructs $\Delta W = \sum_{i=1}^{M} \alpha_i B_i A_i$
This architecture maintains subspace disjointness, thus mitigating interference between characters, and supports compositional character synthesis when data scarcity prohibits full expansion.
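The block-wise forward pass described above can be made concrete with the following sketch; treating the combined update as a gate-weighted sum of per-block updates is an assumption consistent with the gating description, and all names are illustrative.

```python
# Sketch of the block-wise dynamic LoRA forward pass (illustrative, not the paper's code).
import torch

def dynamic_lora_forward(x: torch.Tensor,
                         W0: torch.Tensor,
                         A_blocks: list[torch.Tensor],   # each A_i: (r_p, d_in)
                         B_blocks: list[torch.Tensor],   # each B_i: (d_out, r_p)
                         alpha: torch.Tensor,            # (M,) gating weights
                         scaling: float = 1.0) -> torch.Tensor:
    """h = W0 x + scaling * sum_i alpha_i * B_i A_i x."""
    h = x @ W0.T
    for a_i, (A_i, B_i) in zip(alpha, zip(A_blocks, B_blocks)):
        h = h + a_i * scaling * (x @ A_i.T @ B_i.T)
    return h

# Tiny smoke test with M = 3 blocks of partial rank 4.
d, M, r_p = 16, 3, 4
x = torch.randn(2, d)
W0 = torch.randn(d, d)
A = [torch.randn(r_p, d) for _ in range(M)]
B = [torch.zeros(d, r_p) for _ in range(M)]      # zero B_i -> output equals the base layer
alpha = torch.softmax(torch.randn(M), dim=-1)
assert torch.allclose(dynamic_lora_forward(x, W0, A, B, alpha), x @ W0.T)
```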
4. Training Methodology and Evaluation Metrics
Neeko is instantiated atop a LLaMA-2 backbone, employing the LoRA-based PEFT protocol. Experiments utilized the Character-LLM-Data corpus (9 roles, 857 single-turn and 450 multi-turn dialogues). Training uses a fixed partial rank and learning rate per LoRA block, a batch size of 16, and 10,000 steps per character. Evaluative prompts follow the format: “I want you to act like {character_name}. The status of you is as follows: Location: … Status: …” plus dialogue history.
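A small helper illustrating how such an evaluative prompt might be assembled; the function name and the example character, location, status, and history values are hypothetical, while the template string follows the format quoted above.

```python
# Hypothetical helper assembling an evaluation prompt in the format quoted above.
def build_eval_prompt(character_name: str, location: str, status: str,
                      dialogue_history: list[str]) -> str:
    header = (
        f"I want you to act like {character_name}. "
        f"The status of you is as follows: Location: {location} Status: {status}"
    )
    return "\n".join([header, *dialogue_history])

print(build_eval_prompt("Hermione Granger", "Hogwarts library", "Reviewing notes before class",
                        ["Harry: Have you seen my Potions essay?"]))
```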
Performance is assessed along three axes: character consistency (behavior, utterance), knowledge consistency (virtual, real, hallucinatory), and dialogue consistency (transfer, relevance, stability), each rated by GPT-3.5 on a 1-7 scale. Statistical significance (p < 0.05) confirms Neeko's improvement over baselines.
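As an assumption about the aggregation (the paper's exact averaging is not reproduced here), the reported average consistency score can be read as a simple mean over the judged dimensions:

```python
# Assumed aggregation: mean of the GPT-3.5 judge's 1-7 ratings across the judged dimensions.
def average_consistency(scores: dict[str, float]) -> float:
    return sum(scores.values()) / len(scores)

example = {  # illustrative ratings, not results from the paper
    "behavior": 6, "utterance": 6,
    "virtual": 5, "real": 5, "hallucinatory": 5,
    "transfer": 6, "relevance": 6, "stability": 5,
}
print(round(average_consistency(example), 2))  # 5.5
```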
| Model | Avg. Consistency Score | Training Time (9 Roles) | Memory (GB) |
|---|---|---|---|
| Neeko (expansion mode) | 5.62 | 2.0 h | 13.6 |
| Character-LLM | 5.62 | 48.6 h | 108 |
| Standard LoRA | 5.54 | 1.7 h | 13.5 |
| GPT-3.5 ICL | 5.59 | — | — |
| GPT-3.5 RAG | 5.48 | — | — |
On the transfer metric for multi-role switching, Neeko scores 5.87, while vanilla LoRA adapters and the RAG/ICL baselines remain below 5.83.
5. Qualitative Analysis, Ablation, and Limitations
Qualitative outputs reveal that Neeko preserves nuanced stylistic elements across character roles, e.g., emulating Dumbledore's rhetorical wisdom and Voldemort's brevity more effectively than standard LoRA, which exhibits genericism and style bleeding. Mid-dialogue persona switches maintain distinct catchphrases and sentence rhythms. Ablation studies show that increasing the partial rank yields only marginal gains in character consistency while roughly doubling memory demand beyond a moderate rank. Accuracy saturates as the role count approaches the evaluated scale; further scaling necessitates proportional adapter rank growth.
Limitations include initialization of the role embedding matrix $E$ from general PLM encoders, which may not capture complex persona representations. Scaling the adapter set to hundreds of roles implies linear parameter growth; dynamic block sharing or compression methods warrant future investigation. Slight inference latency arises from multi-block gating, potentially mitigated by sparse gating.
6. Future Directions and Impact
Neeko advances MCRP by enabling modular, low-cost, and high-fidelity character simulation, opening pathways for dialogue agents capable of managing extensive persona inventories without catastrophic forgetting. It stands as a reference implementation for dynamic LoRA adaptation, gating-driven expert selection, and efficient experiment design in personalized, interactive agent research (Yu et al., 21 Feb 2024). A plausible implication is the broad applicability of block-wise LoRA gating to related domains such as emotion modeling, stylistic transfer, and user-adaptive natural language generation. Future work may focus on optimizing role embeddings, adapter parameter sharing, and efficient controller designs for large-scale deployment.