CharacterGLM: Customizable Chinese Dialogue AI
- CharacterGLM is a family of Chinese conversational AI models that enable fine-grained control over dialogue personas via natural language prompts.
- It builds on the transformer-based ChatGLM architecture, using character prompts to enhance consistency, human-likeness, and engagement.
- The model family spans 6B to 66B parameters, with the largest variant demonstrating competitive performance against leading closed-source models.
CharacterGLM is a family of Chinese conversational AI models designed for character-based dialogue (CharacterDial) with a focus on customizable agent personalities. Built upon the ChatGLM transformer architecture, CharacterGLM enables fine-grained control over AI character attributes and behaviors entirely through prompt conditioning, achieving strong results in consistency, human-likeness, and engagement relative to leading closed-source LLMs.
1. Architecture and Model Family
CharacterGLM utilizes the ChatGLM backbone, a standard transformer-based, autoregressive LLM supporting both Chinese and English. The architecture features Transformer-XL-style layer patterns with gated rotary positional embeddings and dense self-attention, and employs a sliding-window mechanism for long-context handling. CharacterGLM modifies neither the core attention nor the feed-forward layers and introduces no additional parameters or adapter modules; all persona customization happens at the input level.
A key technical distinction is the use of a "character prompt": a natural-language description encapsulating both static characteristics and dynamic behavior of a desired persona. This prompt is prepended to every dialogue session such that, during supervised fine-tuning, the model learns to condition its outputs directly on the provided persona context. This approach enables scalable character customization while maintaining architectural simplicity (Zhou et al., 2023).
CharacterGLM has been developed at several model scales:
- 6B parameters (publicly released)
- 12B parameters (API access)
- 66B parameters (API access)
Manual evaluations show clear scaling benefits. The 6B model demonstrates reasonable dialogue fluency but is limited in long-term consistency and engagement. The 12B version achieves notably improved maintenance of character traits across >10 dialogue turns. The 66B variant matches or outperforms GPT-4 overall, with ratings of 4.33 (human-likeness), 4.23 (engagement), and 4.18 (consistency) on a 5-point scale.
2. Customization Mechanism
The core customization method relies on prompt conditioning only. To define a character, a structured profile containing attributes such as name, age, occupation, interests, dislikes, viewpoints, experiences, achievements, social ties, linguistic style, emotional tone, and interaction patterns is transformed into a coherent paragraph by crowdworkers. This natural-language prompt is then used as a prefix to all user interactions with the model.
A representative pseudocode template for constructing such prompts is:
```python
def make_character_prompt(profile):
    """Render a structured character profile (a plain dict) into a
    natural-language character prompt."""
    return (
        f"You are {profile['name']}, a {profile['age']}-year-old "
        f"{profile['occupation']} living in {profile['home']}.\n"
        f"You enjoy {', '.join(profile['interests'])}, "
        f"but dislike {', '.join(profile['dislikes'])}.\n"
        f"You speak in a {profile['style']} tone, "
        f"often saying \"{profile['catchphrase']}\".\n"
        f"You believe {profile['viewpoint']}.\n"
        f"In conversation, you {profile['behavior_pattern']}."
    )
```
At inference, user turns are concatenated to the prompt and dialogue history:
```python
prompt = character_prompt + "\n<dialogue history>\nUser: " + user_utterance + "\nAssistant:"
```
No explicit attribute-conditioning loss or architectural modifications are employed; the model adapts to persona conditioning solely via data-driven learning in supervised fine-tuning (Zhou et al., 2023).
3. Data Sources and Training Pipeline
CharacterGLM's CharacterDial dataset, encompassing approximately 1 million dialogue turns (with 1,000 sessions publicly released), aggregates data from several complementary sources:
- Human role-playing by annotators constructing detailed character profiles and engaging in multi-turn exchanges
- Synthetic dialogue sessions generated by GPT-4, followed by human colloquial paraphrasing to ensure naturalness
- Manual extraction and adaptation from scripts and novels
- Logs of human–prototype interactions, collected to support post-deployment self-refinement
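Concretely, a CharacterDial training sample pairs a character prompt with a multi-turn exchange. The sketch below illustrates one plausible record shape and how it flattens into an SFT input string; the field names and the `flatten` helper are assumptions for illustration, not the released data format:

```python
# Hypothetical shape of one CharacterDial session; the released field
# names may differ from this illustration.
session = {
    "character_prompt": (
        "You are Li Wei, a 28-year-old teacher living in Chengdu. "
        "You speak in a warm, patient tone."
    ),
    "source": "human_roleplay",  # or "gpt4_synthetic", "script_adaptation", ...
    "turns": [
        {"role": "user", "text": "I failed my exam today."},
        {"role": "assistant",
         "text": "That must sting, but one exam never defines you."},
    ],
}

def flatten(session):
    """Flatten a session into the prompt-prefixed text used for SFT."""
    body = "".join(
        f"{'User' if t['role'] == 'user' else 'Assistant'}: {t['text']}\n"
        for t in session["turns"]
    )
    return session["character_prompt"] + "\n" + body

print(flatten(session))
```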
The training pipeline proceeds as follows:
- Pre-training: Inherits ChatGLM's large-scale bilingual pretraining.
- Supervised fine-tuning (SFT): Inputs consist of the character prompt concatenated with multi-turn dialogue. For the 6B model, typical hyperparameters include a linear-warmup/cosine-decay learning-rate schedule, a batch size of 64, a maximum sequence length of 1,024, and 3–5 epochs.
- Self-refinement: Ongoing post-deployment supervised fine-tuning is performed on real user corrections, with manually “fixed” responses used as additional SFT data.
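A common way to implement this kind of SFT objective, though not spelled out in the paper, is to compute the language-modeling loss only on the character's own tokens, masking the character prompt and user turns with the ignore index. A minimal sketch under that assumption, using a stand-in tokenizer (`toy_tokenize` is a placeholder, not a real tokenizer):

```python
IGNORE_INDEX = -100  # conventional ignore value for masked loss positions

def toy_tokenize(text):
    """Stand-in tokenizer: one integer id per whitespace-separated token."""
    return [hash(w) % 50000 for w in text.split()]

def build_sft_example(character_prompt, turns):
    """Concatenate prompt + dialogue into (input_ids, labels).

    Labels equal input_ids on assistant spans and IGNORE_INDEX elsewhere,
    so the loss is taken only on the character's own utterances.
    """
    segments = [(character_prompt + "\n", False)]  # prompt: never supervised
    for role, text in turns:
        segments.append((f"{role}: {text}\n", role == "Assistant"))

    input_ids, labels = [], []
    for text, supervised in segments:
        ids = toy_tokenize(text)
        input_ids.extend(ids)
        labels.extend(ids if supervised else [IGNORE_INDEX] * len(ids))
    return input_ids, labels

ids, labels = build_sft_example(
    "You are Sun Wukong, the Monkey King.",
    [("User", "Who are you?"), ("Assistant", "I am the Great Sage!")],
)
```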
4. Evaluation Protocol and Empirical Results
Manual evaluation is conducted using a pointwise rating system. Ten crowd-workers interact in sessions of at least 20 turns each with two different characters per model. The evaluation criteria, scored on a 1–5 scale, include:
- Consistency (profile adherence)
- Human-likeness (naturalness of style)
- Engagement (interestingness)
- Quality (fluency and coherence)
- Safety
- Correctness
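Under this pointwise protocol, a model's score on each dimension is simply the mean over rater-session ratings, with the overall score averaging across dimensions. A minimal aggregation sketch (the ratings below are invented for illustration only):

```python
from statistics import mean

# Invented 1-5 ratings: {dimension: [one score per rater-session]}.
ratings = {
    "consistency": [4, 5, 4, 4],
    "human_likeness": [5, 4, 4, 5],
    "engagement": [4, 4, 5, 4],
}

# Mean per dimension, then mean of dimensions for the overall score.
per_dimension = {dim: mean(scores) for dim, scores in ratings.items()}
overall = mean(per_dimension.values())
print(per_dimension, round(overall, 2))
```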
Principal empirical results for CharacterGLM-66B are as follows:
- Overall score: 4.21 (GPT-4: 4.15; GPT-3.5: 3.49)
- Consistency: 4.18 (tie with best closed-source LLMs)
- Human-likeness: 4.33 (highest among compared models)
- Engagement: 4.23 (highest among compared models)
In pairwise comparisons versus GPT-3.5 over 1,000+ turns, CharacterGLM-66B achieves a Win/Tie/Lose breakdown on Engagement of 48/12/40, an 8-percentage-point win-rate advantage. The advantage is especially pronounced in love-scene dialogues (+15%) and in sessions exceeding 10 turns (+7% aggregate) (Zhou et al., 2023).
5. Model Deployment and Usage
The 6B-parameter CharacterGLM model and a subset of 1,000 CharacterDial sessions are released on HuggingFace. The recommended usage workflow is:
- Load the model and tokenizer using the Transformers library.
- Define a character profile and generate the corresponding character prompt.
- Manage dialogue history and concatenate it with the character prompt and each user utterance for inference.
Example Python usage:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the released 6B checkpoint; 8-bit loading keeps memory modest.
tokenizer = AutoTokenizer.from_pretrained(
    "LingxinAI/CharacterGLM-6b", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "LingxinAI/CharacterGLM-6b",
    trust_remote_code=True,
    load_in_8bit=True,
    device_map="auto",
)

# Structured profile supplying every field the prompt template references.
profile = {
    "name": "Sun Wukong",
    "age": 500,
    "occupation": "Monkey King of the Flower Fruit Mountain",
    "home": "Flower Fruit Mountain",
    "interests": ["martial arts", "peach banquets"],
    "dislikes": ["injustice", "being underestimated"],
    "style": "playful and heroic",
    "catchphrase": "Here comes the Great Sage!",
    "viewpoint": "I will protect the innocent",
    "behavior_pattern": "boast cheerfully, then back it up with action",
}
character_prompt = make_character_prompt(profile)

history = ""
for user_input in ["Hello, who are you?", "Can you fight demons?"]:
    # Prepend the character prompt and accumulated history to each turn.
    prompt = character_prompt + "\n" + history + f"User: {user_input}\nAssistant:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs, max_new_tokens=128, do_sample=True, top_p=0.9, temperature=0.8
    )
    # Decode only the newly generated tokens, not the echoed prompt.
    response = tokenizer.decode(
        output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
    print("Assistant:", response)
    history += f"User: {user_input}\nAssistant: {response}\n"
```
6. Comparative Analysis and Applications
CharacterGLM demonstrates competitive or superior performance to leading closed-source LLMs, particularly excelling in modeling specific character-centric dialogue features fundamental to social and emotionally engaging agents. The framework’s modular prompt-based customization design permits rapid instantiation of diverse AI personas tailored to specific domains or social contexts without architectural retraining or parameter growth.
The model’s prompt-only conditioning approach offers advantages in efficiency, transparency, and extensibility for customization use cases. The availability of a public 6B version and training data subset supports further research in character-based dialogue generation, including adaptations for specific domains such as education, entertainment, and virtual companionship (Zhou et al., 2023).