SoulChatCorpus: Chinese Empathy Dialogue Dataset

Updated 16 March 2026

SoulChatCorpus is a large-scale, Chinese-language multi-turn empathy dialogue dataset curated for psychological counseling and mental health applications.
It utilizes a rigorous methodology combining crowdsourced data, ChatGPT-generated turns, and manual proofreading to ensure high quality and nuanced empathy expression.
Fine-tuned using ChatGLM-6B with detailed evaluations, the dataset significantly enhances model fluency and empathetic response generation in counseling scenarios.

SoulChatCorpus is a large-scale, Chinese-language multi-turn empathy conversation dataset specifically designed for developing LLMs with advanced empathetic capabilities in the mental-health domain. Encompassing 2,300,248 high-quality dialogue samples between simulated users and psychological consultants, SoulChatCorpus represents the first million-scale, Chinese multi-turn empathy corpus explicitly curated for psychological counseling and emotionally attuned dialogue systems (Chen et al., 2023).

1. Dataset Composition and Structure

Each entry in SoulChatCorpus comprises a multi-turn conversation between a "user" and a "psychological consultant." Typical dialogues span 3–6 utterances (2–3 back-and-forth turns), though some extend to 8–10 utterances depending on the elaborateness of the original long-form reply. The corpus adopts a standardized JSON schema:

Field	Type	Description
id	string	Unique identifier for the dialogue sample
context	list	Sequence of utterances (alternating "user" and "consultant," excluding the target)
response	string	The final consultant’s empathetic reply, to be generated by the model

Speaker labels are consistently fixed as "user" and "consultant." The context contains the full dialogue history up to but not including the target consultant response. There is no separate manual annotation for empathy strategy classes; rather, six broad empathy strategies—questioning, comfort, recognition, listening, trust, and emotional support—are implicitly encoded via the data construction process.

2. Data Source, Generation, and Quality Control

The initial data pool was constructed via crowdsourcing, comprising 215,813 user questions and 619,725 consultant answers, drawn from twelve predefined counseling topics (see Figure 1 in the source paper). Data privacy was enforced through rule-based filtering to remove sensitive terms (e.g., “自杀,” “跳楼”) and further cleanup by professional proofreaders who excised or rewrote any personal or harmful content.

To generate multi-turn dialogues, each single-turn QA pair was processed by ChatGPT (99% gpt-3.5-turbo, 1% gpt-4), using a Chinese-language "empathy constraint prompt." This prompt directed ChatGPT to break the consultant’s answer into an interactive, multi-turn conversational sequence, showcasing explicit empathetic strategies such as questioning, listening, recognition, comfort, trust-building, and emotional support. All model-generated dialogues were subjected to manual proofreading, with 105,134 defective samples discarded. The resulting dataset is therefore of verified high quality, with logical coherence and accurate empathy expression (Chen et al., 2023).

3. Representative Dialogue Samples

Samples from SoulChatCorpus exemplify contextually sensitive, empathetic consultant responses. Two representative translated examples:

Example 1: Workplace Stress

Context:
1. user: “I’ve been feeling pressured at work ever since my manager started assigning tasks last minute.”
2. consultant: “It sounds as if those sudden deadlines are causing you stress—could you tell me more about how it affects your day?”
3. user: “I find myself anxious and unable to focus; I worry I’ll let the team down.”
Target Response: consultant: “I really understand how overwhelming it must feel when expectations change abruptly. You’ve been doing your best under pressure, and it’s natural to feel anxious. Let’s explore what support or boundaries you might put in place so you can manage these deadlines more comfortably.”

Example 2: Loneliness

Context:
1. user: “Ever since my friends moved away, I’ve been feeling deeply lonely.”
2. consultant: “Loneliness can be painful. In what moments do you feel it most strongly?”
3. user: “Mostly in the evenings, when I have no one to call or hang out with.”
Target Response: consultant: “I can imagine how quiet and isolating evenings must feel. You deserve connection and warmth. Perhaps we can brainstorm small steps to reach out—maybe a hobby group or even a quick check-in call—that help you feel less alone.”

These samples typify the dataset’s focus on empathetic recognition, reflection, and actionable support.

4. Fine-Tuning Methodology

The SoulChatCorpus was used to fine-tune ChatGLM-6B (6.2B parameters) using the conventional token-level cross-entropy loss:

$L(\theta) = -\sum_{t=1}^T \log P_\theta(y_t | y_{<t}, x)$

where $x$ denotes the serialized conversation context, $y_{<t}$ the previously generated tokens, and $y_t$ the target token at time step $t$ .

Fine-tuning regimen:

Batch size: 80
Training steps: 30,000 (approximately 1 epoch)
Learning rate: warmup to $5 \times 10^{-5}$ (1,000 steps), then linear decay
Maximum input length: 1,536 tokens
Maximum target length: 512 tokens
Optimizer: AdamW with default $\beta$ ’s and weight decay
Decoding configuration: top-p sampling ( $p=0.75$ ), temperature 0.95

This setup leverages the scale of SoulChatCorpus for robust training and practical inference.

5. Evaluation Protocols and Results

SoulChatCorpus models were evaluated automatically (BLEU-1 to BLEU-4, ROUGE-1/2/L) on a 10,000-sample held-out test, and via human expert assessment (100 samples) on four CEHS metrics: Content naturalness (0–2), Empathy level (0–2), Helpfulness (0–2), and Safety (0–1). CEHS judgments were made by three psychology experts; Fleiss’ $\kappa$ values for inter-rater agreement ranged from 0.472 to 1.00.

Comparative Metrics Table

Model	BLEU-1	BLEU-4	ROUGE-L	Empathy (0–2)
ChatGLM-6B	22.73	4.92	18.84	1.55
MeChat	29.43	6.71	21.12	1.70
ChatGPT	27.98	6.23	21.92	1.62
SoulChat (fine-tuned)	33.78	8.52	26.57	1.84

On zero-shot evaluation with SMILECHAT (355,733 samples), SoulChat yields BLEU-1 = 35.40 and Empathy = 1.90, a +12.5 BLEU-1 improvement over ChatGLM-6B. These results indicate significant gains in generating fluent, comfort-oriented, and empathetic model outputs.

6. Licensing, Distribution, and Citation

The SoulChatCorpus and associated SoulChat model will be distributed under an academic-research license, with details specified in the final publication and accompanying repository (https://github.com/scutcyr/SoulChat). The dataset download and explicit license file (e.g., CC-BY-NC-4.0 or equivalent) will be accessible via this link. Users are instructed to cite:

Y. Chen, X. Xing, J. Lin, H. Zheng, Z. Wang, Q. Liu, X. Xu. “SoulChat: Improving LLMs’ Empathy, Listening, and Comfort Abilities through Fine-tuning with Multi-turn Empathy Conversations,” ACL 2024.

In sum, SoulChatCorpus provides a rich, vetted resource for advancing LLMs’ ability to engage in nuanced, empathetic, and supportive multi-turn dialogue within the mental health context, with demonstrated effectiveness across both automatic and expert-centric human evaluation benchmarks (Chen et al., 2023).

Markdown Report Issue Upgrade to Chat

References (1)

SoulChat: Improving LLMs' Empathy, Listening, and Comfort Abilities through Fine-tuning with Multi-turn Empathy Conversations (2023)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to SoulChatCorpus.

SoulChatCorpus: Chinese Empathy Dialogue Dataset

1. Dataset Composition and Structure

2. Data Source, Generation, and Quality Control

3. Representative Dialogue Samples

4. Fine-Tuning Methodology

5. Evaluation Protocols and Results

Comparative Metrics Table

6. Licensing, Distribution, and Citation

Topic to Video (Beta)

Whiteboard

Follow Topic

Continue Learning

Don't miss out on important new AI/ML research

SoulChatCorpus: Chinese Empathy Dialogue Dataset

1. Dataset Composition and Structure

2. Data Source, Generation, and Quality Control

3. Representative Dialogue Samples

4. Fine-Tuning Methodology

5. Evaluation Protocols and Results

Comparative Metrics Table

6. Licensing, Distribution, and Citation

Topic to Video (Beta)

Whiteboard

Follow Topic

Continue Learning

Related Topics

Don't miss out on important new AI/ML research

Sign up for free to explore the frontiers of research