CSConv Dataset: Dual-Dialogue Applications
- CSConv Dataset is a dual-domain resource offering strategy-aware customer support and emotionally-sensitive cognitive stimulation dialogues.
- It employs rigorous annotation schemes and LLM-driven normalization to structure multi-turn conversational interactions effectively.
- Empirical benchmarks using RoleCS demonstrate significant gains in lexical, semantic, and strategy accuracy for state-of-the-art dialogue models.
CSConv Dataset refers to two independent dialogue corpora that share the same acronym but serve distinct research domains: customer support conversation modeling (Zhu et al., 6 Aug 2025), and cognitive stimulation dialogue for elders with cognitive impairment (Jiang et al., 2023). This entry distinguishes both lines of research, providing technical details on their structure, annotation schemes, and utility in evaluating conversational systems.
1. Customer Support Conversation: CSConv (Zhu et al., 6 Aug 2025)
Overview and Task Framing
CSConv is a benchmark for Customer Support Conversation (CSC). It addresses the challenge of training service agents, or LLM-based systems, to produce responses that not only solve the problem but are also empathetic and aligned with the COPC guidelines. The formal definition models a dialogue as a sequence

$$D = \{(r_i, s_i, u_i)\}_{i=1}^{n},$$

with $r_i \in \{\text{supporter}, \text{customer}\}$ as the speaker role, $s_i \in \mathcal{S}$ as the strategy label from a set $\mathcal{S}$ (defined only if $r_i = \text{supporter}$), and $u_i$ as the textual utterance. At each supporter turn $t$, the system receives the context $C_t = \{(r_i, s_i, u_i)\}_{i<t}$ and must: (1) predict the next conversational strategy $s_t$; (2) generate the supporter response $u_t$ conditioned on $C_t$ and $s_t$.
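The task framing can be sketched as a small data structure. This is a minimal illustration rather than the authors' code: the names `Turn` and `supporter_examples`, and the use of plain strings for roles and strategy codes, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class Turn:
    role: str                       # "supporter" or "customer"
    utterance: str
    strategy: Optional[str] = None  # set only for supporter turns, e.g. "GT"


def supporter_examples(
    dialogue: List[Turn],
) -> Iterator[Tuple[List[Turn], str, str]]:
    """Yield (context, target strategy, target response) per supporter turn."""
    for t, turn in enumerate(dialogue):
        if turn.role == "supporter" and turn.strategy is not None:
            # The context is every turn strictly before the current one.
            yield dialogue[:t], turn.strategy, turn.utterance


dialogue = [
    Turn("supporter", "Hello, how can I help you today?", "GT"),
    Turn("customer", "My order has not arrived."),
    Turn("supporter", "I'm sorry about the delay. Let me look into it.", "EM"),
]
examples = list(supporter_examples(dialogue))
```

Each yielded triple corresponds to one prediction step: the model first predicts the strategy from the context, then generates the response given both.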
High-quality support is organized into five stages and twelve strategies, governed by formal COPC standards:
| Stage | Description |
|---|---|
| Connecting | Greeting, rapport building |
| Identifying | Understanding issue/data/emotion |
| Exploring | Discussing/evaluating solutions |
| Resolving | Delivering, confirming resolution |
| Maintaining | Closing, relationship preservation |
| Strategy Name | Abbrev. | Core Action |
|---|---|---|
| Greeting | GT | Friendly welcome/rapport |
| Identity Verification | IV | Confirm identity for security |
| Emotional Management | EM | Express empathy/understanding |
| Restatement/Paraphrase | RP | Clarify via rephrasing |
| Problem Refinement | PR | Refine via targeted questioning |
| Providing Suggestions | PS | Advise next steps/options |
| Information Delivery | ID | Explain policy/process |
| Resolution Implementation | RI | Execute concrete resolution |
| Feedback Request | FR | Inquire about resolution satisfaction |
| Appreciation & Closure | AC | Thank and close the conversation |
| Relationship Continuation | RC | Encourage further engagement |
| Others | — | Out-of-scope actions |
2. Construction and Curation Pipeline
Data Acquisition and LLM-Driven Normalization
CSConv originates from 690,000 anonymized, professional transcriptions of Chinese customer service conversations in both pre-sales and post-sales contexts. Pre-filtering eliminates dialogues outside 6–60 utterances, enforces an utterance length of ≤ 500 characters, controls for turn imbalance between the two speakers, requires customer turns to be substantive, and removes unprofessional content via Qwen2.5-72B.
To enforce explicit strategic structure, DeepSeek-R1 is prompted to rewrite sampled dialogues (≤500 per topic) so that each supporter turn is clearly annotated with an inferred strategy label. Customer responses are optionally refined for coherence, producing a consistent, strategy-aware corpus.
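The rule-based part of the pre-filter could look roughly like the sketch below. The utterance-count and length limits come from the description above; the turn-imbalance threshold `max_turn_ratio` is a hypothetical value chosen for illustration, and the LLM-based checks (customer effectiveness, professionalism) are left out.

```python
from typing import List, Tuple

Utterance = Tuple[str, str]  # (role, text)


def passes_prefilter(
    dialogue: List[Utterance],
    min_utts: int = 6,
    max_utts: int = 60,
    max_chars: int = 500,
    max_turn_ratio: float = 2.0,  # hypothetical threshold, not from the source
) -> bool:
    """Apply the length and balance rules; LLM-based checks are not modeled."""
    if not (min_utts <= len(dialogue) <= max_utts):
        return False
    if any(len(text) > max_chars for _, text in dialogue):
        return False
    n_supporter = sum(1 for role, _ in dialogue if role == "supporter")
    n_customer = len(dialogue) - n_supporter
    if min(n_supporter, n_customer) == 0:
        return False
    # Reject dialogues dominated by one speaker.
    ratio = max(n_supporter, n_customer) / min(n_supporter, n_customer)
    return ratio <= max_turn_ratio


balanced = [("supporter", "hi"), ("customer", "hello")] * 3
kept = passes_prefilter(balanced)
```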
Post-Processing and Annotation
Dialogues are retained only if they satisfy structural constraints (e.g., ≥10 utterances, inclusion of the GT, IV, and AC strategies, strict speaker alternation). Further LLM checks filter instances for coherence and empathy. Certified experts manually annotate each supporter turn with one of the twelve strategies and confirm stage boundaries per the COPC-aligned guidelines. No numerical inter-annotator agreement is reported, but all annotators are domain-credentialed.
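The structural constraints listed above lend themselves to a simple rule check. The following is a sketch under the assumption that each turn is stored as a `(role, strategy)` pair; the function name `is_structurally_valid` is illustrative, and the LLM-based coherence and empathy checks are not modeled.

```python
from typing import List, Optional, Tuple

Turn = Tuple[str, Optional[str]]  # (role, strategy code or None)

REQUIRED_STRATEGIES = {"GT", "IV", "AC"}


def is_structurally_valid(turns: List[Turn], min_utts: int = 10) -> bool:
    """Check minimum length, strict alternation, and required strategies."""
    if len(turns) < min_utts:
        return False
    roles = [role for role, _ in turns]
    # Strict alternation: no two consecutive turns by the same speaker.
    if any(a == b for a, b in zip(roles, roles[1:])):
        return False
    used = {s for role, s in turns if role == "supporter" and s}
    return REQUIRED_STRATEGIES <= used


good: List[Turn] = []
for code in ["GT", "IV", "PS", "RI", "AC"]:
    good.append(("supporter", code))
    good.append(("customer", None))
valid = is_structurally_valid(good)
```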
Corpus Statistics
| Statistic | Original | Rewritten |
|---|---|---|
| Conversations | 1,855 | 1,855 |
| Total utterances | 35,350 | 50,587 |
| Supporter utterances | 17,862 | 25,810 |
| Customer utterances | 17,488 | 24,777 |
| Avg. supporter utt. | 9.63 | 13.91 |
| Avg. customer utt. | 9.43 | 13.36 |
| Avg. supporter len. | 41.16 | 48.72 |
| Avg. customer len. | 21.60 | 17.17 |
| Strategy-labeled (%) | 55.28 | 97.82 |
Topic coverage spans eight topical domains plus "Others," each representing roughly 11–16% of dialogues. The most frequent strategies are Information Delivery (14.9%), Emotional Management (11.9%), and Providing Suggestions (10.0%).
3. RoleCS: Synthetic Training Corpus
To overcome scarcity of strategy-rich customer support interactions for model training, a role-playing LLM framework generates RoleCS—a large-scale synthetic dataset. Five LLM roles are orchestrated by DeepSeek-R1:
- Planner selects (topic, persona) pairs, crafting customer goals and context scenarios.
- Supporter Assistant recommends the next strategy based on the supporter's dialogue history.
- Supporter generates responses following the given strategy.
- Customer Assistant directs customer dialogue progression.
- Customer replies in alignment with persona and scenario.
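The interaction between the five roles can be sketched as a loop over pluggable callables. This is an illustrative skeleton only: the function signatures, the assumption that the supporter speaks first, and the choice to stop once the `AC` strategy is used are all assumptions, and the lambdas below stand in for the LLM calls made in the actual framework.

```python
from typing import Callable, Dict, List, Tuple

History = List[Tuple[str, str]]  # (role, text)


def role_play(
    plan: Dict[str, str],  # Planner output: topic, persona, goal
    suggest_strategy: Callable[[History], str],               # Supporter Assistant
    supporter_reply: Callable[[History, str], str],           # Supporter
    customer_guide: Callable[[History, Dict[str, str]], str], # Customer Assistant
    customer_reply: Callable[[History, str, Dict[str, str]], str],  # Customer
    max_rounds: int = 3,
) -> List[Tuple[str, str, str]]:
    """Return (role, strategy_or_empty, text) turns for one simulated dialogue."""
    history: History = []
    turns: List[Tuple[str, str, str]] = []
    for _ in range(max_rounds):
        strategy = suggest_strategy(history)
        text = supporter_reply(history, strategy)
        history.append(("supporter", text))
        turns.append(("supporter", strategy, text))
        if strategy == "AC":  # assumed stopping rule: closure ends the dialogue
            break
        hint = customer_guide(history, plan)
        text = customer_reply(history, hint, plan)
        history.append(("customer", text))
        turns.append(("customer", "", text))
    return turns


script = iter(["GT", "PS", "AC"])
turns = role_play(
    plan={"topic": "refund", "persona": "impatient", "goal": "get a refund"},
    suggest_strategy=lambda h: next(script),
    supporter_reply=lambda h, s: f"[{s}] supporter message",
    customer_guide=lambda h, p: f"pursue goal: {p['goal']}",
    customer_reply=lambda h, hint, p: f"customer message ({hint})",
)
```

Keeping each role behind its own callable makes it straightforward to swap a stub for a real model call without touching the loop.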
Customer personas are extracted from 15,980 real dialogues and de-duplicated by cosine embedding similarity, yielding a profile pool of 1,948. After generation and filtering, RoleCS comprises 11,232 dialogues with high strategy diversity.
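De-duplication by cosine similarity can be illustrated with a greedy pass over persona embeddings. This is one common way to do it, not necessarily the procedure used for RoleCS: the greedy strategy, the `threshold` value of 0.9, and the toy two-dimensional vectors are assumptions for the example.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def deduplicate(
    embeddings: List[Sequence[float]],
    threshold: float = 0.9,  # hypothetical cut-off, not from the source
) -> List[int]:
    """Keep an item only if it is not too similar to any item kept so far."""
    kept: List[int] = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


vectors = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
kept = deduplicate(vectors)
```

In this toy input the second vector is nearly parallel to the first and is dropped, while the third is orthogonal and kept.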
| Statistic | Value |
|---|---|
| Dialogues | 11,232 |
| Utterances | 263,580 |
| Avg. per conv. | 23.47 |
| Supporter utts. | 137,406 |
| Avg. supp. len. | 66.98 |
| Customer utts. | 126,174 |
| Avg. cust. len. | 46.43 |
4. Benchmarking and Empirical Insights
Fine-tuning SOTA LLMs (LLaMA 3.1, Qwen2.5, DeepSeek) on RoleCS yields significant performance gains on CSConv across lexical (BLEU-n, ROUGE-L), semantic (BERTScore, BLEURT), and strategy-accuracy metrics:
- Fine-tuning on RoleCS increases BLEU-2/4 by up to 5/2 points, ROUGE-L by ≈2, and accuracy by 5–6 points.
- Qwen2.5-72B, despite being substantially smaller than DeepSeek, achieves competitive metrics.
- Performance drops under generated-context (free-running) evaluation, indicating context drift challenges in multi-turn deployment.
| Model (ft) | BLEU-2 | BLEU-4 | ROUGE-L | ACC |
|---|---|---|---|---|
| Qwen2.5-72B + RoleCS | 12.15 | 5.32 | 7.97 | 43.29 |
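For orientation, the ACC column can be read as strategy prediction accuracy. A minimal sketch follows, assuming it is the percentage of supporter turns whose predicted strategy matches the gold label; the function name and the toy labels are illustrative.

```python
from typing import Sequence


def strategy_accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Percentage of turns where the predicted strategy equals the gold one."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)


acc = strategy_accuracy(["GT", "EM", "PS", "AC"], ["GT", "EM", "ID", "AC"])
```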
Human annotators and LLM evaluators (GPT-4o, Qwen-Plus) rate Qwen2.5-72B + RoleCS as delivering the highest response quality (3.79/5 from humans; ~91/100 from LLMs), with Fleiss' kappa indicating substantial inter-rater reliability (0.628 among humans, 0.658 between humans and LLMs).
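Fleiss' kappa compares the observed agreement among raters with the agreement expected by chance. The sketch below implements the standard formula; the input layout (`counts[i][j]` is the number of raters who put item `i` in category `j`) and the toy matrices are assumptions for illustration. Note that the statistic treats rating categories as nominal, so it ignores the ordering of a 1 to 5 scale.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """counts[i][j] = number of raters assigning item i to category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")
    # Observed agreement: mean over items of the pairwise agreement rate.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_items) / n_items
    # Chance agreement from the overall category proportions.
    n_categories = len(counts[0])
    p_category = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in p_category)
    # Undefined (division by zero) if all ratings fall in a single category.
    return (p_observed - p_expected) / (1 - p_expected)


perfect = fleiss_kappa([[3, 0], [0, 3]])
mixed = fleiss_kappa([[2, 1], [1, 2]])
```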
5. Cognitive Stimulation CSConv (Jiang et al., 2023)
Design and Annotation
The "CSConv" dataset from Jiang et al. (2023) is tailored for research at the intersection of cognitive stimulation therapy and dialogue systems. This resource consists of 2,643 dialogues (16,845 utterances) for Chinese-speaking elders with cognitive impairment, combining video transcripts (BrainLive project, ~1,800) and hand-scripted sessions (~900), all translated or authored in Mandarin.
Labeling is triple-layered per utterance: (1) Cognitive Stimulation (CS) principle (7-way), (2) emotion (8-way), (3) emotional support strategy (7-way), administered via BERT-based classifiers and iterative manual review.
| CS Label | Count | % |
|---|---|---|
| None | 5,296 | 31.4 |
| Inquiry | 4,156 | 24.7 |
| Respect | 2,134 | 12.7 |
| Reminiscence | 464 | 2.8 |
| Expression | 2,651 | 15.7 |
| Enjoyment | 1,862 | 11.1 |
| Comfort | 281 | 1.7 |
| Strategy | Count | % |
|---|---|---|
| None | 7,060 | 41.9 |
| Question | 4,195 | 24.9 |
| Reflection of Feelings | 293 | 1.7 |
| Self-disclosure | 3,022 | 17.9 |
| Providing Suggestions | 262 | 1.6 |
| Information | 819 | 4.9 |
| Others | 1,190 | 7.1 |
Utterances average 9.5 tokens. Dialogue scenarios are open-ended, encompassing reminiscence, comfort, chit-chat, and games.
The dataset is not pre-split into training, development, or test sets. Users establish their own partitions for modeling.
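Since no official split exists, a reproducible partition has to be defined by the user. The sketch below shows one option, a seeded shuffle at the dialogue level; the 80/10/10 ratios, the seed, and the function name are assumptions, and integer IDs stand in for the 2,643 dialogues.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dialogues(
    items: Sequence[T],
    ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),  # assumed ratios
    seed: int = 42,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle with a fixed seed and cut into train/dev/test partitions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_dev = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # remainder goes to the test set
    return train, dev, test


train, dev, test = split_dialogues(range(2643))
```

Splitting whole dialogues rather than individual utterances keeps turns from the same conversation out of different partitions.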
6. Access, Licensing, and Usage
Both datasets are publicly accessible:
- CSConv (customer support): https://github.com/aliyun/qwen-dianjin
- CSConv (cognitive stimulation): https://github.com/jiangjyjy/CSD
Neither original report states an explicit license or any restriction on research or academic use; prospective users should consult each repository for up-to-date terms.
7. Significance and Use Cases
The customer support CSConv (Zhu et al., 6 Aug 2025), paired with RoleCS, supports evaluation and fine-tuning of LLMs in strategy-aware dialogue generation, establishing empirical baselines for COPC-guided, empathetic agent systems in Chinese. The cognitive stimulation CSConv (Jiang et al., 2023) is the only large-scale, annotated dataset of its kind for conversational cognitive support, useful for modeling emotionally grounded, therapeutic interactions.
Because two unrelated corpora share the name "CSConv", any use of the acronym should make clear which one is meant. A plausible implication is that future work should state the task (customer support vs. cognitive stimulation) when referring to a "CSConv" dataset to avoid ambiguity.