
CSConv Dataset: Dual-Dialogue Applications

Updated 18 May 2026
  • CSConv Dataset is a dual-domain resource offering strategy-aware customer support and emotionally-sensitive cognitive stimulation dialogues.
  • It employs rigorous annotation schemes and LLM-driven normalization to structure multi-turn conversational interactions effectively.
  • Empirical benchmarks using RoleCS demonstrate significant gains in lexical, semantic, and strategy accuracy for state-of-the-art dialogue models.

CSConv Dataset refers to two independent dialogue corpora that share the same acronym but serve distinct research domains: customer support conversation modeling (Zhu et al., 6 Aug 2025), and cognitive stimulation dialogue for elders with cognitive impairment (Jiang et al., 2023). This entry distinguishes both lines of research, providing technical details on their structure, annotation schemes, and utility in evaluating conversational systems.

1. Overview and Task Framing

The customer support CSConv (Zhu et al., 6 Aug 2025) is a benchmark for Customer Support Conversation (CSC). It addresses the challenge of training service agents, or LLM-based systems, to produce responses that not only solve the customer's problem but are also empathetic and aligned with the COPC (Customer Operations Performance Center) guidelines. Formally, a dialogue is modeled as a sequence

$$D = \left\{ (P_i, T_i, U_i) \right\}_{i=1}^{N}$$

with $P_i \in \{S, C\}$ (supporter/customer), $T_i$ the strategy label drawn from a set $G$ (defined only when $P_i = S$), and $U_i$ the textual utterance. At each supporter turn $k$, the system receives the context $X_k = \left\{ (P_i, T_i, U_i) \right\}_{i=1}^{k-1}$ and must (1) predict the next conversational strategy $T_k \in G$ and (2) generate the supporter response $U_k$ conditioned on $X_k$ and $T_k$.
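The formulation above can be made concrete with a small data structure. The following is a minimal sketch, not the authors' code: the `Turn` class and the `supporter_examples` helper are hypothetical names introduced here, and the example utterances are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Turn:
    """One element (P_i, T_i, U_i) of a dialogue D (hypothetical container)."""
    speaker: str             # P_i: "S" (supporter) or "C" (customer)
    strategy: Optional[str]  # T_i: label from G; None for customer turns
    utterance: str           # U_i: the text of the turn


def supporter_examples(dialogue: List[Turn]) -> List[Tuple[List[Turn], str, str]]:
    """Return (X_k, T_k, U_k) for every labeled supporter turn k."""
    examples = []
    for k, turn in enumerate(dialogue):
        if turn.speaker == "S" and turn.strategy is not None:
            # X_k is everything before turn k; T_k and U_k are the targets.
            examples.append((dialogue[:k], turn.strategy, turn.utterance))
    return examples


# Invented toy dialogue, using strategy abbreviations from the table below.
dialogue = [
    Turn("S", "GT", "Hello, how can I help you today?"),
    Turn("C", None, "My order has not arrived."),
    Turn("S", "EM", "I am sorry about the delay."),
]
examples = supporter_examples(dialogue)
```

Each tuple pairs a context with the two prediction targets, so the same structure can feed both the strategy-prediction and the response-generation task.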

High-quality support is organized into five stages and twelve strategies, governed by formal COPC standards:

| Stage | Description |
|---|---|
| Connecting | Greeting, rapport building |
| Identifying | Understanding issue/data/emotion |
| Exploring | Discussing/evaluating solutions |
| Resolving | Delivering, confirming resolution |
| Maintaining | Closing, relationship preservation |

| Strategy Name | Abbrev. | Core Action |
|---|---|---|
| Greeting | GT | Friendly welcome/rapport |
| Identity Verification | IV | Confirm identity for security |
| Emotional Management | EM | Express empathy/understanding |
| Restatement/Paraphrase | RP | Clarify via rephrasing |
| Problem Refinement | PR | Refine via targeted questioning |
| Providing Suggestions | PS | Advise next steps/options |
| Information Delivery | ID | Explain policy/process |
| Resolution Implementation | RI | Execute concrete resolution |
| Feedback Request | FR | Inquire about resolution satisfaction |
| Appreciation & Closure | AC | Thank and close the conversation |
| Relationship Continuation | RC | Encourage further engagement |
| Others | - | Out-of-scope actions |

2. Construction and Curation Pipeline

Data Acquisition and LLM-Driven Normalization

CSConv originates from 690,000 anonymized, professional transcriptions of Chinese customer service conversations in both pre-sales and post-sales contexts. Pre-filtering discards dialogues with fewer than 6 or more than 60 utterances, caps utterance length at 500 characters, limits the turn imbalance between supporter and customer, requires that customer turns contribute meaningfully to the dialogue, and removes unprofessional content using Qwen2.5-72B.
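The rule-based part of this pre-filter can be sketched as follows. This is an illustrative sketch under stated assumptions: the 6–60 utterance window and the 500-character cap come from the description above, while the `max_turn_gap` parameter and its default are hypothetical stand-ins for the turn-imbalance criterion, whose exact form is not given here. The LLM-based checks are omitted.

```python
from typing import List, Tuple

Utterance = Tuple[str, str]  # (speaker, text); speaker is "S" or "C"


def passes_prefilter(dialogue: List[Utterance],
                     min_utts: int = 6,
                     max_utts: int = 60,
                     max_chars: int = 500,
                     max_turn_gap: int = 2) -> bool:
    """Apply the length and balance rules to one raw dialogue."""
    # Keep only dialogues with 6 to 60 utterances.
    if not (min_utts <= len(dialogue) <= max_utts):
        return False
    # Every utterance must be at most 500 characters long.
    if any(len(text) > max_chars for _, text in dialogue):
        return False
    # ASSUMPTION: imbalance is measured as the absolute difference in
    # turn counts; the source does not state the actual criterion.
    supporter = sum(1 for speaker, _ in dialogue if speaker == "S")
    customer = len(dialogue) - supporter
    return abs(supporter - customer) <= max_turn_gap


balanced = [("S", "hello"), ("C", "hi")] * 3          # 6 utterances
too_short = balanced[:5]                               # 5 utterances
too_long = balanced[:5] + [("C", "x" * 501)]           # one 501-char turn
```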

To enforce explicit strategic structure, DeepSeek-R1 is prompted to rewrite sampled dialogues (≤500 per topic) so that each supporter turn is clearly annotated with an inferred strategy $T_i$. Customer responses are optionally refined for coherence, producing a consistent, strategy-aware corpus.

Post-Processing and Annotation

Dialogues are retained only if they satisfy structural constraints (e.g., ≥10 utterances, inclusion of the GT, IV, and AC strategies, strict speaker alternation). Further LLM checks filter instances for coherence and empathy. Certified experts then manually annotate each supporter turn with one of the twelve strategies and verify stage boundaries against the COPC-aligned guidelines. No numerical inter-annotator agreement is reported, but all annotators are domain-credentialed.
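The structural constraints listed above lend themselves to a simple rule check. The sketch below covers only the three constraints named in the text; the function name and tuple layout are hypothetical, and the later LLM-based coherence and empathy checks are not modeled.

```python
from typing import List, Optional, Tuple

LabeledTurn = Tuple[str, Optional[str], str]  # (speaker, strategy, text)

REQUIRED_LABELS = {"GT", "IV", "AC"}


def satisfies_structure(dialogue: List[LabeledTurn], min_utts: int = 10) -> bool:
    """Check minimum length, strict alternation, and required labels."""
    if len(dialogue) < min_utts:
        return False
    speakers = [speaker for speaker, _, _ in dialogue]
    # Strict alternation: no two consecutive turns by the same speaker.
    if any(a == b for a, b in zip(speakers, speakers[1:])):
        return False
    used = {label for speaker, label, _ in dialogue
            if speaker == "S" and label is not None}
    # GT, IV, and AC must each appear on at least one supporter turn.
    return REQUIRED_LABELS <= used


def make_dialogue(labels: List[str]) -> List[LabeledTurn]:
    """Build an alternating toy dialogue from supporter labels."""
    turns: List[LabeledTurn] = []
    for label in labels:
        turns.append(("S", label, "supporter text"))
        turns.append(("C", None, "customer text"))
    return turns


valid = make_dialogue(["GT", "IV", "EM", "ID", "AC"])    # 10 turns
missing_iv = make_dialogue(["GT", "EM", "EM", "ID", "AC"])
```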

Corpus Statistics

| Statistic | Original | Rewritten |
|---|---|---|
| Conversations | 1,855 | 1,855 |
| Total utterances | 35,350 | 50,587 |
| Supporter utterances | 17,862 | 25,810 |
| Customer utterances | 17,488 | 24,777 |
| Avg. supporter utt. | 9.63 | 13.91 |
| Avg. customer utt. | 9.43 | 13.36 |
| Avg. supporter len. | 41.16 | 48.72 |
| Avg. customer len. | 21.60 | 17.17 |
| Strategy-labeled (%) | 55.28 | 97.82 |
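As a consistency check on the table above, the per-conversation averages and the utterance totals can be recomputed from the raw counts. The snippet uses only the reported numbers; the variable names are introduced here for illustration.

```python
conversations = 1855
supporter_utts = {"original": 17862, "rewritten": 25810}
customer_utts = {"original": 17488, "rewritten": 24777}


def per_conversation(total: int, n_conversations: int) -> float:
    """Average number of utterances per conversation, to two decimals."""
    return round(total / n_conversations, 2)


# (avg. supporter utterances, avg. customer utterances) per version.
averages = {
    version: (per_conversation(supporter_utts[version], conversations),
              per_conversation(customer_utts[version], conversations))
    for version in supporter_utts
}

# Supporter plus customer utterances should equal the reported totals.
totals = {version: supporter_utts[version] + customer_utts[version]
          for version in supporter_utts}
```

The recomputed values match the reported averages (9.63/9.43 and 13.91/13.36) and totals (35,350 and 50,587).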

Topic coverage spans eight topical domains plus "Others," each representing roughly 11–16% of dialogues. The most frequent strategies are Information Delivery (14.9%), Emotional Management (11.9%), and Providing Suggestions (10.0%).

3. RoleCS: Synthetic Training Corpus

To overcome the scarcity of strategy-rich customer support interactions for model training, a role-playing LLM framework generates RoleCS, a large-scale synthetic dataset. Five LLM roles are orchestrated by DeepSeek-R1:

  • Planner selects (topic, persona) pairs, crafting customer goals and context scenarios.
  • Supporter Assistant recommends the next strategy $T_k$ based on the supporter's dialogue history.
  • Supporter generates responses following the given strategy.
  • Customer Assistant directs customer dialogue progression.
  • Customer replies in alignment with persona and scenario.
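One way the five roles could interact is sketched below. This is a hypothetical control loop, not the published pipeline: each role is passed in as a plain callable (in practice an LLM call), and the turn order, the stop-on-AC rule, and the `max_rounds` cap are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Turn = Tuple[str, str, str]  # (speaker, strategy or directive, text)


def simulate_dialogue(roles: Dict[str, Callable], max_rounds: int = 12) -> List[Turn]:
    """Run one role-play episode with injected role functions."""
    scenario = roles["planner"]()  # topic, persona, customer goal
    history: List[Turn] = []
    for _ in range(max_rounds):
        strategy = roles["supporter_assistant"](history)
        reply = roles["supporter"](history, strategy)
        history.append(("S", strategy, reply))
        if strategy == "AC":  # ASSUMPTION: closure ends the episode
            break
        directive = roles["customer_assistant"](history, scenario)
        message = roles["customer"](history, scenario, directive)
        history.append(("C", directive, message))
    return history


# Deterministic stubs standing in for the LLM-backed roles.
plan = ["GT", "IV", "ID", "AC"]
stub_roles = {
    "planner": lambda: {"topic": "delivery", "persona": "impatient",
                        "goal": "track a parcel"},
    "supporter_assistant": lambda history: plan[
        sum(1 for turn in history if turn[0] == "S")],
    "supporter": lambda history, strategy: f"[{strategy}] supporter reply",
    "customer_assistant": lambda history, scenario: "ask about " + scenario["goal"],
    "customer": lambda history, scenario, directive: f"customer message ({directive})",
}
dialogue = simulate_dialogue(stub_roles)
```

Separating the assistant roles (which decide what to do) from the speaker roles (which phrase it) keeps the strategy label explicit for every supporter turn.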

Customer personas are extracted from 15,980 real dialogues and de-duplicated by cosine embedding similarity, yielding a profile pool of 1,948. After generation and filtering, RoleCS comprises 11,232 dialogues with high strategy diversity.
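De-duplication by embedding similarity can be sketched as a greedy pass over the persona embeddings. The greedy strategy and the `threshold` value are assumptions; the source states only that cosine similarity over embeddings is used, not the procedure or cut-off.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def deduplicate(embeddings: List[Sequence[float]],
                threshold: float = 0.9) -> List[int]:
    """Keep an item only if it is not too similar to any kept item."""
    kept: List[int] = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy 2-D embeddings: the second is a near-duplicate of the first.
vectors = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
kept = deduplicate(vectors)  # indices of retained personas
```

The greedy pass is quadratic in the number of personas, which is acceptable at the scale of roughly 16,000 profiles but would call for an approximate nearest-neighbor index at larger scales.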

| Statistic | Value |
|---|---|
| Dialogues | 11,232 |
| Utterances | 263,580 |
| Avg. per conv. | 23.47 |
| Supporter utts. | 137,406 |
| Avg. supp. len. | 66.98 |
| Customer utts. | 126,174 |
| Avg. cust. len. | 46.43 |

4. Benchmarking and Empirical Insights

Fine-tuning SOTA LLMs (LLaMA 3.1, Qwen2.5, DeepSeek) on RoleCS yields significant performance gains on CSConv across lexical (BLEU-n, ROUGE-L), semantic (BERTScore, BLEURT), and strategy-accuracy metrics:

  • Fine-tuning on RoleCS increases BLEU-2 by up to 5 points, BLEU-4 by up to 2 points, ROUGE-L by about 2 points, and strategy accuracy by 5–6 points.
  • Qwen2.5-72B, despite being substantially smaller than DeepSeek, achieves competitive metrics.
  • Performance drops under generated-context (free-running) evaluation, where the model conditions on its own earlier responses, indicating context drift in multi-turn deployment.

| Model (fine-tuned) | BLEU-2 | BLEU-4 | ROUGE-L | ACC |
|---|---|---|---|---|
| Qwen2.5-72B + RoleCS | 12.15 | 5.32 | 7.97 | 43.29 |
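The ACC column reports strategy accuracy. A minimal sketch of that metric, assuming it is the percentage of supporter turns whose predicted strategy exactly matches the gold label (the function name is hypothetical):

```python
from typing import Sequence


def strategy_accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Percentage of turns where the predicted strategy equals the gold one."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)


# Toy example: three of four predictions match the gold labels.
accuracy = strategy_accuracy(["GT", "EM", "ID", "AC"],
                             ["GT", "EM", "PS", "AC"])
```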

Human annotators and LLM evaluators (GPT-4o, Qwen-Plus) rate Qwen2.5-72B + RoleCS as delivering the highest response quality (3.79/5 from human annotators; about 91/100 from LLM evaluators). Fleiss' kappa indicates substantial inter-rater reliability (0.628 among humans, 0.658 between humans and LLMs).
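For reference, Fleiss' kappa can be computed from a table of rating counts as shown below. This is the standard textbook formula, not code from the paper, and how the ratings were binned into categories for the reported values is not stated. Each row is one rated item, each column one rating category, and each cell the number of raters who chose that category.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa for an items-by-categories table of rating counts."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")
    n_categories = len(counts[0])
    # Share of all ratings that fall into each category.
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters)
             for j in range(n_categories)]
    # Observed agreement per item, then averaged over items.
    p_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in counts]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_observed - p_expected) / (1.0 - p_expected)


perfect = fleiss_kappa([[3, 0], [0, 3]])   # three raters, full agreement
split = fleiss_kappa([[2, 1], [1, 2]])     # three raters, 2-vs-1 splits
```

Values between 0.61 and 0.80 are conventionally read as substantial agreement, which is the band the reported 0.628 and 0.658 fall into.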

5. Design and Annotation

The second CSConv dataset (Jiang et al., 2023) is tailored to research at the intersection of cognitive stimulation therapy and dialogue systems. It consists of 2,643 dialogues (16,845 utterances) for Chinese-speaking elders with cognitive impairment, combining video transcripts (BrainLive project, ~1,800) and hand-scripted sessions (~900), all translated into or authored in Mandarin.

Each utterance carries three layers of labels: (1) a Cognitive Stimulation (CS) principle (7-way), (2) an emotion (8-way), and (3) an emotional support strategy (7-way). Labels are assigned using BERT-based classifiers followed by iterative manual review.

| CS Label | Count | % |
|---|---|---|
| None | 5,296 | 31.4 |
| Inquiry | 4,156 | 24.7 |
| Respect | 2,134 | 12.7 |
| Reminiscence | 464 | 2.8 |
| Expression | 2,651 | 15.7 |
| Enjoyment | 1,862 | 11.1 |
| Comfort | 281 | 1.7 |

| Strategy | Count | % |
|---|---|---|
| None | 7,060 | 41.9 |
| Question | 4,195 | 24.9 |
| Reflection of Feelings | 293 | 1.7 |
| Self-disclosure | 3,022 | 17.9 |
| Providing Suggestions | 262 | 1.6 |
| Information | 819 | 4.9 |
| Others | 1,190 | 7.1 |

Utterances average 9.5 tokens. Dialogue scenarios are open-ended, encompassing reminiscence, comfort, chit-chat, and games.

The dataset is not pre-split into training, development, or test sets. Users establish their own partitions for modeling.
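Because no official split exists, a reproducible partition has to be defined by the user. The sketch below splits at the dialogue level so that utterances from one dialogue never end up in two partitions; the 80/10/10 ratios, the seed, and the function name are arbitrary choices made here, not a recommendation from the dataset authors.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_dialogues(dialogue_ids: Sequence[int],
                    ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),
                    seed: int = 13) -> Dict[str, List[int]]:
    """Shuffle dialogue IDs with a fixed seed and cut into three parts."""
    ids = list(dialogue_ids)
    random.Random(seed).shuffle(ids)  # local RNG; global state untouched
    n_train = int(ratios[0] * len(ids))
    n_dev = int(ratios[1] * len(ids))
    return {
        "train": ids[:n_train],
        "dev": ids[n_train:n_train + n_dev],
        "test": ids[n_train + n_dev:],  # remainder goes to test
    }


# 2,643 is the number of dialogues reported for this corpus.
splits = split_dialogues(range(2643))
```

Reporting the seed and ratios alongside results makes comparisons across papers possible despite the missing official split.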

6. Access, Licensing, and Usage

Both datasets are publicly accessible.

No explicit license or usage restrictions are provided in the original reports; prospective users should consult each repository for up-to-date terms. There are no reported limitations on research or academic usage for either corpus.

7. Significance and Use Cases

The customer support CSConv (Zhu et al., 6 Aug 2025), paired with RoleCS, supports evaluation and fine-tuning of LLMs for strategy-aware dialogue generation and establishes empirical baselines for COPC-guided, empathetic agent systems in Chinese. The cognitive stimulation CSConv (Jiang et al., 2023) is the only large-scale, annotated dataset of its kind for conversational cognitive support, and is useful for modeling emotionally grounded, therapeutic interactions.

Because two unrelated corpora share the name "CSConv", the acronym is ambiguous on its own. Future work should therefore state the task (customer support vs. cognitive stimulation) whenever it refers to a "CSConv" dataset.
