
ESConv: Emotional Support Dialogue Dataset

Updated 31 December 2025
  • ESConv is a benchmark corpus offering multi-turn, strategy-rich emotional support dialogues that actively reduce psychological distress in research settings.
  • It features rigorous annotations based on Helping Skills Theory, detailed quality control measures, and comprehensive feedback loops to ensure data reliability.
  • Structured from real-world online scenarios, ESConv supports advanced modeling tasks such as multi-strategy generation, preference bias analysis, and LLM steering.

Emotional Support Conversation Dataset (ESConv) is a benchmark corpus for the analysis and modeling of multi-turn, strategy-rich emotional support dialogues, designed for research on dialog systems that reduce psychological distress. ESConv offers rigorously annotated conversational data, explicit psychological support strategy labels, quality-control mechanisms, and rich structural diversity for modeling both strategy selection and response generation. It is central to contemporary work on emotional support dialog modeling, preference bias analysis, data augmentation, and LLM steering.

1. Corpus Collection and Structure

ESConv was constructed via crowdsourced help-seeker and supporter pairs, with the core structure grounded in Hill’s Helping Skills Theory (Liu et al., 2021). Each dialogue comprises sequences of seeker statements and supporter responses, with every supporter utterance annotated for strategy. The collection targeted realistic daily emotional support scenarios (e.g., depression, relationship issues, academic pressure), typically using online forum sources such as r/TalkTherapy, Breakthrough, and mentalhealthforum.net (Chen et al., 2023).

  • Size Statistics: 1,053–1,600 multi-turn dialogues (depending on pre-processing), 31,410 utterances, average 6–10 turns per dialog, 16–20 tokens per utterance (Liu et al., 2021, Chen et al., 2023).
  • Participant Filtering: Supporters recruited and evaluated via theory tutorials, rejection criteria, and post-hoc annotation checks. Data filtration included length constraints, star ratings per two turns, and revision of strategy labels (Liu et al., 2021).
  • Train/Dev/Test Splits: Approximately 60%/20%/20%, with some downstream research using 80%/10%/10% or stage-specific slices (Bai et al., 21 May 2025, Kang et al., 2024). Dialogues can be partitioned into context–response “samples” for granular evaluation.
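The context–response partitioning described above can be sketched in a few lines. The field names used here ("dialog", "speaker", "content", "annotation"/"strategy") follow the public ESConv JSON release, but re-packagings of the corpus may differ:

```python
import json
from typing import Iterator

def iter_samples(path: str) -> Iterator[dict]:
    """Yield (context, strategy, response) samples from an ESConv-style JSON file.

    Each supporter turn becomes one sample; the context is the list of
    preceding (speaker, utterance) pairs in the same dialogue.
    """
    with open(path, encoding="utf-8") as f:
        dialogues = json.load(f)  # a list of dialogue dicts
    for dia in dialogues:
        context = []
        for turn in dia["dialog"]:
            if turn["speaker"] == "supporter":
                yield {
                    "context": list(context),
                    "strategy": turn.get("annotation", {}).get("strategy"),
                    "response": turn["content"],
                }
            context.append((turn["speaker"], turn["content"]))
```

With this partitioning, a single 10-turn dialogue yields one training sample per supporter turn, each with a progressively longer context.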

2. Annotation Schema and Quality Control

Annotation in ESConv is multi-dimensional:

  • Support Strategies: Eight main categories, strictly defined and mutually exclusive (Liu et al., 2021, Chen et al., 2023):

    1. Questioning
    2. Restatement/Paraphrasing
    3. Reflection of Feelings
    4. Self-disclosure
    5. Affirmation & Reassurance
    6. Providing Suggestions
    7. Information
    8. Others
  • Dialog Format: Each supporter utterance is marked with a special token indicating strategy; seekers provide pre-chat metadata (problem type, emotion, intensity) and ongoing feedback (1–5 star ratings, post-chat intensity and satisfaction) (Liu et al., 2021, Peng et al., 2022).

  • Quality Mechanisms:
    • Filtering for dialogue completeness, minimum lengths, role balance.
    • Annotation correction (≈17% of turns re-reviewed; strategy labels revised where necessary).
    • Conversation- and turn-level user feedback used to validate and score support effectiveness; post-survey and star ratings retained for robust user-state tracking.
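The special-token dialog format can be illustrated with a minimal sketch. The bracketed-token spelling below is an assumption for illustration; implementations vary in how they render the strategy token described in Liu et al. (2021):

```python
from typing import Optional

# The eight strategy labels as listed in this section; exact spellings
# differ slightly across releases and papers.
STRATEGIES = [
    "Questioning", "Restatement/Paraphrasing", "Reflection of Feelings",
    "Self-disclosure", "Affirmation & Reassurance",
    "Providing Suggestions", "Information", "Others",
]

def serialize_turn(speaker: str, utterance: str,
                   strategy: Optional[str] = None) -> str:
    """Render one turn as flat text; supporter turns carry a strategy token."""
    if speaker == "supporter":
        if strategy not in STRATEGIES:
            raise ValueError(f"unknown strategy: {strategy!r}")
        return f"[{strategy}] {utterance}"
    return utterance
```

Training a generator on this serialization forces it to commit to a strategy before producing the response tokens.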

3. Psychological Framework and Taxonomy

ESConv explicitly operationalizes Helping Skills Theory via a staged conversational taxonomy (Liu et al., 2021):

Stage       | Purpose              | Strategy Types
Exploration | Elicit problem       | Questioning, Restatement
Comforting  | Empathy, validation  | Reflection, Self-disclosure, Affirmation
Action      | Guidance             | Suggestions, Information
Others      | Pleasantries         | Others

Support strategies are distributed over the dialog as follows (counts from supporter turns): Questioning (20.9%), Restatement (5.9%), Reflection (7.8%), Self-disclosure (9.4%), Reassurance (16.1%), Suggestions (15.6%), Information (6.1%), Others (18.1%) (Liu et al., 2021).
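The distribution above can be recomputed directly from the annotations. This sketch assumes each supporter-turn dict exposes a `strategy` field, as in the sampling scheme described in Section 1:

```python
from collections import Counter

def strategy_distribution(supporter_turns) -> dict:
    """Percentage of supporter turns per strategy, rounded to one decimal."""
    counts = Counter(t["strategy"] for t in supporter_turns)
    total = sum(counts.values())
    return {s: round(100 * n / total, 1) for s, n in counts.most_common()}
```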

4. Multi-Strategy and Task Redefinition

Traditional emotional support modeling assumes each supporter turn employs a single strategy. Analysis of ESConv reveals that consecutive use of multiple strategies (CUS) within a single turn is common (Bai et al., 21 May 2025):

#Strategies per Response | % of Responses
1                        | 82%
2                        | 10%
3                        | 5%
4                        | 2%
≥5                       | ~1%

Maximum strategies observed per turn is 7. Multi-strategy turns contain concatenated strategy-utterance pairs mapped to the dialog history, leading to a refined task formulation:

  • Original: Generate a single (strategy, utterance) given history.
  • Refined: Generate a sequence (s_1, r_1), …, (s_m, r_m) given the context.

This redefinition supports more holistic modeling and evaluation of supporter behaviors, directly impacting LLM evaluation, planning, and diversity (Bai et al., 21 May 2025, Kang et al., 2024).
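Recovering the (strategy, utterance) sequence from a flattened multi-strategy turn can be sketched as follows; the bracketed-token encoding is a hypothetical serialization, chosen to match the strategy-token format in Section 2:

```python
import re
from typing import List, Tuple

# Matches a strategy token such as "[Providing Suggestions] " and captures
# the strategy name; re.split keeps captured groups in the output.
_TOKEN = re.compile(r"\[([^\]]+)\]\s*")

def split_cus_turn(turn: str) -> List[Tuple[str, str]]:
    """Split a concatenated CUS turn into its (strategy, utterance) sequence."""
    parts = _TOKEN.split(turn)
    # parts == ["", s1, r1, s2, r2, ...] when the turn starts with a token
    return list(zip(parts[1::2], (p.strip() for p in parts[2::2])))
```

Under the refined formulation, a model is evaluated on the full sequence these pairs represent rather than on a single (strategy, utterance) pair.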

5. Evaluation Protocols and Metrics

Evaluation on ESConv is multi-layered, combining automatic generation metrics with the seeker-side signals collected during data creation (turn-level star ratings, post-chat intensity and satisfaction surveys) and human judgments of strategy use.

6. Benchmark Tasks and Modeling Advances

ESConv is used for varied modeling tasks:

  • Strategy-aware Generation: Models learn to select and justify support strategies before generating responses; multi-task and content-planning architectures integrate explicit strategy marking (Liu et al., 2021, Peng et al., 2022, Bai et al., 21 May 2025).
  • Knowledge Injection: K-ESConv employs retrieval-augmented prompt learning to inject external (e.g., PsyQA forum) knowledge, boosting comfort and suggestion utility of responses (Chen et al., 2023).
  • Feedback-aware Modeling: FADO jointly exploits turn- and conversation-level user feedback signals for double-controlled, strategy-constrained generation (Peng et al., 2022).
  • Turn-level Transition Modeling: TransESC encodes semantic, strategic, and emotional turn-level transitions using graph structures for smooth conversation flow (Zhao et al., 2023).
  • Preference Bias Analysis: ESConv supports study of LLM tendency toward certain strategies, e.g., Affirmation, with Bradley–Terry modeling of preference distribution and debiasing strategies via planners and external tools (Kang et al., 2024).
  • Data Augmentation: AugESC scales ESConv roughly 45× via LLM-driven dialogue completion and filtering, retaining comparable quality and expanding topical generalization (Zheng et al., 2022).
  • LLM Steerability and Evaluation: Recent work explores steering Llama-series models on ESConv using SRA and prompt-engineering recipes to maintain high strategy adherence in extended conversations (Madani et al., 2024).
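The Bradley–Terry preference analysis mentioned above can be sketched with the standard MM (minorization-maximization) update; the pairwise win counts used in testing are invented for illustration, not values from Kang et al. (2024):

```python
from typing import List

def bradley_terry(wins: List[List[int]], n_iter: int = 200) -> List[float]:
    """Fit Bradley-Terry strengths from pairwise preference counts.

    wins[i][j] = number of times strategy i was preferred over strategy j.
    Returns normalized strengths summing to 1; a strategy preferred more
    often receives a larger score.
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(n_iter):
        new = []
        for i in range(n):
            w_i = sum(wins[i])  # total wins of strategy i
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new.append(w_i / denom if denom else p[i])
        s = sum(new)
        p = [x / s for x in new]  # renormalize each iteration
    return p
```

A skewed strength vector (e.g., a large weight on Affirmation) is the kind of preference bias the debiasing planners and external tools aim to correct.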

7. Limitations, Extensions, and Impact

ESConv is a foundational resource but exhibits limited scale (fewer than 2,000 dialogues in most splits), constrained demographic diversity, and biases (e.g., overrepresentation of certain strategies and underuse of the Others category) (Kang et al., 2024). Multi-strategy (CUS) modeling exposes multifaceted support interventions but requires task reformulation and more complex decoding. Under multi-strategy evaluation, human supporters sometimes underperform state-of-the-art LLMs, partly because of the restrictive single-strategy designs of older benchmarks (Bai et al., 21 May 2025). Data augmentation (AugESC) and synthetic extension (15-strategy continuations; SRA-evaluated) expand empirical coverage and enable rigorous open-domain generalization experiments (Zheng et al., 2022, Madani et al., 2024).

Mitigating preference bias, integrating feedback, injection of external knowledge, and robust evaluation remain open research directions. ESConv’s design and annotation granularity underlie methodological advances in strategy learning, real-user feedback incorporation, and progressive LLM alignment for emotional support dialog systems.
