
ESConv Dataset: Emotional Support Dialog Corpus

Updated 6 February 2026
  • ESConv dataset is a comprehensive multi-turn emotional support dialogue corpus annotated with detailed support strategies grounded in Helping Skills Theory.
  • The dataset features a structured annotation framework categorizing supporter responses into exploration, comforting, and action strategies for nuanced support interactions.
  • Rigorous crowdsourcing and quality control protocols ensure high-fidelity dialogues, enabling robust benchmarking of strategy-aware dialog generation using both automatic and human evaluations.

The ESConv dataset is a high-quality, crowdsourced corpus designed for developing and evaluating emotional support dialog systems. Targeted at modeling the subtleties of multi-turn emotional support interactions, ESConv operationalizes the Emotional Support Conversation (ESC) task and provides a fine-grained annotation schema grounded in the Helping Skills Theory. It has become a central benchmark for research on strategy-aware and contextually sensitive dialog generation in support scenarios, including mental health and customer service contexts (Liu et al., 2021, Zheng et al., 2022).

1. Task Formulation and Objectives

ESConv formalizes the Emotional Support Conversation (ESC) task as an interaction between a help-seeker and a supporter. Each dialog begins with the seeker providing (a) a negative emotion label $e$ (from 7 categories, e.g., "anxiety," "sadness"), (b) an intensity value $l \in \{1, \ldots, 5\}$, and (c) a free-text description of the underlying distress. The supporter's mandate is to mitigate the seeker's emotional distress by sequentially deploying emotional support strategies:

  • Identification and exploration of the seeker’s problem.
  • Comforting or empathic interventions.
  • Action-oriented suggestions or relevant information.

Task success is primarily quantified as a positive reduction in the seeker's self-reported intensity, $\Delta l = l_{\mathrm{before}} - l_{\mathrm{after}} > 0$. Secondary evaluation criteria emphasize strategy-aware response generation, dynamic emotion-state tracking, and effectiveness beyond generic dialog relevance or coherence.
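
As a minimal sketch, the success criterion can be computed directly from the pre- and post-chat self-reports; the record fields below are illustrative stand-ins, not the dataset's exact schema:

```python
from dataclasses import dataclass

@dataclass
class SeekerState:
    """Pre-/post-chat self-report from the help-seeker (illustrative fields)."""
    emotion: str    # one of 7 negative emotion labels, e.g. "anxiety"
    intensity: int  # self-reported distress level, 1 (mild) to 5 (severe)

def dialog_succeeded(before: SeekerState, after: SeekerState) -> bool:
    """ESC success criterion: a positive reduction in self-reported
    intensity, i.e. delta_l = l_before - l_after > 0."""
    return (before.intensity - after.intensity) > 0

# Example: intensity drops from 5 to 2, so delta_l = 3 > 0 -> success.
print(dialog_succeeded(SeekerState("sadness", 5), SeekerState("sadness", 2)))  # True
```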

2. Annotation Framework and Strategy Taxonomy

The ESConv annotation schema is explicitly tied to the stages of emotional helping as formalized in the ESC Framework. Each supporter utterance is assigned exactly one strategy label out of eight defined categories, organized into three principal stages plus a residual "Others" bucket:

  • Exploration
    • Question: Elicit details through open- or closed-ended queries.
    • Restatement/Paraphrasing: Rephrase seeker content to confirm understanding.
  • Comforting
    • Reflection of Feelings: Explicitly acknowledge the seeker’s emotional state.
    • Self-disclosure: Share an analogous personal experience to foster connection.
    • Affirmation and Reassurance: Recognize the seeker's strengths and offer encouragement.
  • Action
    • Providing Suggestions: Propose practical coping tactics or next steps.
    • Information: Cite concrete facts or recommend external resources.
  • Others
    • Utterances not covered by the above, such as greetings or general supportive acts.

This structured annotation allows the modeling and evaluation of strategy selection and sequencing, enabling investigations into the impact of specific support behaviors on dialog success and user well-being.
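
For analysis code, the taxonomy is convenient to encode as a stage-to-strategy mapping; the following is a minimal sketch (the label strings follow the prose above, not necessarily the dataset's canonical keys):

```python
# ESC Framework stages mapped to their annotated supporter strategies.
STRATEGY_TAXONOMY = {
    "Exploration": ["Question", "Restatement or Paraphrasing"],
    "Comforting":  ["Reflection of Feelings", "Self-disclosure",
                    "Affirmation and Reassurance"],
    "Action":      ["Providing Suggestions", "Information"],
    "Others":      ["Others"],
}

# Reverse lookup: strategy label -> stage, useful for analyzing how
# supporters move through exploration -> comforting -> action.
STAGE_OF = {s: stage for stage, strategies in STRATEGY_TAXONOMY.items()
            for s in strategies}

assert STAGE_OF["Reflection of Feelings"] == "Comforting"
assert len(STAGE_OF) == 8  # exactly eight strategy labels
```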

3. Data Collection Protocols and Quality Control

Data for ESConv was acquired via rigorous crowdsourcing protocols with precise role division:

  • Help-seeker: Describes a current or past emotional difficulty by selecting a problem category, emotion label, and intensity, and providing a free-text narrative.
  • Supporter: Must pass a detailed competency tutorial and qualification (11 subtasks; only 7.8% pass rate), and annotate each outgoing utterance with the appropriate support strategy.

Ongoing dialog quality is maintained via multi-layered mechanisms:

  • In-chat star ratings (1–5) for supporter helpfulness, recorded by the seeker after every two supporter turns.
  • Post-chat surveys collecting (a) seeker’s post-interaction intensity, empathy, and relevance ratings; (b) supporter’s appraisal of seeker engagement.
  • Automatic and manual filters: incomplete dialogs (fewer than 16 utterances) are dropped, and multi-criteria auto-approval plus manual annotation corrections are applied. For instance, only dialogs with a meaningful intensity improvement ($\Delta l \ge 1$), sufficient utterance length, and high empathy/relevance ratings pass the quality thresholds (a filtering sketch follows this list).
  • Manual review and correction of strategy codes (applied to 17% of dialogs) and intensity labels as required.
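
A minimal sketch of such an auto-approval filter, assuming hypothetical field names and plausible thresholds rather than the release's exact schema:

```python
def passes_quality_filters(dialog: dict) -> bool:
    """Approximate the multi-criteria auto-approval described above.
    Field names and thresholds are illustrative assumptions, not the
    dataset's exact release schema."""
    long_enough = len(dialog["utterances"]) >= 16
    improved    = dialog["intensity_before"] - dialog["intensity_after"] >= 1
    empathetic  = dialog["empathy_rating"] >= 4    # assumed 1-5 scale
    relevant    = dialog["relevance_rating"] >= 4  # assumed 1-5 scale
    return long_enough and improved and empathetic and relevant
```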

These protocols yielded a final corpus of 1,053 high-quality dialogues extracted from an initial pool of 2,472 (Liu et al., 2021).

4. Dataset Statistics and Structure

ESConv is composed of 1,053 dialogues totaling 31,410 utterances (supporter: 14,855; seeker: 16,555). Dialogues are relatively long-form, averaging 29.8 turns and 17.8 tokens per utterance, with an average session lasting 22.6 minutes. Annotations reveal prominent discussion of chronic depression, job crises, breakups, interpersonal conflict, and academic pressure. The distribution of pre-chat emotions is dominated by “anxiety” (26.7%), “depression” (26.2%), and “sadness” (23.7%).

Seeker feedback rates indicate high perceived supportiveness, with 50.6% of in-chat ratings at the maximum of "5 – Excellent." Support strategy usage, as tagged by supporters, is distributed as follows:

| Strategy | % of Supporter Utterances |
| --- | --- |
| Question | 20.9% |
| Affirmation and Reassurance | 16.1% |
| Providing Suggestions | 15.6% |
| Others | 18.1% |
| Reflection of Feelings | 7.8% |
| Self-disclosure | 9.4% |
| Information | 6.1% |
| Restatement or Paraphrasing | 5.9% |

Each dialog’s canonical representation is a JSON object containing pre-survey metadata, an array of turns with speaker and strategy attributes, interleaved in-chat ratings, and post-survey outcomes. Dataset splits for experimentation are 60/20/20 (train/dev/test).
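
A loading sketch follows; the field names mirror common descriptions of the public ESConv release but should be treated as assumptions and verified against the actual JSON:

```python
import json

# Load the released corpus; the path and field names are assumptions to
# verify against the actual ESConv release.
with open("ESConv.json", encoding="utf-8") as f:
    corpus = json.load(f)

dialog = corpus[0]
print(dialog["emotion_type"], dialog["problem_type"])  # pre-survey metadata
for turn in dialog["dialog"]:
    # Only supporter turns carry a strategy annotation.
    strategy = turn.get("annotation", {}).get("strategy")
    print(f'{turn["speaker"]:>9}',
          f'[{strategy}]' if strategy else '',
          turn["content"][:60])
```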

5. Evaluation Protocols and Baseline Results

ESConv provides a foundation for benchmarking both automatic and human-in-the-loop models under strategy-constrained response paradigms.

  • Automatic Metrics
    • Perplexity (PPL): Measures language model uncertainty over held-out test responses.
    • BLEU-2, ROUGE-L: Quantify n-gram and longest-common-subsequence overlap with references, respectively (a computation sketch follows this list).
    • BOW embedding (Extrema): Cosine similarity between extrema-pooled bag-of-words embeddings of generated and reference responses.
  • Model Variants
    • Vanilla: Fine-tuned directly on ESConv.
    • Oracle: Gold strategy prepending.
    • Joint: Strategy prediction preceding response generation.
    • Random: Random strategy prepending, matching corpus distribution.
    • Backbone architectures: DialoGPT-small, BlenderBot-small.
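
As an illustration, the overlap metrics might be computed with common off-the-shelf libraries (nltk and the rouge-score package); the original evaluation's tokenization and preprocessing may differ:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

reference = "that must feel like a huge loss after so much time".split()
hypothesis = "that sounds like a really huge loss".split()

# BLEU-2: uniform weights over 1- and 2-grams; smoothing avoids zero
# scores on short sentences.
bleu2 = sentence_bleu([reference], hypothesis,
                      weights=(0.5, 0.5),
                      smoothing_function=SmoothingFunction().method1)

# ROUGE-L: longest-common-subsequence F-measure.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rougeL = scorer.score(" ".join(reference), " ".join(hypothesis))["rougeL"].fmeasure

print(f"BLEU-2 = {bleu2:.4f}, ROUGE-L = {rougeL:.4f}")
```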

For BlenderBot, the Oracle variant yields superior results, e.g., BLEU-2 = 6.31 and ROUGE-L = 17.90 vs. Vanilla's BLEU-2 = 5.45 and ROUGE-L = 15.43, indicating a strong positive effect of explicit strategy guidance.
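
The Oracle, Joint, and Random variants differ only in where the prepended strategy token comes from (gold label, model prediction, or corpus-distribution sampling). One plausible serialization is sketched below; the special-token format is an assumption, not the paper's exact encoding:

```python
def build_training_example(context: list[str], strategy: str,
                           response: str) -> tuple[str, str]:
    """Serialize a (context, response) pair for a seq2seq model.
    Oracle: the gold strategy token is prepended to the target so
    generation is conditioned on it. Joint: the model is trained to emit
    the strategy token itself before the response. Token and separator
    formats here are assumptions."""
    source = " EOS ".join(context)
    target = f"[{strategy}] {response}"
    return source, target

src, tgt = build_training_example(
    ["I can't sleep because I keep thinking about them."],
    "Question",
    "How long did you two date before the breakup?")
print(tgt)  # [Question] How long did you two date before the breakup?
```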

  • Human Evaluation
    • Dialog models are interactively compared on fluency, exploration, comforting, suggestions, and holistic preference. Joint strategy-aware models outperformed both un-tuned and vanilla baselines, achieving 72–75% win rates (p<0.05) against un-tuned models and up to 54% against vanilla baselines on all major metrics (Liu et al., 2021).

6. Extended Use and Dataset Augmentation

The limited scale and topic range of ESConv motivated large-scale augmentation efforts. The AugESC dataset (Zheng et al., 2022) expands ESConv to roughly 65,000 sessions using GPT-J, fine-tuned on 100 ESConv dialogs and prompted to autoregressively complete dialogue seeds drawn from emotion-labeled posts. Stringent postprocessing (e.g., format validation, speaker-balance enforcement, turn and utterance length constraints) produces high-fidelity simulated dialogs whose topic coverage (as measured by log-odds-ratio statistics with informative Dirichlet priors) and diversity exceed those of the original ESConv.
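
In the same spirit, the postprocessing stage can be approximated with simple structural heuristics; all thresholds below are illustrative assumptions rather than the paper's exact rules:

```python
def keep_augmented_dialog(turns: list[dict]) -> bool:
    """Heuristic filters in the spirit of AugESC postprocessing:
    well-formed speaker alternation, balanced roles, and bounded
    lengths. All thresholds are illustrative, not the paper's values."""
    if len(turns) < 10:  # require enough turns
        return False
    for prev, curr in zip(turns, turns[1:]):  # strict speaker alternation
        if prev["speaker"] == curr["speaker"]:
            return False
    n_seeker = sum(t["speaker"] == "seeker" for t in turns)
    if not 0.4 <= n_seeker / len(turns) <= 0.6:  # role balance
        return False
    # Bound individual utterance lengths (in whitespace tokens).
    return all(1 <= len(t["content"].split()) <= 100 for t in turns)
```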

Empirical evaluations show that models post-trained on AugESC match ESConv in automatic and human-rated dialogue quality (informativeness, understanding, helpfulness) and substantially improve generalization to open-domain emotional support scenarios, with no meaningful degradation of in-domain performance.

7. Illustrative Example

A dialog excerpt annotated according to the ESConv schema illustrates the modeling granularity:

Pre-chat

  • Problem: Breakup with Partner
  • Emotion: Sadness
  • Intensity: 5
  • Situation: "We broke up two weeks ago; I still miss them."

Dialogue

  1. Seeker: "I can’t sleep because I keep thinking about them."
  2. Supporter [Question]: "How long did you two date before the breakup?"
  3. Seeker: "Over three years."
  4. Supporter [Reflection of Feelings]: "That must feel like a huge loss after so much time."
  5. Seeker: "Yes, I feel empty and lonely."
  6. Supporter [Self-disclosure]: "I went through something similar last year; it took me months to heal."
  7. Supporter [Affirmation/Reassurance]: "That shows how much you cared—time will help you feel whole again."
  8. Seeker: "Thanks, I needed to hear that."
  9. Supporter [Providing Suggestions]: "Would you consider reaching out to friends or picking up a hobby to fill your evenings?"

Feedback

  • In-chat: 4–5 star ratings
  • Post-chat: Intensity after = 2, Empathy = 5, Relevance = 5

Summary

ESConv is a rigorously curated, strategy-annotated multi-turn emotional support conversation corpus, enabling development and analysis of dialog systems that must balance empathy, targeted questioning, and actionable guidance. Its annotation granularity, quality controls, and evaluation protocols provide a benchmark for both strategy-aware generative modeling and human-centered empirical assessment (Liu et al., 2021, Zheng et al., 2022). The subsequent augmentation via AugESC demonstrates the applicability of large-scale synthetic data generation to overcome the coverage limitations inherent in manual curation, yielding substantial scalability and cross-domain generalization without adverse in-domain side effects.

References

  • Liu, S., Zheng, C., Demasi, O., Sabour, S., Li, Y., Yu, Z., Jiang, Y., and Huang, M. (2021). Towards Emotional Support Dialog Systems. In Proceedings of ACL-IJCNLP 2021.
  • Zheng, C., Sabour, S., Wen, J., Zhang, Z., and Huang, M. (2022). AugESC: Dialogue Augmentation with Large Language Models for Emotional Support Conversation. arXiv preprint.
