
MultiESC: Multi-turn Emotional Support Framework

Updated 6 February 2026
  • MultiESC is a multi-turn emotional support framework that integrates A*-like lookahead planning, dynamic user-state tracking, and strategy-aware response generation.
  • It employs a modular architecture with dialogue encoding, emotion-cause extraction, strategy planning, and conditioned response decoding to optimize long-term user well-being.
  • Empirical evaluations on the ESConv dataset show MultiESC outperforms baselines, notably increasing CIDEr scores and improving human-assessed dialogue quality.

MultiESC is a framework for multi-turn Emotional Support Conversation (ESC) that aims to maximize user well-being over extended dialogues. It formalizes multi-step strategy planning for emotional support agents by integrating lookahead planning, user-state tracking, and strategy-conditioned response generation within a unified neural architecture (Cheng et al., 2022).

1. Motivation and Overview

Multi-turn ESC requires agents to engage in sustained, context-aware support beyond single-turn empathy exchange. Key technical challenges are (i) selecting appropriate support strategies over a prolonged dialogue horizon to maximize cumulative user relief, and (ii) dynamically modeling evolving user states—capturing shifts in emotion intensity and identifying the underlying causes of distress. MultiESC addresses these by operationalizing a lookahead strategy planner (A*-like search), a fine-grained emotion-cause user-state encoder, and a strategy-aware response decoder. This architecture is designed to optimize for long-term, not just immediate, conversational outcomes.

2. System Architecture

MultiESC is a modular framework that processes each turn $t$ in four sequential stages:

  • Dialogue Encoder: A Transformer encoder aggregates the $N$ most recent tokens from the conversation history $\mathcal{H}_t$ into a hidden state $\mathbf{H}_t \in \mathbb{R}^{N \times d_{\text{emb}}}$.
  • User-State Encoder: Each user utterance $y_i$ is processed to extract emotion-cause spans $c_i$ via an external detector (e.g., RECCON). Tokens in $[x_i][y_i][c_i]$ are mapped via the sum of word, positional, and emotion (VAD-quantized) embeddings. Another Transformer produces per-round user-state vectors $\mathbf{u}_i$, with cumulative state matrix $\mathbf{U}_t = [\mathbf{u}_1; \dots; \mathbf{u}_{t-1}]$.
  • Strategy Planning Module: For the current state, MultiESC scores all candidate support strategies $s_t \in \mathcal{S}$ with a composite function $F(s_t)$ that integrates immediate fit and estimated future utility, and selects the highest-scoring strategy.
  • Utterance Decoder: A strategy-conditioned Transformer decoder generates the response $x_t$, informed by $\mathbf{H}_t$, $\mathbf{U}_t$, and the selected strategy $s_t$.

The architecture enables joint optimization and interoperability between strategy planning, user modeling, and controlled text generation.

3. Lookahead Strategy Planning

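Inspired by A* search, MultiESC's core decision process computes

$$\hat{s}_t = \arg\max_{s_t \in \mathcal{S}} F(s_t), \qquad F(s_t) = g(s_t) + \lambda \cdot h(s_t),$$

where $g(s_t)$ is the negative log-probability from a Strategy Sequence Generator (SSG), and $h(s_t)$ is a heuristic for the expected user feedback after executing $s_t$. The hyperparameter $\lambda$ (set to 0.7) weights the lookahead heuristic against the history-based score.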
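The selection rule described above, a history-based score combined with a weighted lookahead heuristic, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `select_strategy`, the dictionary inputs, and all numeric scores are hypothetical, and the sketch simply assumes both scores are oriented so that higher is better.

```python
from typing import Dict


def select_strategy(history_scores: Dict[str, float],
                    lookahead_scores: Dict[str, float],
                    lam: float = 0.7) -> str:
    """Return the strategy with the highest combined score g + lam * h.

    history_scores   -- hypothetical history-based score per strategy (g)
    lookahead_scores -- hypothetical lookahead heuristic per strategy (h)
    lam              -- weight on the lookahead term (0.7 in the text)
    """
    combined = {
        s: history_scores[s] + lam * lookahead_scores[s]
        for s in history_scores
    }
    return max(combined, key=combined.get)


# Made-up scores for three ESConv strategy labels (illustration only).
g = {"Question": 0.2, "Providing Suggestions": 0.5, "Reflection of Feelings": 0.3}
h = {"Question": 0.9, "Providing Suggestions": 0.1, "Reflection of Feelings": 0.4}

best_with_lookahead = select_strategy(g, h)          # lookahead term included
best_without_lookahead = select_strategy(g, h, 0.0)  # history score only
```

With these toy numbers, the lookahead term shifts the choice from "Providing Suggestions" to "Question", which illustrates how a future-utility estimate can override the locally best-fitting strategy.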

The ideal heuristic $h(s_t)$ would marginalize over all possible future support strategies $s_{>t}$:

$$h(s_t) = \mathbb{E}\left[ f(s_t, s_{>t}, \mathbf{U}_t) \mid s_t, \mathcal{H}_t, \mathbf{U}_t \right],$$

where $f(\cdot)$ denotes the user feedback score predicted for a strategy sequence. In practice, MultiESC restricts the lookahead to $L$ turns (set to 2), considers the top-$k$ most probable continuations $\hat{\mathcal{S}}_L$ found via beam search, and computes an estimate $\hat{h}(s_t)$ as a weighted sum over these:

$$\hat{h}(s_t) = \sum_{s_{>t} \in \hat{\mathcal{S}}_L} \Pr(s_{>t} \mid s_t, \mathcal{H}_t, \mathbf{U}_t) \cdot f(s_t, s_{>t}, \mathbf{U}_t).$$

The SSG is a masked Transformer decoder with multi-source cross-attention. The User-Feedback Predictor (UFP) encodes candidate strategy sequences and user-state histories via a Transformer and LSTM, aggregates via an attention mechanism, and produces feedback scores for heuristic estimation.
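The truncated lookahead can be illustrated with a short sketch that takes the beam-search continuations as given and forms the probability-weighted sum of predicted feedback. Everything here is an assumption for illustration: `lookahead_heuristic`, `toy_feedback`, and the probabilities are hypothetical stand-ins, and neither the beam search nor the real feedback predictor is implemented.

```python
from typing import Callable, List, Sequence, Tuple

# A continuation is (future strategy sequence, its probability under the SSG).
Continuation = Tuple[Sequence[str], float]


def lookahead_heuristic(strategy: str,
                        continuations: List[Continuation],
                        feedback_fn: Callable[[List[str]], float]) -> float:
    """Probability-weighted sum of predicted feedback over candidate futures."""
    return sum(
        prob * feedback_fn([strategy, *future])
        for future, prob in continuations
    )


def toy_feedback(sequence: List[str]) -> float:
    """Hypothetical stand-in for the feedback predictor.

    Scores a strategy sequence by the number of distinct strategies it uses.
    """
    return float(len(set(sequence)))


# Two made-up continuations of length 1 following the candidate "Question".
beam = [(["Providing Suggestions"], 0.6), (["Question"], 0.3)]
estimate = lookahead_heuristic("Question", beam, toy_feedback)
```

Here the estimate is 0.6 × 2.0 + 0.3 × 1.0. The sketch leaves the probabilities unnormalized; whether to renormalize over the retained beam is a design choice the text does not specify.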

4. Dynamic User-State Modeling

User state is encoded as follows:

  • An emotion-cause extractor detects the text spans $c_i$ that trigger expressed emotions.
  • Each token is embedded as the sum $\mathbf{e}_{\text{word}} + \mathbf{e}_{\text{pos}} + \mathbf{e}_{\text{emo}}$ of its word, positional, and emotion embeddings, with $\mathbf{e}_{\text{emo}}$ assigned by VAD lexicon binning.
  • A Transformer encodes this representation, and the resulting vector $\mathbf{u}_i$ is treated as the turn-level user state.

The cumulative user state matrix supports both immediate context and long-range emotion consistency. Emotion-cause tracking enables the system to distinguish between surface affect and the underlying drivers of user distress.
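The embedding step above can be made concrete with a small sketch. It assumes, purely for illustration, that each word has valence and arousal scores in [0, 1], that each dimension is split into three equal bins, and that the bin pair indexes an emotion embedding. The names `vad_bin` and `embed_token`, the bin count, and the vectors are hypothetical and not taken from the paper.

```python
from typing import List


def vad_bin(valence: float, arousal: float, n_bins: int = 3) -> int:
    """Map valence/arousal scores in [0, 1] to one of n_bins * n_bins ids."""
    def bucket(x: float) -> int:
        # Clamp so that a score of exactly 1.0 falls in the last bin.
        return min(int(x * n_bins), n_bins - 1)

    return bucket(valence) * n_bins + bucket(arousal)


def embed_token(word_vec: List[float],
                pos_vec: List[float],
                emo_vec: List[float]) -> List[float]:
    """Element-wise sum of word, positional, and emotion embeddings."""
    return [w + p + e for w, p, e in zip(word_vec, pos_vec, emo_vec)]


# Low valence, high arousal (e.g. a distressed word) with made-up scores.
emotion_id = vad_bin(valence=0.1, arousal=0.9)

# Made-up 2-dimensional vectors; real embeddings would be learned.
token_vec = embed_token([1.0, 2.0], [0.5, 0.5], [0.25, 0.0])
```

In a trained model, `emotion_id` would select a row of a learned emotion embedding table, and the summed vectors would be fed to the Transformer user-state encoder.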

5. Response Generation and Training

The response decoder is a strategy-conditioned Transformer that receives strategy embeddings as prepended token vectors. The architecture is identical to that of the SSG to promote information sharing. Training involves:

  • Joint training of the SSG and the utterance decoder under a single combined loss.
  • Separate training of the UFP with a mean squared error (MSE) loss.
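As a brief illustration of the second objective, the sketch below computes a mean squared error between predicted and observed feedback scores. The function name `mse_loss` and the example scores are hypothetical; the actual predictor is a neural model trained with gradient descent, which this sketch does not attempt to reproduce.

```python
from typing import Sequence


def mse_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predicted and target feedback scores."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


# Made-up feedback scores for three dialogue segments.
loss = mse_loss([3.0, 4.0, 2.0], [3.5, 4.0, 1.0])
```

For these values the squared errors are 0.25, 0.0, and 1.0, so the loss is their mean.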

Key hyperparameters include the lookahead horizon $L = 2$, the beam width $k$, the embedding/hidden dimension $d_{\text{emb}}$, a batch size of 32, and the optimizer with its learning rate.

6. Evaluation and Comparative Results

MultiESC is evaluated on the ESConv test set with both automatic metrics and human interaction studies.

Automatic Dialogue Metrics

| Model | PPL↓ | BLEU-4↑ | ROUGE-L↑ | METEOR↑ | CIDEr↑ |
|---|---|---|---|---|---|
| BlenderBot-Joint | 16.8 | 1.66 | 17.94 | 7.54 | 18.04 |
| GLHG | 15.7 | 2.13 | 16.37 | – | – |
| MultiESC | 15.4 | 3.09 | 20.41 | 8.84 | 29.98 |

MultiESC achieves the best score on every reported metric. The largest gain is in CIDEr (+11.94 over BlenderBot-Joint).

Strategy Planning and Feedback

| Model | Acc↑ | W-F1↑ | Feedback↑ |
|---|---|---|---|
| BlenderBot-Joint | 29.9% | 29.6 | 3.05 |
| MISC | 31.6% | – | – |
| MultiESC | 42.0% | 34.0 | 3.85 |

MultiESC improves top-1 strategy accuracy by 10.4 percentage points over MISC (42.0% vs. 31.6%) and by 12.1 points over BlenderBot-Joint, and raises predicted feedback by +0.80 over BlenderBot-Joint.
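As a quick arithmetic check, the gaps between systems can be recomputed directly from the table values above. The variable names are illustrative; the numbers are copied from the two tables.

```python
# Values copied from the reported tables.
cider = {"BlenderBot-Joint": 18.04, "MultiESC": 29.98}
accuracy = {"BlenderBot-Joint": 29.9, "MISC": 31.6, "MultiESC": 42.0}
feedback = {"BlenderBot-Joint": 3.05, "MultiESC": 3.85}

# Absolute differences, rounded to the precision of the inputs.
cider_gain = round(cider["MultiESC"] - cider["BlenderBot-Joint"], 2)
acc_gain_vs_misc = round(accuracy["MultiESC"] - accuracy["MISC"], 1)
acc_gain_vs_blenderbot = round(accuracy["MultiESC"] - accuracy["BlenderBot-Joint"], 1)
feedback_gain = round(feedback["MultiESC"] - feedback["BlenderBot-Joint"], 2)

print(cider_gain, acc_gain_vs_misc, acc_gain_vs_blenderbot, feedback_gain)
```

The accuracy gaps are in percentage points, while the CIDEr and feedback gaps are in the units of their respective metrics.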

Human Interactive Evaluation

In 128 role-played dialogues, MultiESC demonstrates higher win rates on all dimensions (fluency, empathy, identification, suggestion, overall effectiveness), with an overall win rate of 58.6% vs. BlenderBot-Joint.

A case study shows that lookahead planning can shift strategy choice away from generic empathy or premature advice and toward more context-seeking behaviors (e.g., selecting the "Question" strategy before issuing advice), which aligns with counseling best practices.

7. Implications and Significance

MultiESC establishes a paradigm for multi-turn emotional support systems that integrates explicit lookahead planning and fine-grained user-state modeling within Transformer-based architectures. The explicit incorporation of A*-like planning heuristics enables more effective, contextually grounded strategy selection, yielding improved dialogue coherence and support efficacy. The emotion-cause user-state encoding offers a mechanism for granular, cause-aware empathetic responses, advancing the state of the art in emotionally intelligent dialogue systems.

The technical contributions are broadly applicable to domains requiring long-term dialogue objectives and fine-grained user modeling, including counseling, social chatbots, and assistive technology. MultiESC's empirical results substantiate the claim that long-term planning with user feedback estimation can materially enhance both quantitative and qualitative support effectiveness (Cheng et al., 2022).
