EvoEmo: Emotion-Aware Conversational AI
- EvoEmo is an emotion-aware framework that integrates evolving emotional policies for strategic negotiation and long-term support in conversational AI.
- It employs evolutionary reinforcement learning and MDP formalization to optimize dynamic emotional trajectories, improving negotiation success and efficiency.
- It introduces a multi-session dataset for benchmarking personalized emotional support, emphasizing memory, context, and retrieval-augmented generation.
EvoEmo refers to two distinct yet thematically aligned contributions in the domain of emotion-aware conversational AI: (1) a framework for evolving strategic emotional policies in multi-turn negotiation with LLM agents (Long et al., 4 Sep 2025), and (2) a multi-session dataset for benchmarking personalized long-term emotional support in dialogue agents (Chen et al., 2 Feb 2026). Both advance the modeling, optimization, and evaluation of emotion as a central component in agentic AI, differing principally in task context (negotiation vs. support) and methodology (evolutionary policy search vs. corpus design).
1. Emotional Policy Optimization in Multi-Turn Negotiation
EvoEmo, in the context of negotiation, addresses the shortcomings of contemporary LLM agents whose emotional outputs are dominated by passive or static affective cues. These agents typically rely on preference-driven or neutral persona settings, resulting in predictable, easily exploitable negotiation patterns and suboptimal outcomes—manifesting as lower agreement rates, longer dialogues, and unfavorable concessions. EvoEmo approaches emotion not merely as a byproduct or surface trait but as a first-class, temporally-aligned, and strategically adaptive signal integral to negotiation success.
This functional perspective is grounded in behavioral research, which establishes that human bargaining proficiency relies on: (1) temporal alignment—emotion fluctuates turn by turn; (2) strategic flexibility—emotional states are modulated tactically to influence perception and concession; and (3) interactive amplification—emotional contagion dynamically alters counterpart behavior. Systematic dynamic emotion modulation enables agents to exploit these mechanisms for tactical anchoring, rapport building, and opponent adaptation, thus amplifying the functional importance of emotion in negotiation (Long et al., 4 Sep 2025).
2. Markov Decision Process Formalization of Emotional Dynamics
EvoEmo’s framing of negotiation as a Markov Decision Process (MDP) enables formal optimization of dynamic emotional trajectories. The buyer’s policy $\pi_\theta$ is defined over the following components:
- State Space ($\mathcal{S}$): Each state $s_t$ encodes the turn index $t$, the current emotion $e_t$ (drawn from a 7-dimensional basic-emotion set), and the price context (current offer and bid history).
- Action Space ($\mathcal{A}$): All possible LLM-generated text responses, each implicitly conveying an emotion.
- Transition Function ($\mathcal{T}$): The next emotion is sampled from the learned policy, $e_{t+1} \sim \pi_\theta(\cdot \mid s_t)$, with price transitions following the opponent’s move. The LLM’s output is thus a function of the current state and a controllable temperature parameter.
- Reward Function ($R$): Reward is terminal and given only on successful agreement, combining savings and efficiency as $R = \lambda S - (1 - \lambda)\, T$, where $S$ is the normalized buyer savings, $T$ is the dialogue length, and $\lambda$ is a weight balancing savings against efficiency.
- Discount Factor ($\gamma$): Though not explicitly used, an implicit $\gamma < 1$ is reasonable for prioritizing earlier success if the problem is re-cast as standard RL.
This formalism uniquely supports sequence-level optimization of dynamic emotions, integrating them directly into the policy’s action and state space (Long et al., 4 Sep 2025).
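The state and reward just described can be sketched in a few lines of Python. The concrete emotion labels, the weighted-sum reward form, and the 20-turn normalization below are illustrative assumptions, not specifics fixed by the source:

```python
from dataclasses import dataclass, field

# Assumed 7-emotion basic set; the source only specifies "7-dimensional".
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]

@dataclass
class NegotiationState:
    """One MDP state: turn index, current emotion, and price context."""
    turn: int
    emotion: str                       # one of EMOTIONS
    current_offer: float
    bid_history: list = field(default_factory=list)

def terminal_reward(savings: float, turns: int, lam: float = 0.5,
                    max_turns: int = 20) -> float:
    """Terminal reward granted only on agreement: a weighted trade-off
    between normalized buyer savings and dialogue length. The weighted-sum
    form and the turn normalization are illustrative assumptions."""
    return lam * savings - (1.0 - lam) * (turns / max_turns)
```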
3. Evolutionary Reinforcement Learning of Emotion Policies
EvoEmo’s core methodological innovation is the use of a genetic algorithm (GA) to evolve high-reward emotional policies over diverse negotiation scenarios. Each candidate policy consists of an emotion sequence $(e_1, \dots, e_T)$, temperature parameters $\tau$ controlling response stochasticity, and a transition matrix $M$ stipulating emotion state transitions.
The evolutionary cycle comprises:
- Initialization: Sampling a population of random policies.
- Fitness Evaluation: Simulating multi-turn negotiation episodes for each policy; computing terminal rewards.
- Selection: Fitness-proportional parent selection via a softmax over rewards, with elitism preserving the top $\rho$ policies.
- Crossover: Uniform mixing of parent parameters with probability $p_c$.
- Mutation: Perturbing transition matrices or flipping emotional states with probability $p_m$.
- Replacement: Forming the next generation; repeat over generations.
This process allows for both exploration of novel emotion strategies and exploitation of established high-performing trajectories.
Genetic Algorithm Pseudocode (excerpt)
```
For g = 1…G:
    Evaluate rewards {R^i} for all π^i in population P_g
    Select parents via softmax over {R^i}
    Apply crossover (rate p_c) and mutation (rate p_m) to generate offspring
    Preserve top ρ policies (elitism)
    P_{g+1} ← offspring ∪ elites
Return best π in P_G
```
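The pseudocode maps onto a compact runnable sketch. The policy representation below (a flat list of emotion indices) and the hyperparameter defaults are simplifications for illustration; actual EvoEmo policies also carry temperatures and a transition matrix:

```python
import math
import random

def evolve(eval_fn, init_policy, G=5, N=20, p_c=0.7, p_m=0.1, elite=2, seed=0):
    """Minimal GA loop mirroring the pseudocode: softmax parent selection,
    uniform crossover, per-gene mutation, and elitism. A policy is reduced
    to a list of emotion indices in [0, 7); eval_fn simulates a negotiation
    episode and returns a scalar terminal reward."""
    rng = random.Random(seed)
    pop = [init_policy(rng) for _ in range(N)]
    for _ in range(G):
        rewards = [eval_fn(p) for p in pop]
        # Softmax over rewards -> selection probabilities (shifted for stability).
        m = max(rewards)
        weights = [math.exp(r - m) for r in rewards]
        total = sum(weights)
        probs = [w / total for w in weights]
        # Elitism: carry the top policies over unchanged.
        elites = [p for _, p in sorted(zip(rewards, pop), key=lambda t: -t[0])[:elite]]
        offspring = []
        while len(offspring) < N - elite:
            a, b = rng.choices(pop, weights=probs, k=2)
            if rng.random() < p_c:                      # uniform crossover
                child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            else:
                child = a[:]
            # Mutation: flip individual emotion states with probability p_m.
            child = [rng.randrange(7) if rng.random() < p_m else x for x in child]
            offspring.append(child)
        pop = offspring + elites
    rewards = [eval_fn(p) for p in pop]
    return max(zip(rewards, pop))[1]
```

Because elites survive each generation, the best reward found is monotonically non-decreasing across generations.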
4. Experimental Evaluation, Baseline Comparison, and Ablations
EvoEmo’s efficacy is demonstrated in multi-turn negotiation scenarios derived from CraigslistBargain. The framework’s evaluation spans 20 bargaining cases (electronics, furniture, vehicles, housing, $50–$5,000 price range) with realistic seller costs and human-annotated emotions. Nine buyer–seller LLM pairings are tested (three emotion-controlled buyer models crossed with three vanilla seller models).
Baselines:
- Vanilla: No emotional instructions.
- Fixed-emotion: Buyer remains in a single emotion (e.g., always angry or neutral).
Metrics:
- Success Rate: Agreement percentage.
- Buyer Savings: Normalized price reduction achieved by the buyer relative to the listed price, reported as a percentage.
- Efficiency: Number of dialogue turns.
| Agent | Success Rate (%) | Buyer Savings (%) | Turns (Efficiency) |
|---|---|---|---|
| EvoEmo | ≈ 100 | 41–42 | 6 |
| Vanilla | ≈ 90 | 30–35 | 9 |
| Fixed-neg/pos | 80–95 | 25–38 | 8–11 |
Ablation studies highlight that ratio-based rewards (savings per turn) improve efficiency by 22.7% (5.8 vs. 7.5 turns) relative to weighted-sum rewards; that a moderate sampling temperature optimizes both savings and speed; and that policy convergence emerges by the fifth generation (41.7% savings, 6 turns) (Long et al., 4 Sep 2025).
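The ratio-versus-weighted-sum distinction can be made concrete with a toy comparison. The weights, turn cap, and deal figures below are hypothetical, chosen only to show how a ratio reward favors shorter dialogues:

```python
def weighted_reward(savings, turns, lam=0.5, max_turns=20):
    """Weighted-sum reward: savings traded against normalized dialogue length."""
    return lam * savings - (1.0 - lam) * turns / max_turns

def ratio_reward(savings, turns):
    """Ratio reward: savings per turn, directly pricing in efficiency."""
    return savings / turns

fast_deal = (0.40, 6)   # 40% savings in 6 turns (hypothetical)
slow_deal = (0.42, 9)   # slightly better savings, three extra turns

# The ratio reward prefers the faster deal ...
assert ratio_reward(*fast_deal) > ratio_reward(*slow_deal)
# ... while a savings-heavy weighted sum prefers the slower one.
assert weighted_reward(*fast_deal, lam=0.9) < weighted_reward(*slow_deal, lam=0.9)
```

The ratio form makes every extra turn dilute the reward, which matches the direction of the reported efficiency gap.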
5. EvoEmo Dataset for Long-Term Emotional Support
A separate, complementary contribution is the EvoEmo dataset, introduced to benchmark memory and emotional modeling in long-term emotional support agents (Chen et al., 2 Feb 2026). EvoEmo is the first emotional-support corpus designed for fragmented, multi-session user disclosure and evolving user states over multi-month periods.
Distinctive features:
- Multi-session: Up to 33 sessions/user; average span 15 months.
- Event timeline: Each user has an explicit, time-stamped progression of life events.
- Fragmented and implicit disclosure: Critical facts emerge across sessions rather than being re-stated; agent inference and memory are critical.
- Synthetic, human-reviewed: Seeded from ESConv dialogues, expanded with GPT-4o generation, and validated through multi-domain annotator consistency checks.
Dataset statistics:
| Statistic | Value |
|---|---|
| Unique Users | 18 |
| Sessions | 401 |
| Avg. Sessions/User | 22.3 |
| Avg. Turns/Session | 23.4 |
| Avg. Tokens/Session | 596.6 |
| Avg. Span/User | 14.9 months |
Dialogue topics are imbalanced but dominated by emotion/mood and career/study.
Session schema includes user and session IDs, timestamps, associated event, evolving user profile snapshot, turn-wise dialogue with emotional state/disclosure annotation, and summary/observation metadata.
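A session record following this schema might look like the sketch below. All IDs, keys, and values are hypothetical illustrations; the source describes the schema only in prose and does not publish exact field names:

```python
# Hypothetical session record for the EvoEmo long-term support schema.
session = {
    "user_id": "u_03",
    "session_id": "u_03_s12",
    "timestamp": "2025-06-14T20:31:00",
    "event": "started a new job after months of searching",
    "profile_snapshot": {"occupation": "junior analyst", "mood_trend": "improving"},
    "dialogue": [
        {"speaker": "user",
         "text": "First week went okay, I guess.",
         "emotion": "anxiety",          # turn-wise emotional state annotation
         "disclosure": "implicit"},     # disclosure type annotation
        {"speaker": "agent",
         "text": "That sounds like real progress. How are you feeling about the team?"},
    ],
    "summary": "User began the new role; residual anxiety despite positive framing.",
}
```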
The corpus supports tasks in QA (factual and user modeling), summarization (cross-session abstraction), and dialogue generation (long-term consistency and personalization), evaluated via F1, BERTScore, ROUGE, Recall@n, nDCG@n, and LLM-as-Judge ratings (Chen et al., 2 Feb 2026).
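Of these metrics, Recall@n and nDCG@n score cross-session retrieval. Minimal reference implementations, assuming binary relevance over session IDs, look like:

```python
import math

def recall_at_n(retrieved, relevant, n):
    """Fraction of relevant items that appear in the top-n retrieved list."""
    return len(set(retrieved[:n]) & set(relevant)) / len(relevant)

def ndcg_at_n(retrieved, relevant, n):
    """nDCG with binary relevance: discounted gain of hits in the top-n,
    normalized by the gain of an ideal ordering."""
    rel = set(relevant)
    dcg = sum(1.0 / math.log2(i + 2) for i, d in enumerate(retrieved[:n]) if d in rel)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), n)))
    return dcg / ideal
```

Unlike recall, nDCG also rewards placing relevant sessions earlier in the ranking.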
6. Integration Strategies and Challenges in Emotional Modeling
EvoEmo highlights the need for explicit memory modules and retrieval-augmented generation architectures. Top-performing agents index sessions, apply dense retrieval over session embeddings (e.g., bge-m3), and fine-tune on summarization and QA objectives to enhance temporal reasoning and adaptive user modeling. Specific integration practices include tuning $k$, the number of sessions fetched per query, and balancing retrieval granularity.
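A minimal sketch of the dense-retrieval step, assuming precomputed session embeddings (the encoder itself, e.g. bge-m3, is out of scope here):

```python
import math

def top_k_sessions(query_vec, session_vecs, k=3):
    """Rank stored session embeddings by cosine similarity to a query
    embedding and return the top-k session IDs. session_vecs maps
    session ID -> vector; embeddings would come from a dense encoder
    such as bge-m3 (not modeled here)."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(y * y for y in b)))
    ranked = sorted(session_vecs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [sid for sid, _ in ranked[:k]]
```

The retrieved session texts (or their summaries) would then be concatenated into the agent's context before generation.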
Challenges identified:
- Hallucinations: Lack of explicit memory results in user fact invention.
- Context length: Long user dialogue histories challenge smaller LLMs; segmenting and hybrid architectures mitigate degradation.
- Synthetic dataset bounds: Transfer to real-world data may require additional domain adaptation.
- Implicit user disclosure: Models must identify subtle, context-dependent clues beyond explicit statements.
This suggests future dialogue agents must incorporate explicit, modular memory with calibration for both fact recall and abstention behaviors (Chen et al., 2 Feb 2026).
7. Limitations and Future Research Directions
EvoEmo’s negotiation framework currently leverages a 7-state discrete emotional model; extending to compound or continuous spectra remains unexplored. Its generalization has been demonstrated only in commercial bargaining—applicability in legal, high-stakes, or crisis negotiation is untested. The evolutionary approach yields strong empirical gains but results in black-box policies whose internal logic is difficult to interpret, indicating a need for post-hoc explainability methods. Current experiments are restricted to LLM–LLM interaction; human–AI and real-world validation are open areas. Lastly, the strategic use of emotion for adversarial or manipulative purposes raises concerns regarding ethics, safety, and the design of guardrails (Long et al., 4 Sep 2025).
Regarding the support dataset, although EvoEmo is heavily human-reviewed, its synthetic foundation and implicit event annotation suggest potential limitations in ecological validity—transferring models trained on EvoEmo to purely real-world data warrants systematic study. Robust approaches to implicit disclosure extraction, longitudinal memory tracking, and adversarial robustness are priority avenues for future research (Chen et al., 2 Feb 2026).