
KEEM Dataset: Emotional Memory in Dialogues

Updated 16 January 2026
  • KEEM dataset is a generation-based resource that integrates factual and emotional content from multi-session dialogues.
  • It uses a two-stage annotation protocol with ChatGPT prompting to extract and ground emotion-cause pairs for accurate memory updates.
  • Evaluation shows KEEM achieves lower contradiction rates and enhanced coherence compared to traditional memory update methods in dialogue systems.

The Keep Emotional and Essential Memory (KEEM) dataset is a generation-based resource designed to facilitate effective and nuanced memory updates in long-term open-domain conversational systems. Originating from systematic annotation and reformulation of the Korean Multi-Session Chat (KMSC) corpus, KEEM aims to overcome the limitations of accumulation- and operation-based memory management approaches by generating memory representations that retain both factual and emotional content, explicitly linking user-expressed emotions to their underlying causes. Its memory update paradigm enhances systems’ abilities to maintain user state, coherence, and empathic engagement over extended interactions (Kang et al., 9 Jan 2026).

1. Corpus Construction and Annotation Protocols

KEEM was constructed by processing multi-session Korean dialogues, with each episode comprising two participants (user and system) and segmented into four consecutive sessions. Sample sizes by session span are 2,006 dialogues (sessions 1–2), 1,560 (sessions 1–3), and 1,005 (sessions 1–4), totaling approximately 60,000–71,000 utterances depending on session grouping. Each dialogue episode is serialized as a JSON object encapsulating full utterance lists, session summaries, and the output of memory-keeping steps.
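As a concrete illustration, one episode might serialize as follows; the field names here are assumptions for exposition, not the published schema:

```python
import json

# Hypothetical sketch of one KEEM episode record; field names are
# assumptions, not the dataset's actual schema.
episode = {
    "episode_id": "ep-0001",
    "sessions": [
        {
            "session": 1,
            "utterances": [
                {"speaker": "user", "text": "..."},
                {"speaker": "system", "text": "..."},
            ],
            "summary": "...",          # original session summary S_t
            "emotion_summary": "...",  # emotion-reflected rewrite S_t'
            "memory": ["...", "..."],  # memory sentences M_t after update
        }
    ],
}

# Round-trip through JSON, preserving Korean text (ensure_ascii=False).
serialized = json.dumps(episode, ensure_ascii=False)
```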

Annotation in KEEM follows a two-stage protocol. First, annotators extract all emotions and their explicit causes from each session’s raw summary, using few-shot ChatGPT prompting in Korean to rewrite the summary S_t as S_t', where each emotion E is contextually grounded by its cause C (“I felt down” becomes “I felt down because I had a fight with a club friend”). Manual evaluation employs a three-point scale (0: neither, 1: emotion only, 2: both emotion and cause), and KEEM achieves a 93% rate of “emotion+cause” reflection (compared to 35% in raw KMSC).
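The manual scoring above can be aggregated with a small helper; the labels below are toy values for illustration, not the published annotations:

```python
def emotion_cause_rate(labels):
    """Fraction of summaries scored 2 on the 0/1/2 manual scale,
    i.e., summaries reflecting both emotion and cause."""
    return sum(1 for s in labels if s == 2) / len(labels)

def mean_score(labels):
    """Average manual score across annotated summaries."""
    return sum(labels) / len(labels)

# Toy per-summary labels (illustrative only).
toy_labels = [2, 2, 2, 1, 2]
rate = emotion_cause_rate(toy_labels)  # 4 of 5 summaries scored 2
```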

Second, the memory update process synthesizes the new memory M_t via a generation function g:

M_t = g(M_{t-1}, S_t')

Here g is realized as a model prompted to avoid hallucination, preserve all non-conflicting facts, and revise memory only when supported by new, causally anchored summaries. Verification prompts check that M_t faithfully reflects all user-relevant facts and emotional trajectories, iteratively backstopping errors and ensuring precision.

2. Memory Update Architecture and Representations

KEEM frames the memory update as a conditional generation problem. At each session t, given the prior memory M_{t-1} (a sentence list) and the current emotion-reflected summary S_t', the model generates M_t. In potential fine-tuning workflows, the likelihood objective is:

L_\text{gen} = -\sum_t \log P_\theta(M_t \mid M_{t-1}, S_t')

Optionally, an auxiliary term L_\text{emotion} encourages reproduction of annotated emotion–cause pairs, leading to a composite loss L = L_\text{gen} + \lambda L_\text{emotion}.
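Given per-token log-probabilities, the composite loss is a weighted sum of two negative log-likelihoods. The following is a minimal sketch; the instantiation of L_emotion as an NLL over emotion–cause tokens, and the value of λ, are assumptions:

```python
import math

def nll(log_probs):
    """Negative log-likelihood: sum of per-token -log P."""
    return -sum(log_probs)

# Toy per-token log-probabilities for the generated memory M_t.
logp_memory = [math.log(p) for p in (0.9, 0.8, 0.95)]
# Hypothetical auxiliary term over tokens realizing emotion-cause pairs.
logp_emotion = [math.log(p) for p in (0.7, 0.85)]

lam = 0.5  # lambda weighting, chosen here purely for illustration
loss = nll(logp_memory) + lam * nll(logp_emotion)
```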

Memories in KEEM are lists of discrete Korean sentences; eventual system integration can involve encoding these sentences (e.g., with SBERT) into vectors e_i \in \mathbb{R}^d for retrieval or attention-based memory selection. The update process adheres to strict constraints against fact hallucination and destructive overwriting, updating the representation only when justified by new evidence or causal context. The verification stage employs a prompt-based assessment against the aggregated dialogue up to session t.
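A minimal retrieval sketch over encoded memory sentences might look like the following, with stub two-dimensional vectors standing in for SBERT embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, memory_vecs, memory_sents, k=2):
    """Return the k memory sentences whose embeddings are closest
    to the query embedding by cosine similarity."""
    scored = sorted(
        zip(memory_sents, memory_vecs),
        key=lambda sv: cosine(query_vec, sv[1]),
        reverse=True,
    )
    return [sent for sent, _ in scored[:k]]

# Stub embeddings; in practice e_i would come from an SBERT-style encoder.
memory_sents = ["User fought with a club friend.",
                "User likes hiking.",
                "User made up with the friend."]
memory_vecs = [[1.0, 0.1], [0.0, 1.0], [0.9, 0.2]]
query = [1.0, 0.0]  # e.g., an encoded question about the friend
top = retrieve(query, memory_vecs, memory_sents)
```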

Pseudocode summary:

Initialize M ← []
for t in 1..4:
    D_t ← session t dialogue
    S_t ← original session summary
    S_t' ← ChatGPT_emotion_reflect(D_t, S_t)
    M_new ← ChatGPT_update_prompt(M, S_t')
    if ChatGPT_verify(D_1:t, M_new):
        M ← M_new
return M

3. Emotional Context Modeling

Emotional and causal relationships are integral to the KEEM memory format. No external classifier is used; instead, ChatGPT is prompted (in Korean, few-shot, no candidate lists) to extract emotion and rationale directly from the session dialogue, achieving the highest reflection rates under these conditions. The result is a memory where each sentence encodes not just events or facts, but also explicitly links affective states to their precipitating causes (e.g., “I’m happy now that I made up with my club friend”).

Memory generation mandates the inclusion of causal information for every referenced emotion, operationalizing empathic follow-ups in subsequent conversational rounds. Although KEEM does not implement a specialized neural memory-attention mechanism, it is compatible with attention formulations that combine factual and emotional embeddings—at each generation step, the decoder may attend jointly over factual and affective vectors:

c = \text{Attention}(Q = \text{decoder\_state}, K = \text{concat}(F, E), V = \text{concat}(F, E))

where F are factual and E are emotional embedding vectors.
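A toy instantiation of this joint attention, in plain Python with invented two-dimensional embeddings:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention: weights keys against the query,
    then returns the weighted sum of value vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# Illustrative factual (F) and emotional (E) embeddings; values invented.
F = [[1.0, 0.0], [0.5, 0.5]]
E = [[0.0, 1.0]]
KV = F + E  # concat(F, E) serves as both keys and values
decoder_state = [1.0, 1.0]
context = attention(decoder_state, KV, KV)
```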

4. Causal-Relationship Representation

Causal reasoning in KEEM memory is implicit, realized through the natural-language pairing of each user emotion with its cause. Although KEEM itself does not build explicit graph structures or relational schemas, the dataset structure enables post hoc extraction of (emotion, cause) tuples. A plausible implication is that downstream systems could construct explicit causal graphs G from KEEM memories, with nodes as emotions/events and directed “because-of” edges, to enable more structured memory retrieval or evidence tracing.
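Such post hoc extraction could be sketched as follows; the tuples are illustrative, not drawn from the dataset:

```python
from collections import defaultdict

def build_causal_graph(pairs):
    """Build directed 'because-of' edges from (emotion, cause) tuples,
    mapping each emotion to the set of its recorded causes."""
    graph = defaultdict(set)
    for emotion, cause in pairs:
        graph[emotion].add(cause)
    return graph

# (emotion, cause) tuples extracted from KEEM-style memory sentences
# (illustrative values only).
pairs = [
    ("sad", "fight with a club friend"),
    ("happy", "made up with the club friend"),
]
g = build_causal_graph(pairs)
```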

5. Evaluation Methodology and Baseline Comparisons

KEEM’s memory update fidelity is assessed on multiple axes:

  • Emotion & Cause Reflection: Manual scoring (0–2) yields 1.90 for KEEM, significantly higher than 1.18 for KMSC.
  • Memory-Update Accuracy: KEEM achieves 1.67–1.75 average per session.
  • Keyword Coverage: Recall is computed as |\text{keywords}(M) \cap \text{keywords}(D)| / |\text{keywords}(D)|, with KEEM outperforming both KMSC (accumulation-based) and CareCallmem (operation-based) approaches.
  • Conflict Ratio (via NLI): KEEM and CareCallmem show low contradiction rates (<10%), while KMSC exceeds 20%.
  • Conversational Perplexity: KEEM memories yield the lowest next-turn perplexity when conditioning response models (RAG, FiD, FiD-RAG, Llama2, Korean-tuned LLMs).
  • Response Quality Voting: When humans and ChatGPT are asked to choose among responses conditioned on different memory types, KEEM is selected in approximately 75–82% of cases, compared to CareCallmem’s 3–10%.
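The keyword-coverage metric above reduces to a set-intersection recall; a minimal sketch with toy keyword sets:

```python
def keyword_recall(memory_keywords, dialogue_keywords):
    """Recall = |keywords(M) ∩ keywords(D)| / |keywords(D)|."""
    m, d = set(memory_keywords), set(dialogue_keywords)
    return len(m & d) / len(d)

# Toy keyword sets for illustration (not dataset values).
recall = keyword_recall({"europe", "friend", "happy"},
                        {"europe", "friend", "happy", "korea"})
```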

A summary of approaches is provided in the table below:

| Memory Update Method | Core Mechanism | Conflict Rate | Emotion+Cause (avg) |
|---|---|---|---|
| KEEM | Generation-based, emotion/cause grounding | <10% | 1.90 |
| KMSC (Accumulation) | Session summary append | >20% | 1.18 |
| CareCallmem | Operation-based (PASS/APPEND/DELETE/REPLACE) | <10% | Lower than KEEM |

Example gains include KEEM’s retention of temporally-lapsed but informative facts (e.g., “traveled to Europe” is not lost when user returns to Korea) and accurate emotional nuance (“I was sad because…”), which typical operation-based memories often omit.

6. Usage and Implementation Guidelines

Integration with conversational systems involves the following steps:

  1. Preprocess multi-session dialogues into session summaries (KMSC format applicable).
  2. Apply KEEM’s emotion/cause reflection prompt to produce S_t'.
  3. Update memory with (M_{t-1}, S_t') → M_t via the prescribed prompt.
  4. Optionally, verify M_t using the memory-verification prompt.
  5. Store finalized MtM_t as long-term memory for response generation.

All prompting is executed in Korean (ChatGPT-4.0 API, temperature=0.0, top_p=1, n=1). To train a smaller encoder–decoder model on KEEM, extract (M_{t-1}, S_t') → M_t triples for cross-entropy training (suggested hyperparameters: learning rate ≈ 3×10^{-5}, batch size 16, 3–5 epochs, T5-style architecture).
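Triple extraction for such fine-tuning could be sketched as follows, again assuming hypothetical episode field names:

```python
def extract_training_triples(episode):
    """Yield (M_prev, S_t', M_t) triples for encoder-decoder training.
    Field names ('sessions', 'emotion_summary', 'memory') are assumptions
    about the serialized episode format, not the published schema."""
    memory_prev = []
    for session in episode["sessions"]:
        yield (list(memory_prev),
               session["emotion_summary"],
               session["memory"])
        memory_prev = session["memory"]

# Toy two-session episode (illustrative values).
episode = {"sessions": [
    {"emotion_summary": "S1'", "memory": ["m1"]},
    {"emotion_summary": "S2'", "memory": ["m1", "m2"]},
]}
triples = list(extract_training_triples(episode))
# The first triple conditions on an empty prior memory; the second on M_1.
```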

7. Key Insights and Prospective Directions

KEEM demonstrates that generation-based, integrative memory updates minimize contradictions and information loss compared to operation-based paradigms. Explicit encoding of emotion–cause relationships supports context-sensitive, empathic response planning and improves the informativeness and sensitivity of dialogue agents.

Prospective research directions include developing specialized causal or temporal graph-based memory representations; transitioning from prompt-based to fully neural updater models; implementing uncertainty-aware deletions (such as user-driven “forget” requests); and porting the framework to other languages or personalized dialogue domains (e.g., mental health support, intelligent tutoring).

KEEM’s methodology establishes a foundation for more coherent, emotive, and factually consistent long-term conversational memory, with documented gains in both information recall and user-perceived engagement (Kang et al., 9 Jan 2026).

References (1)
