
KEEM Dataset: Emotional Memory in Dialogues

Updated 16 January 2026
  • KEEM dataset is a generation-based resource that integrates factual and emotional content from multi-session dialogues.
  • It uses a two-stage annotation protocol with ChatGPT prompting to extract and ground emotion-cause pairs for accurate memory updates.
  • Evaluation shows KEEM achieves lower contradiction rates and enhanced coherence compared to traditional memory update methods in dialogue systems.

The Keep Emotional and Essential Memory (KEEM) dataset is a generation-based resource designed to facilitate effective and nuanced memory updates in long-term open-domain conversational systems. Originating from systematic annotation and reformulation of the Korean Multi-Session Chat (KMSC) corpus, KEEM aims to overcome the limitations of accumulation- and operation-based memory management approaches by generating memory representations that retain both factual and emotional content, explicitly linking user-expressed emotions to their underlying causes. Its memory update paradigm enhances systems’ abilities to maintain user state, coherence, and empathic engagement over extended interactions (Kang et al., 9 Jan 2026).

1. Corpus Construction and Annotation Protocols

KEEM was constructed by processing multi-session Korean dialogues, with each episode comprising two participants (user and system) and segmented into four consecutive sessions. Sample sizes by session span are 2,006 dialogues (sessions 1–2), 1,560 (sessions 1–3), and 1,005 (sessions 1–4), totaling approximately 60,000–71,000 utterances depending on session grouping. Each dialogue episode is serialized as a JSON object encapsulating full utterance lists, session summaries, and the output of memory-keeping steps.
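As a concrete illustration, one episode might serialize as follows; the field names here are assumptions for exposition, not the published schema:

```python
import json

# Hypothetical sketch of one KEEM episode record; field names are
# assumptions, not the dataset's actual schema.
episode = {
    "episode_id": "ep-0001",
    "sessions": [
        {
            "session": 1,
            "utterances": [
                {"speaker": "user", "text": "..."},
                {"speaker": "system", "text": "..."},
            ],
            "summary": "...",          # original session summary S_t
            "emotion_summary": "...",  # emotion-reflected rewrite S_t'
            "memory": ["...", "..."],  # memory sentences M_t after update
        }
    ],
}

# Round-trip through JSON, preserving Korean text (ensure_ascii=False).
serialized = json.dumps(episode, ensure_ascii=False)
```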

Annotation in KEEM follows a two-stage protocol. First, annotators extract all emotions and their explicit causes from each session’s raw summary, using few-shot ChatGPT prompting in Korean to rewrite the summary S_t as S_t', where each emotion E is contextually grounded by its cause C (“I felt down” becomes “I felt down because I had a fight with a club friend”). Manual evaluation employs a three-point scale (0: neither, 1: emotion only, 2: both emotion and cause), and KEEM achieves a 93% rate of “emotion+cause” reflection (compared to 35% in raw KMSC).
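The manual scoring above can be aggregated with a small helper; the labels below are toy values for illustration, not the published annotations:

```python
def emotion_cause_rate(labels):
    """Fraction of summaries scored 2 on the 0/1/2 manual scale,
    i.e., summaries reflecting both emotion and cause."""
    return sum(1 for s in labels if s == 2) / len(labels)

def mean_score(labels):
    """Average manual score across annotated summaries."""
    return sum(labels) / len(labels)

# Toy per-summary labels (illustrative only).
toy_labels = [2, 2, 2, 1, 2]
rate = emotion_cause_rate(toy_labels)  # 4 of 5 summaries scored 2
```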

Second, the memory update process synthesizes the new memory M_t via a generation function g:

M_t = g(M_{t-1}, S_t')

Here g is realized as a model prompted to avoid hallucination, preserve all non-conflicting facts, and revise memory only when supported by new, causally anchored summaries. Verification prompts check that M_t faithfully reflects all user-relevant facts and emotional trajectories, iteratively backstopping errors and ensuring precision.

2. Memory Update Architecture and Representations

KEEM frames the memory update as a conditional generation problem. At each session t, given the prior memory M_{t-1} (a sentence list) and the current emotion-reflected summary S_t', the model generates M_t. In potential fine-tuning workflows, the likelihood objective is:

L_\text{gen} = -\sum_t \log P_\theta(M_t \mid M_{t-1}, S_t')

Optionally, an auxiliary term L_\text{emotion} encourages reproduction of annotated emotion–cause pairs, leading to a composite loss L = L_\text{gen} + \lambda L_\text{emotion}.
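Given per-token log-probabilities, the composite loss is a weighted sum of two negative log-likelihoods. The following is a minimal sketch; the instantiation of L_emotion as an NLL over emotion–cause tokens, and the value of λ, are assumptions:

```python
import math

def nll(log_probs):
    """Negative log-likelihood: sum of per-token -log P."""
    return -sum(log_probs)

# Toy per-token log-probabilities for the generated memory M_t.
logp_memory = [math.log(p) for p in (0.9, 0.8, 0.95)]
# Hypothetical auxiliary term over tokens realizing emotion-cause pairs.
logp_emotion = [math.log(p) for p in (0.7, 0.85)]

lam = 0.5  # lambda weighting, chosen here purely for illustration
loss = nll(logp_memory) + lam * nll(logp_emotion)
```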

Memories in KEEM are lists of discrete Korean sentences; eventual system integration can involve encoding these sentences (e.g., with SBERT) into vectors e_i \in \mathbb{R}^d for retrieval or attention-based memory selection. The update process adheres to strict constraints against fact hallucination and destructive overwriting, updating the representation only when justified by new evidence or causal context. The verification stage employs a prompt-based assessment against the aggregated dialogue up to session t.
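A minimal retrieval sketch over encoded memory sentences might look like the following, with stub two-dimensional vectors standing in for SBERT embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, memory_vecs, memory_sents, k=2):
    """Return the k memory sentences whose embeddings are closest
    to the query embedding by cosine similarity."""
    scored = sorted(
        zip(memory_sents, memory_vecs),
        key=lambda sv: cosine(query_vec, sv[1]),
        reverse=True,
    )
    return [sent for sent, _ in scored[:k]]

# Stub embeddings; in practice e_i would come from an SBERT-style encoder.
memory_sents = ["User fought with a club friend.",
                "User likes hiking.",
                "User made up with the friend."]
memory_vecs = [[1.0, 0.1], [0.0, 1.0], [0.9, 0.2]]
query = [1.0, 0.0]  # e.g., an encoded question about the friend
top = retrieve(query, memory_vecs, memory_sents)
```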

Pseudocode summary:

Initialize M ← []
for t in 1..4:
    D_t ← session t dialogue
    S_t ← original session summary
    S_t' ← ChatGPT_emotion_reflect(D_t, S_t)
    M_new ← ChatGPT_update_prompt(M, S_t')
    if ChatGPT_verify(D_1:t, M_new):
        M ← M_new
return M

3. Emotional Context Modeling

Emotional and causal relationships are integral to the KEEM memory format. No external classifier is used; instead, ChatGPT is prompted (in Korean, few-shot, no candidate lists) to extract emotion and rationale directly from the session dialogue, achieving the highest reflection rates under these conditions. The result is a memory where each sentence encodes not just events or facts, but also explicitly links affective states to their precipitating causes (e.g., “I’m happy now that I made up with my club friend”).

Memory generation mandates the inclusion of causal information for every referenced emotion, operationalizing empathic follow-ups in subsequent conversational rounds. Although KEEM does not implement a specialized neural memory-attention mechanism, it is compatible with attention formulations that combine factual and emotional embeddings—at each generation step, the decoder may attend jointly over factual and affective vectors:

c = \text{Attention}(Q = \text{decoder\_state}, K = \text{concat}(F, E), V = \text{concat}(F, E))

where F are factual and E are emotional embedding vectors.
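A toy instantiation of this joint attention, in plain Python with invented two-dimensional embeddings:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention: weights keys against the query,
    then returns the weighted sum of value vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# Illustrative factual (F) and emotional (E) embeddings; values invented.
F = [[1.0, 0.0], [0.5, 0.5]]
E = [[0.0, 1.0]]
KV = F + E  # concat(F, E) serves as both keys and values
decoder_state = [1.0, 1.0]
context = attention(decoder_state, KV, KV)
```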

4. Causal-Relationship Representation

Causal reasoning in KEEM memory is implicit, realized through the natural-language pairing of each user emotion with its cause. Although KEEM itself does not build explicit graph structures or relational schemas, the dataset structure enables post hoc extraction of (emotion, cause) tuples. A plausible implication is that downstream systems could construct explicit causal graphs G from KEEM memories, with nodes as emotions/events and directed “because-of” edges, to enable more structured memory retrieval or evidence tracing.
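Such post hoc extraction could be sketched as follows; the tuples are illustrative, not drawn from the dataset:

```python
from collections import defaultdict

def build_causal_graph(pairs):
    """Build directed 'because-of' edges from (emotion, cause) tuples,
    mapping each emotion to the set of its recorded causes."""
    graph = defaultdict(set)
    for emotion, cause in pairs:
        graph[emotion].add(cause)
    return graph

# (emotion, cause) tuples extracted from KEEM-style memory sentences
# (illustrative values only).
pairs = [
    ("sad", "fight with a club friend"),
    ("happy", "made up with the club friend"),
]
g = build_causal_graph(pairs)
```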

5. Evaluation Methodology and Baseline Comparisons

KEEM’s memory update fidelity is assessed on multiple axes:

  • Emotion & Cause Reflection: Manual scoring (0–2) yields 1.90 for KEEM, significantly higher than 1.18 for KMSC.
  • Memory-Update Accuracy: KEEM achieves 1.67–1.75 average per session.
  • Keyword Coverage: Recall is computed as |\text{keywords}(M) \cap \text{keywords}(D)| / |\text{keywords}(D)|, with KEEM outperforming both KMSC (accumulation-based) and CareCallmem (operation-based) approaches.
  • Conflict Ratio (via NLI): KEEM and CareCallmem show low contradiction rates (<10%), while KMSC exceeds 20%.
  • Conversational Perplexity: KEEM memories yield the lowest next-turn perplexity when conditioning response models (RAG, FiD, FiD-RAG, Llama2, Korean-tuned LLMs).
  • Response Quality Voting: When humans and ChatGPT are asked to choose among responses conditioned on different memory types, KEEM is selected in approximately 75–82% of cases, compared to CareCallmem’s 3–10%.
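The keyword-coverage metric above reduces to a set-intersection recall; a minimal sketch with toy keyword sets:

```python
def keyword_recall(memory_keywords, dialogue_keywords):
    """Recall = |keywords(M) ∩ keywords(D)| / |keywords(D)|."""
    m, d = set(memory_keywords), set(dialogue_keywords)
    return len(m & d) / len(d)

# Toy keyword sets for illustration (not dataset values).
recall = keyword_recall({"europe", "friend", "happy"},
                        {"europe", "friend", "happy", "korea"})
```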

A summary of approaches is provided in the table below:

| Memory Update Method | Core Mechanism | Conflict Rate | Emotion+Cause (avg) |
|---|---|---|---|
| KEEM | Generation-based, emotion/cause grounding | <10% | 1.90 |
| KMSC (Accumulation) | Session summary append | >20% | 1.18 |
| CareCallmem | Operation-based (PASS/APPEND/DELETE/REPLACE) | <10% | Lower than KEEM |

Example gains include KEEM’s retention of temporally-lapsed but informative facts (e.g., “traveled to Europe” is not lost when user returns to Korea) and accurate emotional nuance (“I was sad because…”), which typical operation-based memories often omit.

6. Usage and Implementation Guidelines

Integration with conversational systems involves the following steps:

  1. Preprocess multi-session dialogues into session summaries (KMSC format applicable).
  2. Apply KEEM’s emotion/cause reflection prompt to produce S_t'.
  3. Update memory with (M_{t-1}, S_t') → M_t via the prescribed prompt.
  4. Optionally, verify M_t using the memory-verification prompt.
  5. Store finalized MtM_t as long-term memory for response generation.

All prompting is executed in Korean (ChatGPT-4.0 API, temperature=0.0, top_p=1, n=1). To train a smaller encoder–decoder model on KEEM, extract (M_{t-1}, S_t') → M_t triples for cross-entropy training (suggested hyperparameters: learning rate ≈ 3×10^{-5}, batch size 16, 3–5 epochs, T5-style architecture).
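Triple extraction for such fine-tuning could be sketched as follows, again assuming hypothetical episode field names:

```python
def extract_training_triples(episode):
    """Yield (M_prev, S_t', M_t) triples for encoder-decoder training.
    Field names ('sessions', 'emotion_summary', 'memory') are assumptions
    about the serialized episode format, not the published schema."""
    memory_prev = []
    for session in episode["sessions"]:
        yield (list(memory_prev),
               session["emotion_summary"],
               session["memory"])
        memory_prev = session["memory"]

# Toy two-session episode (illustrative values).
episode = {"sessions": [
    {"emotion_summary": "S1'", "memory": ["m1"]},
    {"emotion_summary": "S2'", "memory": ["m1", "m2"]},
]}
triples = list(extract_training_triples(episode))
# The first triple conditions on an empty prior memory; the second on M_1.
```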

7. Key Insights and Prospective Directions

KEEM demonstrates that generation-based, integrative memory updates minimize contradictions and information loss compared to operation-based paradigms. Explicit encoding of emotion–cause relationships supports context-sensitive, empathic response planning and improves the informativeness and sensitivity of dialogue agents.

Prospective research directions include developing specialized causal or temporal graph-based memory representations; transitioning from prompt-based to fully neural updater models; implementing uncertainty-aware deletions (such as user-driven “forget” requests); and porting the framework to other languages or personalized dialogue domains (e.g., mental health support, intelligent tutoring).

KEEM’s methodology establishes a foundation for more coherent, emotive, and factually consistent long-term conversational memory, with documented gains in both information recall and user-perceived engagement (Kang et al., 9 Jan 2026).

References (1)
