PLATO-LTM: Enhancing Dialogue with Long-Term Memory
- PLATO-LTM is a dialogue generation framework that integrates dynamic persona memory into PLATO-2 for enhanced long-term conversation.
- It employs a Persona Extractor, Long-Term Memory module, and transformer-based Generation Module to maintain role-specific persona consistency.
- Empirical results on the DuLeMon corpus demonstrate significant improvements in persona consistency and engagingness through both automatic and human evaluations.
PLATO-LTM is a dialogue generation framework that augments a large pre-trained chatbot (PLATO-2) with a plug-and-play Long-Term Memory (LTM) mechanism. Designed to address the deficiencies of open-domain dialogue models in long-term human-bot conversations, PLATO-LTM dynamically manages persona information for both user and system, enabling enhanced persona consistency and engagingness over extended multi-turn interactions (Xu et al., 2022).
1. System Architecture
The PLATO-LTM architecture comprises three principal components: the Persona Extractor (PE), the Long-Term Memory module (LTM), and the Generation Module. The PE is an ERNIE-CNN classifier operating at the clause level, identifying utterance spans that express persona information. The LTM maintains distinct, dynamically updated memory stores for the user and bot personas, supporting real-time "write" (memory update) and "read" (retrieval) operations via dense embedding similarity. The Generation Module is a PLATO-2 transformer that conditions on the conversation history and retrieved persona facts, with the two roles distinguished by role-specific tokens and embeddings.
At runtime, the sequence is as follows: (1) newly received utterances are parsed by the PE for persona content, (2) LTM "writes" these to the corresponding memory (user/bot), (3) on each generation step, LTM "reads" the top-k relevant persona entries for each role conditioned on the current context, (4) the generator produces the system's response, utilizing both dialog context and retrieved persona content with explicit role signaling.
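The loop below is a minimal Python sketch of this four-step sequence. The objects `pe`, `user_mem`, `bot_mem`, and `generator` are hypothetical stand-ins for the PE, the two LTM stores, and the PLATO-2 generator (they are not names from the paper), and `k` is the per-role retrieval depth.

```python
# Minimal sketch of the PLATO-LTM runtime loop described above.
# pe, user_mem, bot_mem, generator are hypothetical component interfaces.

def respond(utterance, history, pe, user_mem, bot_mem, generator, k=3):
    # (1) Parse the incoming user utterance for persona content.
    for clause in pe.extract(utterance):
        # (2) Write extracted user persona facts to the user memory store.
        user_mem.write(clause)

    history = history + [utterance]

    # (3) Read the top-k relevant persona entries per role for this context.
    user_personas = user_mem.read(history, k=k)
    bot_personas = bot_mem.read(history, k=k)

    # (4) Generate the response from dialog context plus role-tagged personas.
    response = generator.generate(
        context=history,
        user_personas=user_personas,
        bot_personas=bot_personas,
    )

    # Bot persona expressed in the bot's own response is also written back.
    for clause in pe.extract(response):
        bot_mem.write(clause)

    return response
```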
2. Long-Term Memory Mechanism
Each persona fact is stored as a pair $(p_i, E_p(p_i))$, keeping both the text and its dense vector from the ERNIE-initialized persona encoder $E_p$. The context $c$ is encoded by a separate ERNIE-based encoder $E_c$, using the pooled [CLS] token. Retrieval ranks persona entries by cosine similarity $s_i = \cos(E_c(c), E_p(p_i))$, filters out entries with $s_i$ below a threshold $\tau$, and returns the top $k$ per role. Write operations employ a deduplication heuristic: if a new candidate's similarity to its nearest existing fact exceeds a replacement threshold, it replaces that fact in place; otherwise it is appended. Role tagging ensures strict separation of the user and bot persona stores.
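A compact illustration of these read/write rules follows, with generic `encode_persona`/`encode_context` callables standing in for the ERNIE encoders. The default values for `tau` (retrieval floor) and `dedup` (replacement threshold) are illustrative placeholders, since only the mechanism, not the exact thresholds, is summarized here.

```python
import numpy as np

class PersonaMemory:
    """Illustrative persona store following the read/write rules above."""

    def __init__(self, encode_persona, encode_context, tau=0.5, dedup=0.9):
        self.encode_persona = encode_persona  # stand-in for ERNIE persona encoder
        self.encode_context = encode_context  # stand-in for ERNIE context encoder
        self.texts, self.vecs = [], []        # each fact: (text, dense vector)
        self.tau, self.dedup = tau, dedup     # assumed threshold values

    @staticmethod
    def _cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    def write(self, text):
        v = self.encode_persona(text)
        if self.vecs:
            sims = [self._cos(v, u) for u in self.vecs]
            i = int(np.argmax(sims))
            if sims[i] > self.dedup:           # near-duplicate: replace in place
                self.texts[i], self.vecs[i] = text, v
                return
        self.texts.append(text)                # otherwise append a new fact
        self.vecs.append(v)

    def read(self, context, k=3):
        q = self.encode_context(context)
        scored = [(self._cos(q, u), t) for u, t in zip(self.vecs, self.texts)]
        kept = [(s, t) for s, t in scored if s >= self.tau]  # threshold filter
        return [t for _, t in sorted(kept, reverse=True)[:k]]
```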
Persona and context encoders are jointly optimized via a triplet ranking loss:
$$\mathcal{L} = \max\big(0,\ \cos(E_c(c), E_p(p^-)) - \cos(E_c(c), E_p(p^+)) + \alpha\big)$$
where $p^+$ is a persona actually used in the current turn, $p^-$ is a randomly sampled negative, and $\alpha$ is the ranking margin (0.2; see Section 4).
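In PyTorch this objective is a short function over pooled [CLS] embeddings. The sketch below assumes batched `(batch, dim)` encoder outputs and uses the margin reported in Section 4.

```python
import torch
import torch.nn.functional as F

def triplet_ranking_loss(ctx, pos, neg, alpha=0.2):
    """Triplet ranking loss over pooled [CLS] embeddings (sketch).

    ctx, pos, neg: (batch, dim) tensors from the context and persona
    encoders; alpha is the ranking margin (0.2 in the paper).
    """
    s_pos = F.cosine_similarity(ctx, pos, dim=-1)
    s_neg = F.cosine_similarity(ctx, neg, dim=-1)
    # Hinge: push the used persona above the random negative by margin alpha.
    return torch.clamp(s_neg - s_pos + alpha, min=0).mean()
```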
3. Data: The DuLeMon Corpus
DuLeMon, a large-scale Chinese, multi-turn mutual persona corpus, serves as the training and evaluation backbone for PLATO-LTM. It features two splits:
- DuLeMon-SELF: the bot knows only its own persona (24,500 dialogues; 400,472 utterances; avg. 16.3 turns; avg. 4.0 bot persona sentences; avg. 4.0 user persona sentences, all unseen by the bot).
- DuLeMon-BOTH: the bot additionally pre-knows part of the user's persona (3,001 dialogues; 48,522 utterances; avg. 16.2 turns; avg. 4.0 bot persona sentences; avg. 4.4 seen and 1.3 unseen user persona sentences).
The dataset construction pipeline involves (1) pooling and translating/rewriting PersonaChat personas into Chinese, (2) assigning personas to crowd workers, (3) dialogue authoring that encourages on-topic, persona-rich discourse without verbatim copying of persona sentences, and (4) fine-grained, response-level annotation of persona usage and sentence boundaries.
4. Training and Optimization
The PE is trained in two stages: five pc-stage1 models are first trained on 6,000 human-annotated utterances; their consensus then auto-labels a large corpus (1.4M utterances), on which five pc-stage2 models are retrained and the best is selected (F1 = 0.91). Context/persona matching uses the triplet loss described above (α = 0.2). The generator builds on the underlying PLATO-2 architecture, with both uni- and bi-directional transformer variants. Inputs are capped at 384 tokens of context, 76 tokens of user persona, and 52 tokens of bot persona, with role signals incorporated both as prepended tokens and as learned embeddings. Optimization uses Adam with a batch size of ≈16,384 tokens; two model scales are used (12-layer and 32-layer), and memory capacity is left unbounded.
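To make the role-signal and length-cap setup concrete, the sketch below assembles a generator input under the stated caps. The special tokens `[USER-PERSONA]`/`[BOT-PERSONA]` and the generic `tokenizer` interface are assumptions for illustration, not the paper's literal vocabulary.

```python
# Illustrative input assembly under the reported caps (384/76/52 tokens).
# Token names below are hypothetical, not taken from the paper.

MAX_CTX, MAX_USER_P, MAX_BOT_P = 384, 76, 52

def build_generator_input(tokenizer, context, user_personas, bot_personas):
    ctx_ids = tokenizer.encode(" ".join(context))[-MAX_CTX:]      # keep recent turns
    up_ids = tokenizer.encode(" ".join(user_personas))[:MAX_USER_P]
    bp_ids = tokenizer.encode(" ".join(bot_personas))[:MAX_BOT_P]
    # Role signals as prepended special tokens; learned role embeddings
    # would additionally be added to these segments inside the model.
    return (tokenizer.encode("[USER-PERSONA]") + up_ids
            + tokenizer.encode("[BOT-PERSONA]") + bp_ids
            + ctx_ids)
```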
5. Evaluation Protocol and Results
Automatic metrics include persona extraction F1 (stage2: 0.91), memory retrieval AUC (0.76), recall@5 (0.83), and comprehensive generation metrics (perplexity, BLEU, token F1, DISTINCT). Notably, best performance (PLATO-FT 32L + role signals) achieves PPL=9.38, BLEU-1/2=(0.194, 0.087), DISTINCT-1/2=(0.068, 0.296), token F1=22.61.
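For reference, DISTINCT-n is the standard corpus-level ratio of unique to total n-grams over generated responses; a minimal implementation:

```python
# DISTINCT-n: unique n-grams / total n-grams across generated responses.

def distinct_n(responses, n):
    total, unique = 0, set()
    for tokens in responses:  # each response is a list of tokens
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / max(total, 1)

# Example: distinct_n([["i", "like", "tea"], ["i", "like", "coffee"]], 1)
# -> 4 unique unigrams / 6 total ≈ 0.667
```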
Human evaluation employs self-chat between PLATO-LTM and a user simulator spanning 10 episodes × 4 sessions × 16 turns. Ratings (0–2 scale) cover coherence, consistency, and engagingness:
| Model | Coherence | Consistency | Engagingness |
|---|---|---|---|
| PLATO-2 (baseline) | 1.70 | 0.13 | 1.46 |
| PLATO-FT | 1.59 | 0.40 | 1.40 |
| PLATO-LTM | 1.67 | 0.87 | 1.54 |
| PLATO-LTM w/o PE | 1.57 | 0.49 | 1.43 |
Empirically, adding the LTM yields a 0.47 absolute gain in persona consistency over PLATO-FT (0.40 → 0.87), and the persona extractor further improves both consistency and engagingness (compare PLATO-LTM w/o PE). Fine-tuning alone (PLATO-FT) slightly reduces coherence under open-topic conditions (1.70 → 1.59).
6. Qualitative Behavior and Analysis
PLATO-LTM demonstrates multi-session, long-span persona tracking, exemplified by retention and reuse of facts mentioned several sessions prior (e.g., recalling and referencing the user's affinity for basketball). System persona is equally well-maintained across conversational topics (e.g., consistently presenting "I’m a landscape painter"). In contrast, the baseline PLATO-2 exhibits frequent drift to generic topics and neglects prior persona facts. This behavior illustrates the effectiveness of explicit persona memory with role-conditioned context integration.
7. Limitations and Prospective Extensions
The current LTM mechanism is restricted to explicitly extractable persona sentences; subtle or implicit persona traits remain unaddressed. Absence of memory capacity controls permits unbounded growth during extended interactions. Fine-tuning on small, out-of-domain datasets can introduce slight degradation in overall dialogue coherence. The potential application of reinforcement learning from human feedback is proposed as a future direction to optimize persona memory management and further enhance long-term dialogue engagingness (Xu et al., 2022). A plausible implication is that broader forms of persona representation and more advanced storage management methods may yield further gains in robustness and user experience for open-domain conversational agents.