Mid-Term Memory (MTM) in LLM Systems

Updated 26 April 2026
  • MTM is an intermediary memory layer that organizes and consolidates episodic and thematic information across tens to hundreds of conversational turns.
  • Systems like MemoryOS and LightMem use segmentation, embedding-based scoring, and two-stage retrieval to efficiently manage and recall medium-term context.
  • Empirical evidence shows MTM enhances F1 and BLEU scores while reducing latency, underscoring its impact on personalization and long-horizon reasoning.

Mid-Term Memory (MTM) represents a critical architectural and algorithmic construct in modern hierarchical memory systems for LLM agents. It functions as an intermediary storage layer, bridging the rapid-access but ephemeral Short-Term Memory (STM) and the slowly evolving, knowledge-oriented Long-Term Memory (LTM) or Long-Term Personal Memory (LPM). MTM is specialized for consolidating, organizing, and selectively retrieving episodic or thematic information recurring over medium temporal spans—ranging from tens to hundreds of conversational turns. Its design enables both efficient online retrieval for context-rich interaction and effective memory consolidation for long-horizon personalization and reasoning. Two notable instantiations of MTM are realized in MemoryOS (Kang et al., 30 May 2025) and LightMem (Zhang et al., 9 Apr 2026), each illustrating nuanced strategies for memory segmentation, maintenance, retrieval, and empirical impact.

1. Architectural Roles and Layered Organization

Both MemoryOS and LightMem implement MTM as a semantically structured buffer operating between STM and LTM/LPM. In MemoryOS, memory is hierarchically organized into three tiers: STM, a fixed-size queue (n=7) of recent dialogue "pages" with LLM-generated chain summaries; MTM, a segmented-paging layer capturing semantically coherent topic segments; and LPM, permanent storage for distilled user and agent persona traits. Pages that overflow the STM queue migrate to MTM in FIFO order, and MTM acts as both context buffer and topic aggregator until a segment is either promoted to LPM or evicted.
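The STM-to-MTM handoff can be illustrated with a small sketch. This is not the MemoryOS implementation: the `Page` and `TieredMemory` names, their fields, and the `mtm_inbox` list (standing in for pages awaiting segment assignment) are assumptions made for illustration. Only the queue length of 7 and the FIFO migration rule come from the text.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Page:
    """One dialogue page; field names are illustrative, not from MemoryOS."""
    query: str
    response: str
    timestamp: float
    chain_summary: str = ""


@dataclass
class TieredMemory:
    stm_capacity: int = 7                          # fixed STM queue length from the text
    stm: deque = field(default_factory=deque)      # most recent pages
    mtm_inbox: list = field(default_factory=list)  # hypothetical: pages handed to MTM

    def add_page(self, page: Page) -> None:
        """Append to STM; once full, the oldest page migrates to MTM (FIFO)."""
        self.stm.append(page)
        while len(self.stm) > self.stm_capacity:
            self.mtm_inbox.append(self.stm.popleft())


memory = TieredMemory()
for i in range(10):
    memory.add_page(Page(f"q{i}", f"r{i}", float(i)))

print([p.query for p in memory.stm])        # the 7 most recent pages
print([p.query for p in memory.mtm_inbox])  # the 3 oldest pages, in arrival order
```

Using a plain deque keeps the overflow rule explicit: nothing reaches MTM until STM is full, and pages arrive there in the order they were spoken.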

LightMem similarly partitions agent memory into STM (session-scoped, non-persistent), MTM (user-scoped episodic store of reusable summaries), and LTM (offline, de-identified graph knowledge distilled from MTM). MTM stores personalized, replay-on-demand memory objects with explicit user isolation.

MTM’s placement and structure enable persistent storage of medium-range conversational evidence, facilitate retrieval for context enhancement, and support incremental knowledge integration into long-term memory.

2. Data Structures and Storage Schemes

MemoryOS adopts a segmented-paging scheme: MTM holds up to Sₘₐₓ=200 segments, each segment a collection of related pages, where each page is a tuple {Q, R, T, metaᶜʰᵃⁱⁿ}—the last field being an LLM-generated semantic summary. Assignment of incoming pages to segments is governed by a scoring function:

$$\mathcal{F}_\text{score}(p, s) = \cos(e_s, e_p) + \text{Jaccard}(K_s, K_p)$$

where $e_s$ and $e_p$ are embeddings of the segment summary and the page content, and $K_s$ and $K_p$ are the corresponding keyword sets; the threshold $\theta = 0.6$ determines segment membership. If all existing segments fall below threshold, a new segment is created for the incoming page. Each segment periodically re-summarizes its content, supporting coherent retrieval.
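The assignment rule above (cosine similarity plus keyword Jaccard overlap, compared against a threshold of 0.6) can be sketched as follows. The dictionary layout, the function names, and the choice to attach a page only to its single best-scoring segment are assumptions of this sketch; re-summarization of the segment after a page is added is omitted.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a, b):
    """Jaccard overlap of two keyword collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def f_score(page, segment):
    """cos(e_s, e_p) + Jaccard(K_s, K_p), as in the scoring function above."""
    return (cosine(segment["embedding"], page["embedding"])
            + jaccard(segment["keywords"], page["keywords"]))


def assign_page(page, segments, theta=0.6):
    """Attach the page to the best segment if its score reaches theta;
    otherwise open a new segment. Returns the index of the segment used."""
    best_idx, best = -1, float("-inf")
    for i, seg in enumerate(segments):
        score = f_score(page, seg)
        if score > best:
            best_idx, best = i, score
    if best_idx >= 0 and best >= theta:
        segments[best_idx]["pages"].append(page)
        return best_idx
    # No segment is close enough: the page seeds a new segment.
    segments.append({"embedding": list(page["embedding"]),
                     "keywords": set(page["keywords"]),
                     "pages": [page]})
    return len(segments) - 1


segments = [{"embedding": [1.0, 0.0], "keywords": {"travel", "flight"}, "pages": []}]
on_topic = {"embedding": [0.9, 0.1], "keywords": {"flight", "hotel"}}
off_topic = {"embedding": [0.0, 1.0], "keywords": {"recipe"}}

first = assign_page(on_topic, segments)    # joins the existing travel segment
second = assign_page(off_topic, segments)  # scores 0.0, so a new segment is opened
print(first, second, len(segments))
```

Because the two terms are summed, a page can clear the threshold on embedding similarity alone even when it shares no keywords with the segment.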

LightMem encodes each MTM entry as:

$$m_i^\text{MTM} = (s_i, t_i, u_i, e_i),\quad e_i \in \mathbb{R}^d$$

where $s_i$ is the summary (an LLM/SLM-generated compressed episode), $t_i$ is temporal/access metadata, $u_i$ is the user id, and $e_i$ is a 384-dimensional embedding (all-MiniLM-L6-v2). Entries are stored in a FAISS vector index, enabling sub-linear retrieval, strict user isolation, and per-item temporal maintenance. Capacity is bounded (e.g., $10^4$ entries per user), with redundancy and staleness resolved dynamically.
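A minimal sketch of the entry layout and a user-isolated lookup is given below. It is not the LightMem code: the `MTMEntry` field names are assumptions, the temporal metadata is reduced to a creation time and an access counter, and a brute-force cosine scan stands in for the FAISS index so that the example needs only the standard library. Two-dimensional vectors replace the 384-dimensional embeddings for brevity.

```python
import math
import time
from dataclasses import dataclass, field


@dataclass
class MTMEntry:
    summary: str      # compressed episode summary
    user_id: str      # owner of the entry
    embedding: list   # 384-dim in the text; 2-dim here for brevity
    created_at: float = field(default_factory=time.time)  # temporal metadata (assumed field)
    access_count: int = 0                                  # access metadata (assumed field)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(entries, user_id, query_embedding, k=3):
    """Top-k entries for one user; a linear scan standing in for a vector index."""
    own = [e for e in entries if e.user_id == user_id]  # user isolation comes first
    own.sort(key=lambda e: cosine(e.embedding, query_embedding), reverse=True)
    hits = own[:k]
    for entry in hits:
        entry.access_count += 1  # per-item access bookkeeping
    return hits


store = [
    MTMEntry("Prefers window seats on flights", "alice", [1.0, 0.0]),
    MTMEntry("Is learning to bake sourdough", "alice", [0.0, 1.0]),
    MTMEntry("Books aisle seats", "bob", [1.0, 0.0]),
]
hits = search(store, "alice", [0.9, 0.1], k=1)
print([h.summary for h in hits])
```

Filtering by user before ranking means another user's entry can never appear in the results, even when its embedding is the closest match.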

3. Update, Migration, and Eviction Mechanisms

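In MemoryOS, overflow from the STM queue triggers migration to MTM. Within MTM, segment capacity ($S_\text{max} = 200$) is maintained via a "heat"-based priority:

$$\text{Heat} = \alpha \cdot N_\text{visit} + \beta \cdot L_\text{interaction} + \gamma \cdot R_\text{recency}$$

where $N_\text{visit}$ is the total number of retrievals, $L_\text{interaction}$ is the number of pages in the segment, and $R_\text{recency}$ is a time-decay recency term ($R_\text{recency} = \exp(-\Delta t / \mu)$, with $\Delta t$ the time since last access), all with unit weights ($\alpha = \beta = \gamma = 1$). When capacity is exceeded, the segments with the lowest heat are evicted. Promotion to LPM is triggered when heat exceeds a threshold ($\text{Heat} > \tau$), after which $L_\text{interaction}$ is reset to zero.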
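The heat-based maintenance described above can be sketched as a unit-weighted sum of retrieval count, page count, and a recency term, followed by promotion and lowest-heat eviction. Several details here are assumptions rather than statements about MemoryOS: the exponential form of the recency term, the time constant `mu`, the promotion threshold of 5.0 used in the example, and the choice to reset the page-count term after promotion.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    n_visit: int          # total retrievals
    interaction_len: int  # number of pages contributing to heat
    last_access: float    # seconds, on the same clock as `now`


def heat(seg, now, mu=1e7, alpha=1.0, beta=1.0, gamma=1.0):
    """Unit-weighted heat; the exponential recency decay and mu are assumed."""
    recency = math.exp(-(now - seg.last_access) / mu)
    return alpha * seg.n_visit + beta * seg.interaction_len + gamma * recency


def maintain(segments, now, capacity, promote_threshold):
    """Promote hot segments, then keep only the `capacity` hottest ones."""
    promoted = []
    for seg in segments:
        if heat(seg, now) > promote_threshold:
            promoted.append(seg.name)
            seg.interaction_len = 0  # assumed: page-count term resets after promotion
    ranked = sorted(segments, key=lambda s: heat(s, now), reverse=True)
    evicted = [s.name for s in ranked[capacity:]]  # lowest heat goes first
    return ranked[:capacity], promoted, evicted


now = 1000.0
segs = [
    Segment("travel", n_visit=4, interaction_len=3, last_access=990.0),
    Segment("recipes", n_visit=0, interaction_len=1, last_access=100.0),
    Segment("work", n_visit=1, interaction_len=2, last_access=900.0),
]
kept, promoted, evicted = maintain(segs, now, capacity=2, promote_threshold=5.0)
print([s.name for s in kept], promoted, evicted)
```

Resetting the page-count term after promotion lowers the segment's heat, so a segment is not promoted again until new activity accumulates.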

LightMem’s writer module, controlled by small LLMs, compresses and appends new MTM entries from each interaction, checks for semantic redundancy/conflicts, and enforces the global capacity bound. Redundant entries are merged; contradictory facts are resolved in favor of recency or support. Once MTM grows beyond capacity, low-utility items—determined by frequency or obsolescence—are evicted, and further compression is applied as necessary.
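The write path above (append, merge redundant entries, evict when over capacity) can be sketched as below. Everything concrete in it is an assumption made for illustration: redundancy is detected with a cosine threshold of 0.9, a merge keeps the newer summary and carries over the usage count, and utility is ranked by hit count and then age. Conflict resolution by an SLM and further compression are not modeled.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def write_entry(store, new, capacity, dup_threshold=0.9):
    """Insert `new` into `store`, merging near-duplicates and enforcing capacity."""
    for i, old in enumerate(store):
        same_user = old["user"] == new["user"]
        if same_user and cosine(old["emb"], new["emb"]) >= dup_threshold:
            # Redundant entry: recency wins, usage history is preserved.
            store[i] = {**new, "hits": old["hits"] + new["hits"]}
            break
    else:
        store.append(new)
    while len(store) > capacity:
        # Lowest utility first: fewest hits, then oldest timestamp.
        victim = min(store, key=lambda e: (e["hits"], e["ts"]))
        store.remove(victim)
    return store


store = []
write_entry(store, {"user": "u1", "text": "likes tea", "emb": [1.0, 0.0], "ts": 1, "hits": 3}, capacity=2)
write_entry(store, {"user": "u1", "text": "owns a cat", "emb": [0.0, 1.0], "ts": 2, "hits": 0}, capacity=2)
write_entry(store, {"user": "u1", "text": "likes green tea", "emb": [0.99, 0.05], "ts": 3, "hits": 0}, capacity=2)
write_entry(store, {"user": "u1", "text": "moving to Oslo", "emb": [0.7, 0.7], "ts": 4, "hits": 0}, capacity=2)
print([e["text"] for e in store])
```

In this run the third write replaces the first entry, and the fourth write pushes the store over capacity, so the unused "owns a cat" entry is evicted.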

4. Retrieval, Querying, and Online Augmentation

In MemoryOS, retrieval from MTM for a user query $q$ follows a two-stage procedure:

  1. Segment selection: Compute $\mathcal{F}_\text{score}(q, s)$ for each segment $s$ and select the top-$m$ segments.
  2. Page selection: Within each chosen segment, retrieve the top-$k$ pages by semantic cosine similarity, with $k$ set separately for LoCoMo and GVD.

Each selected segment's retrieval count and recency are updated post-retrieval. The full retrieval set is:

$$\mathcal{R}(q) = \mathcal{R}_\text{STM} \cup \mathcal{R}_\text{MTM}(q) \cup \mathcal{R}_\text{LPM}$$

that is, the union of STM, selected MTM, and LPM knowledge entries.
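The two-stage procedure above can be sketched as follows. The dictionary layout, the function name `retrieve_from_mtm`, and the `top_m` and `top_k` values in the example are assumptions of this sketch. Segment scoring reuses the cosine-plus-Jaccard function from the storage section, and the post-retrieval bookkeeping updates a visit counter and a last-access time.

```python
import math
import time


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def retrieve_from_mtm(query, segments, top_m, top_k, now=None):
    """Stage 1: pick top_m segments; stage 2: pick top_k pages in each."""
    now = time.time() if now is None else now

    def segment_score(seg):
        return (cosine(seg["embedding"], query["embedding"])
                + jaccard(seg["keywords"], query["keywords"]))

    chosen = sorted(segments, key=segment_score, reverse=True)[:top_m]
    pages = []
    for seg in chosen:
        ranked = sorted(seg["pages"],
                        key=lambda p: cosine(p["embedding"], query["embedding"]),
                        reverse=True)
        pages.extend(ranked[:top_k])
        seg["n_visit"] += 1       # post-retrieval update: retrieval count
        seg["last_access"] = now  # post-retrieval update: recency
    return pages


travel = {"embedding": [1.0, 0.0], "keywords": {"flight", "hotel"},
          "n_visit": 0, "last_access": 0.0,
          "pages": [{"text": "flight booking", "embedding": [0.9, 0.1]},
                    {"text": "hotel choice", "embedding": [0.5, 0.5]}]}
cooking = {"embedding": [0.0, 1.0], "keywords": {"recipe"},
           "n_visit": 0, "last_access": 0.0,
           "pages": [{"text": "pasta recipe", "embedding": [0.0, 1.0]}]}

query = {"embedding": [1.0, 0.0], "keywords": {"flight"}}
mtm_pages = retrieve_from_mtm(query, [travel, cooking], top_m=1, top_k=1, now=42.0)
print([p["text"] for p in mtm_pages], travel["n_visit"], cooking["n_visit"])
```

The final prompt context would then be the STM pages, these MTM pages, and the LPM entries combined; only the MTM part is shown here.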

LightMem executes retrieval in two stages. First, an SLM-based controller generates a set of hypothetical queries (HQs), each of which drives a top-$k$ vector similarity search over MTM, with a fixed retrieval budget spread across the HQs. Second, the candidates are semantically re-ranked by another SLM (the selector), which picks the final results. MTM-retrieved summaries are concatenated with STM for prompt augmentation. Retrieval planning always applies a per-user filter on the user id, and the number of HQs and the per-HQ budget are divided to balance coverage and personalization.
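A rough sketch of this hypothetical-query retrieval is shown below. The SLM controller and selector are not modeled: the HQ embeddings are passed in directly, and re-ranking by similarity to the original query stands in for the selector. The function name, the even split of the budget across HQs, and all numeric values are assumptions.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_with_hqs(entries, user_id, query_emb, hq_embs, budget, final_n):
    """Fan out over hypothetical queries, pool the candidates, then re-rank."""
    own = [e for e in entries if e["user"] == user_id]  # per-user filter
    hq_embs = hq_embs or [query_emb]                    # fall back to the raw query
    per_hq = max(1, budget // len(hq_embs))             # assumed: even budget split
    candidates = {}
    for hq in hq_embs:
        top = sorted(own, key=lambda e: cosine(e["emb"], hq), reverse=True)[:per_hq]
        for entry in top:
            candidates[entry["id"]] = entry             # de-duplicate across HQs
    # Stand-in for the SLM selector: rank pooled candidates against the query.
    reranked = sorted(candidates.values(),
                      key=lambda e: cosine(e["emb"], query_emb), reverse=True)
    return reranked[:final_n]


entries = [
    {"id": 1, "user": "alice", "emb": [1.0, 0.0], "text": "likes hiking"},
    {"id": 2, "user": "alice", "emb": [0.8, 0.6], "text": "planning an Alps trip"},
    {"id": 3, "user": "alice", "emb": [0.0, 1.0], "text": "allergic to nuts"},
    {"id": 4, "user": "bob", "emb": [1.0, 0.0], "text": "runs marathons"},
]
results = retrieve_with_hqs(entries, "alice", query_emb=[0.9, 0.4],
                            hq_embs=[[1.0, 0.0], [0.7, 0.7]], budget=2, final_n=1)
print([r["text"] for r in results])
```

Each HQ surfaces a different candidate here, which is the point of the fan-out: the pooled set covers more of the user's memory than a single query would, and the re-ranking step narrows it back down.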

5. Interactions with STM, LTM, and Consolidation

MTM serves as the critical intermediate buffer for episodic and thematic context integration. In MemoryOS, migration is strictly FIFO-based at the STM boundary, while promotion to LPM reflects sustained topical intensity and recurrence. The LPM only absorbs highly "hot" segments, preventing information overload and preserving user/agent trait fidelity.

In LightMem, online (interactive) use is wholly handled by STM and MTM. LTM integration is batched and offline: periodically, high-value MTM entries are abstracted, anonymized, and integrated into a cross-user persistent graph. This separation ensures online memory operations remain low-latency and do not block inference. User isolation is enforced at all retrieval and update stages.

6. Empirical Evidence and Practical Impact

Both MemoryOS and LightMem empirically demonstrate that MTM is pivotal for sustained context management and personalization in long-horizon LLM agents.

  • End-to-end gains (MemoryOS): On LoCoMo with GPT-4o-mini, MemoryOS shows a 49.11% improvement in F1 and 46.18% in BLEU-1 compared to A-Mem. In GVD, accuracy, correctness, and coherence also improve over A-Mem.
  • MTM ablation (MemoryOS): Removing MTM yields the largest observed drop in long-context metrics (GVD Accuracy from 93.3% to ~88%; LoCoMo F1 drops to ~3rd rank), confirming the non-redundant contribution of this layer.
  • Efficiency (MemoryOS): The average token recall is ~3,874 with only 4.9 LLM calls per response, outperforming baselines in efficiency.
  • End-to-end gains (LightMem): LightMem achieves an average F1 improvement of ≈+2.5 on LoCoMo and significant improvements on Multi-hop and Temporal question categories.
  • Ablation (LightMem): DialSim F1 drops from 4.12 to 3.75 without MTM (~9% relative loss).
  • Latency (LightMem): Median MTM retrieval takes 83 ms (vs. 856 ms for A-Mem), with full end-to-end inference in 581 ms (vs. 914 ms for A-Mem).
  • Compression and context: MTM enables longer effective conversational context within bounded compute and retrieval budgets compared to simple retrieval augmentation.
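For reference, the relative figures implied by the LightMem ablation and latency numbers above can be worked out directly. Only the reported values are used; the ratios themselves are derived here and are not quoted from either paper.

```latex
\begin{align*}
\text{DialSim F1, relative loss without MTM:}\quad
  & \frac{4.12 - 3.75}{4.12} = \frac{0.37}{4.12} \approx 0.090 \\
\text{MTM retrieval speedup over the baseline:}\quad
  & \frac{856\ \text{ms}}{83\ \text{ms}} \approx 10.3\times \\
\text{End-to-end latency reduction:}\quad
  & \frac{914 - 581}{914} = \frac{333}{914} \approx 0.364
\end{align*}
```

The retrieval step is about ten times faster, while end-to-end latency falls by roughly 36%, since generation time is unaffected by the memory layer.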

7. Parameterization, Trade-offs, and Tuning Recommendations

Several hyperparameters govern MTM’s segmentation, promotion, and retrieval efficiencies:

  • MemoryOS parameters: STM queue length ($n = 7$), MTM segment cap ($S_\text{max} = 200$), promotion heat threshold ($\tau$), segment assignment threshold ($\theta = 0.6$), and retrieval breadth (top-$m$ segments, top-$k$ pages per segment). Lowering $k$ degrades performance; tuning $k$ above 10 yields little marginal benefit.
  • LightMem parameters: MTM capacity bound ($10^4$ entries per user), retrieval budget, and embedding dimension ($d = 384$). Two-stage retrieval ensures both coarse coverage and fine semantic filtering.
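One way to keep these settings together is a pair of small configuration objects, sketched below. The class and field names are hypothetical. Defaults are given only for values stated in this article (STM length 7, segment cap 200, LPM size 100, assignment threshold 0.6, 10⁴ entries, 384 dimensions); the remaining fields have no default and must be supplied, and the values passed in the example are placeholders rather than recommended settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryOSConfig:
    promotion_heat_threshold: float  # no default: value not stated in this article
    top_m_segments: int              # no default: value not stated in this article
    top_k_pages: int                 # no default: value not stated in this article
    stm_queue_len: int = 7
    mtm_segment_cap: int = 200
    lpm_capacity: int = 100
    assign_threshold: float = 0.6

    def __post_init__(self):
        if self.top_m_segments < 1 or self.top_k_pages < 1:
            raise ValueError("retrieval breadth must be at least 1")
        # cosine lies in [-1, 1] and Jaccard in [0, 1], so the summed score is in [-1, 2]
        if not -1.0 <= self.assign_threshold <= 2.0:
            raise ValueError("assign_threshold is outside the score range")


@dataclass(frozen=True)
class LightMemConfig:
    retrieval_budget: int            # no default: value not stated in this article
    mtm_capacity: int = 10_000       # entries per user
    embedding_dim: int = 384


# Placeholder values for the fields without defaults.
memoryos_cfg = MemoryOSConfig(promotion_heat_threshold=5.0, top_m_segments=3, top_k_pages=5)
lightmem_cfg = LightMemConfig(retrieval_budget=20)
print(memoryos_cfg.mtm_segment_cap, lightmem_cfg.embedding_dim)
```

Making the configuration frozen and validating it at construction keeps tuning experiments from silently running with an out-of-range threshold.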

A plausible implication is that MTM performance depends on balancing retrieval precision against memory footprint: excessive segmentation or lax promotion thresholds may inflate latency and degrade relevance, while aggressive eviction or compression can cause information loss and context discontinuity.

Summary Table: MTM Compared in MemoryOS and LightMem

| System | Storage Model | Capacity Parameters | Key Algorithms |
|---|---|---|---|
| MemoryOS | Segmented-paging (topic segments) | STM: 7; MTM: 200; LPM: 100 | Heat-based eviction/promotion, segment assignment by joint cosine/Jaccard |
| LightMem | Vector index of summaries | MTM: 10⁴ entries (per user) | Two-stage SLM-guided retrieval (vector search + semantic filtering), redundancy compression |

MTM thus serves as the architectural bridge for long-horizon agent memory in both systems, providing personalized and semantically coherent recall over multi-turn interaction windows while keeping online latency low and benchmark performance high (Kang et al., 30 May 2025; Zhang et al., 9 Apr 2026).
