Mid-Term Memory (MTM) in LLM Systems

Updated 26 April 2026

MTM is an intermediary memory layer that organizes and consolidates episodic and thematic information across tens to hundreds of conversational turns.
Systems like MemoryOS and LightMem use segmentation, embedding-based scoring, and two-stage retrieval to efficiently manage and recall medium-term context.
Empirical evidence shows MTM enhances F1 and BLEU scores while reducing latency, underscoring its impact on personalization and long-horizon reasoning.

Mid-Term Memory (MTM) represents a critical architectural and algorithmic construct in modern hierarchical memory systems for LLM agents. It functions as an intermediary storage layer, bridging the rapid-access but ephemeral Short-Term Memory (STM) and the slowly evolving, knowledge-oriented Long-Term Memory (LTM) or Long-Term Personal Memory (LPM). MTM is specialized for consolidating, organizing, and selectively retrieving episodic or thematic information recurring over medium temporal spans—ranging from tens to hundreds of conversational turns. Its design enables both efficient online retrieval for context-rich interaction and effective memory consolidation for long-horizon personalization and reasoning. Two notable instantiations of MTM are realized in MemoryOS (Kang et al., 30 May 2025) and LightMem (Zhang et al., 9 Apr 2026), each illustrating nuanced strategies for memory segmentation, maintenance, retrieval, and empirical impact.

1. Architectural Roles and Layered Organization

Both MemoryOS and LightMem implement MTM as a semantically structured buffer operating between STM and LTM/LPM. In MemoryOS, memory is hierarchically organized into three tiers: STM—a fixed-size queue (n=7) of recent dialogue "pages" with LLM-generated chain summaries; MTM—a segmented-paging layer capturing semantically coherent topic segments; and LPM—permanent storage for distilled user and agent persona traits. Overflowing pages migrate from STM to MTM via FIFO, while MTM acts as both context buffer and topic aggregator before promotion or eviction to LPM.
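The FIFO hand-off from STM to MTM described above can be sketched as follows. This is a minimal illustration, not the MemoryOS implementation: the class name `ShortTermMemory`, the page fields, and the `mtm_inbox` list (a stand-in for MTM segment assignment) are assumptions made for the example; only the queue length of 7 comes from the text.

```python
from collections import deque

STM_CAPACITY = 7  # fixed queue length reported for MemoryOS


class ShortTermMemory:
    """Fixed-size FIFO queue of dialogue pages (hypothetical class name)."""

    def __init__(self, capacity=STM_CAPACITY):
        self.capacity = capacity
        self.pages = deque()

    def push(self, page):
        """Append a page; return the oldest page if the queue overflowed, else None."""
        self.pages.append(page)
        if len(self.pages) > self.capacity:
            return self.pages.popleft()
        return None


stm = ShortTermMemory()
mtm_inbox = []  # assumption: stands in for MTM segment assignment
for turn in range(10):
    evicted = stm.push({"Q": f"question {turn}", "R": f"answer {turn}"})
    if evicted is not None:
        mtm_inbox.append(evicted)

print(len(stm.pages), [p["Q"] for p in mtm_inbox])
```

After ten turns the queue holds the seven most recent pages, and the three oldest have been handed to the MTM side in arrival order.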

LightMem similarly partitions agent memory into STM (session-scoped, non-persistent), MTM (user-scoped episodic store of reusable summaries), and LTM (offline, de-identified graph knowledge distilled from MTM). MTM stores personalized, replay-on-demand memory objects with explicit user isolation.

MTM’s placement and structure enable persistent storage of medium-range conversational evidence, facilitate retrieval for context enhancement, and support incremental knowledge integration into long-term memory.

2. Data Structures and Storage Schemes

MemoryOS adopts a segmented-paging scheme: MTM holds up to $S_{\max}=200$ segments, each segment a collection of related pages, where each page is a tuple $\{Q, R, T, \text{meta}^\text{chain}\}$ whose last field is an LLM-generated semantic summary. Assignment of incoming pages to segments is governed by a scoring function:

$\mathcal{F}_\text{score}(p, s) = \cos(e_s, e_p) + \text{Jaccard}(K_s, K_p)$

where $e_s$, $e_p$ are embeddings for segment summaries and page content, and $K_s$, $K_p$ are the corresponding keyword sets; a threshold of $\theta=0.6$ determines segment membership. If all existing segments fall below the threshold, a new segment is created for the incoming page. Each segment periodically re-summarizes its content, supporting coherent retrieval.
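A minimal sketch of this assignment rule, assuming toy two-dimensional embeddings and hand-written keyword sets. The dictionary field names and the `assign` helper are hypothetical, and the sketch keeps each segment's embedding and keywords fixed instead of re-summarizing them as MemoryOS does.

```python
import math

THETA = 0.6  # segment-membership threshold from the text


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def f_score(page, segment):
    """Cosine similarity of embeddings plus Jaccard overlap of keyword sets."""
    return (cosine(segment["embedding"], page["embedding"])
            + jaccard(segment["keywords"], page["keywords"]))


def assign(page, segments, theta=THETA):
    """Attach the page to the best-scoring segment, or open a new one. Returns the index."""
    best_idx, best = None, float("-inf")
    for i, seg in enumerate(segments):
        score = f_score(page, seg)
        if score > best:
            best_idx, best = i, score
    if best_idx is not None and best >= theta:
        segments[best_idx]["pages"].append(page)
        return best_idx
    # Assumption: a new segment is seeded from the page's own embedding and keywords.
    segments.append({"embedding": list(page["embedding"]),
                     "keywords": set(page["keywords"]),
                     "pages": [page]})
    return len(segments) - 1


segments = [{"embedding": [1.0, 0.0], "keywords": {"travel", "japan"}, "pages": []}]
idx_a = assign({"embedding": [0.9, 0.1], "keywords": {"japan", "food"}}, segments)
idx_b = assign({"embedding": [0.0, 1.0], "keywords": {"python"}}, segments)
print(idx_a, idx_b, len(segments))
```

The first page scores well above the threshold against the existing travel segment and joins it; the second shares neither direction nor keywords with it, so a new segment is opened.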

LightMem encodes each MTM entry as:

$m_i^\text{MTM} = (s_i, t_i, u_i, e_i),\ e_i \in \mathbb{R}^d$

where $s_i$ is the summary (LLM/SLM-generated compressed episode), $t_i$ temporal/access metadata, $u_i$ user-id, and $e_i$ a 384-dim embedding (all-MiniLM-L6-v2). Entries are stored in a FAISS vector index, enabling sub-linear retrieval, strict user isolation, and per-item temporal maintenance. Capacity is bounded (e.g., $10^4$ entries per user), with redundancy and staleness resolved dynamically.
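The entry tuple and the user-isolation constraint can be illustrated with a small sketch. It is not LightMem's code: a brute-force cosine scan stands in for the FAISS index, three-dimensional vectors stand in for the 384-dim embeddings, and the names `MTMEntry` and `search` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class MTMEntry:
    summary: str      # s_i: compressed episode summary
    timestamp: float  # t_i: temporal/access metadata
    user_id: str      # u_i: owner of the entry
    embedding: list   # e_i: embedding vector (toy 3-dim here)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(entries, query_embedding, user_id, k=2):
    """Return the top-k entries for one user; other users' entries are never scored."""
    own = [e for e in entries if e.user_id == user_id]
    ranked = sorted(own, key=lambda e: cosine(e.embedding, query_embedding), reverse=True)
    return ranked[:k]


entries = [
    MTMEntry("prefers vegetarian recipes", 1.0, "alice", [1.0, 0.0, 0.0]),
    MTMEntry("training for a marathon", 2.0, "alice", [0.0, 1.0, 0.0]),
    MTMEntry("allergic to peanuts", 3.0, "bob", [1.0, 0.0, 0.0]),
]
hits = search(entries, [0.9, 0.1, 0.0], "alice", k=1)
print([h.summary for h in hits])
```

Filtering by user before scoring means that an equally similar entry owned by another user cannot appear in the results.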

3. Update, Migration, and Eviction Mechanisms

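In MemoryOS, overflow from the STM queue triggers migration to MTM. Within MTM, segment capacity ($S_{\max}=200$) is maintained via a "heat"-based priority:

$\text{Heat} = \alpha \cdot N_\text{visit} + \beta \cdot L_\text{interaction} + \gamma \cdot R_\text{recency}$

where $N_\text{visit}$ is total retrievals, $L_\text{interaction}$ is the number of pages, $R_\text{recency} = \exp(-\Delta t/\mu)$ is a time-decay term in the time $\Delta t$ since last access ($\mu = 10^7$ s), all with unit weights ($\alpha=\beta=\gamma=1$). Segments are evicted based on lowest heat if capacity is exceeded. Promotion to LPM is triggered if $\text{Heat} > \tau$ ($\tau = 5$), after which $L_\text{interaction}$ is reset to zero.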
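A minimal sketch of heat-based maintenance, assuming heat is the unit-weight sum of retrieval count, page count, and an exponential recency term. The time constant `mu`, the threshold `tau`, the small capacity, the field names, and the choice to reset the page count after promotion are illustrative assumptions in this sketch.

```python
import math


def heat(seg, now, mu=1e7):
    """Unit-weight heat: retrieval count + page count + exponential recency."""
    recency = math.exp(-(now - seg["last_access"]) / mu)  # assumed decay form
    return seg["n_visit"] + seg["n_pages"] + recency


def maintain(segments, now, capacity, tau):
    """Promote segments hotter than tau, then evict the coldest until within capacity."""
    promoted = []
    for name, seg in segments.items():
        if heat(seg, now) > tau:
            promoted.append(name)  # a real system would distill the segment into LPM here
            seg["n_pages"] = 0     # assumption: the page count is what gets reset
    evicted = []
    while len(segments) > capacity:
        coldest = min(segments, key=lambda name: heat(segments[name], now))
        evicted.append(coldest)
        del segments[coldest]
    return promoted, evicted


segments = {
    "travel": {"n_visit": 4, "n_pages": 3, "last_access": 0.0},
    "cooking": {"n_visit": 0, "n_pages": 1, "last_access": 0.0},
    "work": {"n_visit": 1, "n_pages": 2, "last_access": 0.0},
}
promoted, evicted = maintain(segments, now=100.0, capacity=2, tau=5.0)
print(promoted, evicted)
```

Resetting a counter after promotion lowers the segment's heat, so the same segment is not promoted again until it accumulates new activity.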

LightMem’s writer module, controlled by small LLMs, compresses and appends new MTM entries from each interaction, checks for semantic redundancy/conflicts, and enforces the global capacity bound. Redundant entries are merged; contradictory facts are resolved in favor of recency or support. Once MTM grows beyond capacity, low-utility items—determined by frequency or obsolescence—are evicted, and further compression is applied as necessary.
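The write path can be sketched as below, under several stated assumptions: a cosine threshold (`dup_threshold`) stands in for the SLM's redundancy and conflict check, conflicts are resolved by letting the newer entry overwrite the older one, utility is approximated by a merge counter plus timestamp, and a single list stands in for the store. None of these names or values come from LightMem.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def write(store, entry, capacity, dup_threshold=0.9):
    """Merge the entry into a near-duplicate for the same user, else append and enforce capacity."""
    for existing in store:
        same_user = existing["user_id"] == entry["user_id"]
        if same_user and cosine(existing["embedding"], entry["embedding"]) >= dup_threshold:
            existing.update(entry)  # recency wins: newer summary, timestamp, embedding
            existing["hits"] += 1   # merged entries count as better supported
            return
    store.append({**entry, "hits": 0})
    while len(store) > capacity:
        # Lowest utility first: fewest merges, then oldest timestamp.
        victim = min(store, key=lambda e: (e["hits"], e["timestamp"]))
        store.remove(victim)


store = []
write(store, {"user_id": "alice", "summary": "lives in Paris", "timestamp": 1,
              "embedding": [1.0, 0.0, 0.0]}, capacity=2)
write(store, {"user_id": "alice", "summary": "has a dog", "timestamp": 2,
              "embedding": [0.0, 1.0, 0.0]}, capacity=2)
write(store, {"user_id": "alice", "summary": "moved to Berlin", "timestamp": 3,
              "embedding": [0.99, 0.05, 0.0]}, capacity=2)
write(store, {"user_id": "alice", "summary": "enjoys jazz", "timestamp": 4,
              "embedding": [0.0, 0.0, 1.0]}, capacity=2)
summaries = [e["summary"] for e in store]
print(summaries)
```

In this run the third write overwrites the near-duplicate location fact, and the fourth pushes the store over capacity, so the never-merged, oldest entry is dropped.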

4. Retrieval, Querying, and Online Augmentation

In MemoryOS, retrieval from MTM for a user query $q$ follows a two-stage procedure:

Segment selection: Compute $\mathcal{F}_\text{score}(q, s)$ for each segment $s$, select top-$m$ segments ($m=5$).
Page selection: Within each chosen segment, retrieve top-$k$ pages by semantic cosine similarity ($k=10$ on LoCoMo; $k=5$ on GVD).

Updates to each retrieved segment's visit count $N_\text{visit}$ and recency term $R_\text{recency}$ are performed post-retrieval. The full retrieval set is:

$\mathcal{R}(q) = \text{STM} \cup \text{MTM}_\text{selected}(q) \cup \text{LPM}$

as the union of STM, selected MTM, and LPM knowledge entries.
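The two-stage MTM lookup can be sketched as follows, assuming the segment stage reuses the cosine-plus-Jaccard score and the page stage uses cosine similarity alone. The field names, the toy embeddings, and the parameter names `m` and `k` are assumptions for illustration; STM and LPM are left out.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def segment_score(query, seg):
    return (cosine(query["embedding"], seg["embedding"])
            + jaccard(query["keywords"], seg["keywords"]))


def retrieve(query, segments, m, k, now):
    """Stage 1: top-m segments by score. Stage 2: top-k pages per segment by cosine."""
    chosen = sorted(segments, key=lambda s: segment_score(query, s), reverse=True)[:m]
    pages = []
    for seg in chosen:
        ranked = sorted(seg["pages"],
                        key=lambda p: cosine(query["embedding"], p["embedding"]),
                        reverse=True)
        pages.extend(ranked[:k])
        seg["n_visit"] += 1        # post-retrieval bookkeeping
        seg["last_access"] = now
    return pages


travel = {"embedding": [1.0, 0.0], "keywords": {"japan", "travel"}, "n_visit": 0,
          "last_access": 0.0,
          "pages": [{"text": "visa rules", "embedding": [0.6, 0.8]},
                    {"text": "tokyo itinerary", "embedding": [1.0, 0.0]}]}
cooking = {"embedding": [0.0, 1.0], "keywords": {"recipe"}, "n_visit": 0,
           "last_access": 0.0,
           "pages": [{"text": "pasta", "embedding": [0.0, 1.0]}]}
query = {"embedding": [1.0, 0.0], "keywords": {"japan"}}
result = retrieve(query, [travel, cooking], m=1, k=1, now=50.0)
print([p["text"] for p in result])
```

Scoring segments first keeps the page-level comparison confined to a few topically relevant segments instead of every stored page.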

LightMem executes retrieval in two stages: an SLM-based controller generates a small set of hypothetical queries (HQs), each prompting a top-$k$ vector similarity search over MTM, with a fixed candidate budget spread across the HQs. Result candidates are semantically re-ranked by another SLM (the selector) to choose the final results. MTM-retrieved summaries are concatenated with STM for prompt augmentation. Retrieval planning always applies per-user filters on the user-id field $u_i$, with the candidate budget divided among HQs to balance coverage and personalization.
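A rough sketch of this flow, with both SLM roles replaced by stand-ins: the hypothetical queries are supplied directly as embeddings instead of being generated by a controller, and the selector is approximated by cosine similarity to the original query. The function name, the even budget split, and all values are assumptions, not LightMem's actual procedure.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def two_stage_retrieve(entries, user_id, hq_embeddings, query_embedding, budget, final_k):
    """Stage 1: per-HQ vector search under a shared budget. Stage 2: re-rank candidates."""
    own = [e for e in entries if e["user_id"] == user_id]  # per-user filter
    per_query = max(1, budget // len(hq_embeddings))       # assumed even split
    candidates = {}
    for hq in hq_embeddings:
        ranked = sorted(own, key=lambda e: cosine(e["embedding"], hq), reverse=True)
        for e in ranked[:per_query]:
            candidates[e["id"]] = e                        # de-duplicate across HQs
    # Stand-in for the SLM selector: similarity to the original query.
    reranked = sorted(candidates.values(),
                      key=lambda e: cosine(e["embedding"], query_embedding),
                      reverse=True)
    return reranked[:final_k]


entries = [
    {"id": 1, "user_id": "alice", "summary": "likes hiking", "embedding": [1.0, 0.0]},
    {"id": 2, "user_id": "alice", "summary": "works night shifts", "embedding": [0.0, 1.0]},
    {"id": 3, "user_id": "alice", "summary": "hikes after work", "embedding": [0.7, 0.7]},
    {"id": 4, "user_id": "bob", "summary": "likes hiking", "embedding": [1.0, 0.0]},
]
results = two_stage_retrieve(entries, "alice",
                             hq_embeddings=[[1.0, 0.0], [0.0, 1.0]],
                             query_embedding=[0.8, 0.6],
                             budget=4, final_k=2)
print([e["id"] for e in results])
```

The entry that is second-best for both hypothetical queries ends up ranked first against the original query, which is the kind of coverage the multi-query stage is meant to provide.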

5. Interactions with STM, LTM, and Consolidation

MTM serves as the critical intermediate buffer for episodic and thematic context integration. In MemoryOS, migration is strictly FIFO-based at the STM boundary, while promotion to LPM reflects sustained topical intensity and recurrence. The LPM only absorbs highly "hot" segments, preventing information overload and preserving user/agent trait fidelity.

In LightMem, online (interactive) use is wholly handled by STM and MTM. LTM integration is batched and offline: periodically, high-value MTM entries are abstracted, anonymized, and integrated into a cross-user persistent graph. This separation ensures online memory operations remain low-latency and do not block inference. User isolation is enforced at all retrieval and update stages.

6. Empirical Evidence and Practical Impact

Both MemoryOS and LightMem empirically demonstrate that MTM is pivotal for sustained context management and personalization in long-horizon LLM agents.

End-to-end gains (MemoryOS): On LoCoMo with GPT-4o-mini, MemoryOS shows a 49.11% improvement in F1 and 46.18% in BLEU-1 compared to A-Mem. In GVD, accuracy, correctness, and coherence also improve over A-Mem.
MTM ablation (MemoryOS): Removing MTM yields the largest observed drop in long-context metrics (GVD Accuracy from 93.3% to ~88%; LoCoMo F1 drops to ~3rd rank), confirming the non-redundant contribution of this layer.
Efficiency (MemoryOS): The average token recall is ~3,874 with only 4.9 LLM calls per response, outperforming baselines in efficiency.
End-to-end gains (LightMem): LightMem achieves an average F1 improvement of ≈+2.5 on LoCoMo and significant improvements on Multi-hop and Temporal question categories.
Ablation (LightMem): DialSim F1 drops from 4.12 to 3.75 without MTM (~9% relative loss).
Latency (LightMem): Median MTM retrieval takes 83 ms (vs. 856 ms for A-MEM), with full end-to-end inference in 581 ms (vs. 914 ms for A-MEM).
Compression and context: MTM enables longer effective conversational context within bounded compute and retrieval budgets compared to simple retrieval augmentation.
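The relative figures quoted above follow from simple arithmetic on the reported numbers. The snippet below only re-derives them; the helper names are hypothetical, and the GVD post-ablation value is taken as 88.0 although the text gives it as approximate.

```python
def relative_drop(before, after):
    """Fractional loss relative to the starting value."""
    return (before - after) / before


def speedup(baseline_ms, system_ms):
    """How many times faster the system is than the baseline."""
    return baseline_ms / system_ms


dialsim_loss = relative_drop(4.12, 3.75)   # LightMem DialSim F1, with vs. without MTM
retrieval_speedup = speedup(856, 83)       # median retrieval latency, A-MEM vs. LightMem
end_to_end_speedup = speedup(914, 581)     # end-to-end latency, A-MEM vs. LightMem
gvd_point_drop = 93.3 - 88.0               # MemoryOS GVD accuracy, percentage points

print(f"DialSim F1 relative loss without MTM: {dialsim_loss:.1%}")
print(f"Retrieval speedup: {retrieval_speedup:.1f}x")
print(f"End-to-end speedup: {end_to_end_speedup:.2f}x")
print(f"GVD accuracy drop: {gvd_point_drop:.1f} points")
```

The retrieval step is about ten times faster, while the end-to-end gain is smaller, roughly 1.6 times.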

7. Parameterization, Trade-offs, and Tuning Recommendations

Several hyperparameters govern MTM’s segmentation, promotion, and retrieval efficiencies:

MemoryOS parameters: STM queue ($n=7$), MTM segment cap ($S_{\max}=200$), promotion heat threshold ($\tau=5$), segment assignment threshold ($\theta=0.6$), retrieval (top-$m$ segments, $k$ pages per segment). Lowering $k$ degrades performance; tuning $k$ above 10 yields little marginal benefit.
LightMem parameters: MTM capacity bound ($10^4$ entries per user), retrieval budget (shared across hypothetical queries), embedding dimension ($d=384$). Two-stage retrieval ensures both coarse coverage and fine semantic filtering.

A plausible implication is that MTM’s performance is highly sensitive to balancing retrieval precision and memory footprint: excessive segmentation or lax promotion thresholds may bloat latency and degrade relevance, while aggressive eviction or compression can trigger information loss and context discontinuity.

Summary Table: MTM Compared in MemoryOS and LightMem

System	Storage Model	Capacity Parameters	Key Algorithms
MemoryOS	Segmented-paging (topic segments)	STM: 7; MTM: 200; LPM: 100	Heat-based eviction/promotion, segment assignment by joint cosine/Jaccard
LightMem	Vector index of summaries	MTM: 10⁴ entries (per user)	Two-stage SLM-guided retrieval (vector search + semantic filtering), redundancy compression

MTM thus serves as the architectural bridge for long-horizon agent memory in both systems, providing personalized, semantically coherent recall over multi-turn interaction windows while keeping online latency low and benchmark performance strong (Kang et al., 30 May 2025, Zhang et al., 9 Apr 2026).


References

Kang et al. Memory OS of AI Agent (2025)

Zhang et al. Lightweight LLM Agent Memory with Small Language Models (2026)
