MemoryOS: A Memory Operating System for AI Agents

Updated 17 April 2026

MemoryOS is a system-level framework that applies operating system principles to manage diverse memory resources for AI agents and hybrid DRAM–NVM systems.
It employs a three-tier memory hierarchy (STM, MTM, LPM) with modules for storage, updating, retrieval, and generation to ensure dynamic context integration.
Empirical evaluations reveal enhanced contextual consistency, improved response quality, and significant efficiency gains through proactive data migration and lifelong personalization.

A Memory Operating System (MemoryOS) is a system-level framework for managing memory resources—whether neural, semantic, or physical—using abstractions, hierarchies, and lifecycle controls analogous to those of conventional operating systems, but applied to AI agents and/or hybrid memory hardware. MemoryOS designs span advanced architectures for long-context LLM agents and kernel modules for hybrid DRAM–NVM systems. Prominent implementations in the AI agent domain integrate hierarchical memory stores, operator pipelines, update and retrieval scheduling, and comprehensive mechanisms for lifelong personalization and state retention (Kang et al., 30 May 2025). In the context of hybrid main memory, MemoryOS structures and migrates data across DRAM and NVM tiers via OS-level monitoring, migration, and scheduling policies to optimize latency, power, and endurance (Liu et al., 2017, Garg et al., 2023).

1. Architectural Paradigms and Hierarchical Organization

MemoryOS in LLM-centric agents is structured as a multi-tiered storage and retrieval pipeline. The reference architecture (Kang et al., 30 May 2025) organizes memory into three principal layers:

Short-Term Memory (STM): A fixed-length queue (e.g., $C_{\mathrm{STM}}=7$) containing the most recent dialogue "pages" (tuples of user utterance, agent response, and timestamp).
Mid-Term Memory (MTM): A bounded set (e.g., $C_{\mathrm{MTM}}=200$) of topic-segmented dialogue chains, where each segment clusters semantically and topically related STM pages using a hybrid embedding and keyword-Jaccard similarity metric:

$\mathcal{F}_\mathrm{score} = \cos(\mathbf{e}_s, \mathbf{e}_p) + \frac{|K_s \cap K_p|}{|K_s \cup K_p|}$

Long-Term Personal Memory (LPM): Compact storage for durable persona and user-specific knowledge, comprising static profiles, factual knowledge bases (size $\le 100$), and multi-dimensional trait vectors (user: 90-D, agent: up to 100 entries).

This layered architecture supports time- and provenance-sensitive memory flows, letting MemoryOS cover local dialogue consistency, mid-range context, and enduring personal adaptation.
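The hybrid similarity $\mathcal{F}_\mathrm{score}$ defined above can be sketched in a few lines. This is a minimal illustration, not the reference implementation: the function names (`cosine`, `jaccard`, `f_score`) are hypothetical, embeddings are assumed to be plain lists of floats, and keywords are assumed to be already extracted.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard(keywords_s, keywords_p):
    """Jaccard overlap |K_s ∩ K_p| / |K_s ∪ K_p| (0.0 if both sets are empty)."""
    ks, kp = set(keywords_s), set(keywords_p)
    union = ks | kp
    if not union:
        return 0.0
    return len(ks & kp) / len(union)


def f_score(seg_emb, page_emb, seg_keywords, page_keywords):
    """F_score = cos(e_s, e_p) + Jaccard(K_s, K_p), as in the formula above."""
    return cosine(seg_emb, page_emb) + jaccard(seg_keywords, page_keywords)
```

Because the cosine term lies in $[-1, 1]$ and the Jaccard term in $[0, 1]$, the combined score lies in $[-1, 2]$; any grouping threshold has to be chosen on that scale.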

2. Core Functional Modules and Data Flows

MemoryOS comprises four logically distinct, tightly coupled modules:

Storage Module: Physically maintains STM, MTM, and LPM stores. Implements append/group/FIFO semantics and forms the physical substrate over which all operations execute.
Updating Module: Orchestrates intra- and inter-tier migration as new data arrive.
- STM→MTM: On STM overflow, pages are evicted FIFO and inserted into the appropriate MTM segment.
- MTM Eviction: MTM segments are scored for "heat" (a function of access counts, interaction volume, and recency):
$\mathrm{Heat}(s) = \alpha N_{\mathrm{visit}}(s) + \beta L_{\mathrm{interaction}}(s) + \gamma R_{\mathrm{recency}}(s)$

Segments are evicted or promoted as needed based on retention thresholds.
- MTM→LPM: Segments whose heat exceeds threshold $\tau$ are promoted, with persona extraction and trait updates delegated to LLM calls for meta-reflection.
Retrieval Module: Conducts multi-stage retrieval upon query $Q$:
- STM retrieval: Full-chain scan of chained STM pages.
- MTM retrieval: Score and select the top-$m$ segments for $Q$ using $\mathcal{F}_\mathrm{score}$; within each, embed and pick the top-$k$ pages.
- LPM retrieval: Embed $Q$ and retrieve high-similarity entries from static/factual and trait stores.
- All results are merged into a single retrieved context for $Q$.
Generation Module: Composes the prompting context by concatenating all relevant STM, MTM, and LPM data and passes it, along with $Q$, into the LLM for response generation.

This modularity ensures strict separation of storage, migration, context fusion, and neural generation, aligns with the OS analogy, and supports expanding or specializing any layer with minimal coupling.
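The Updating Module's two core mechanics, FIFO overflow from STM into MTM and heat scoring, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the published code: the names `Segment`, `add_page`, `promotable`, and the `route` callback are hypothetical, $L_{\mathrm{interaction}}$ is approximated by the number of pages in a segment, $R_{\mathrm{recency}}$ is a caller-supplied number, and the weights default to 1.0 because the text does not give their values.

```python
from collections import deque
from dataclasses import dataclass, field

STM_CAPACITY = 7  # example value of C_STM quoted in the text


@dataclass
class Segment:
    topic: str
    pages: list = field(default_factory=list)
    n_visit: int = 0      # N_visit: how often the segment was retrieved
    recency: float = 1.0  # R_recency: assumed scale, higher means more recent

    def heat(self, alpha=1.0, beta=1.0, gamma=1.0):
        # Heat(s) = alpha*N_visit + beta*L_interaction + gamma*R_recency.
        # Assumption: L_interaction is taken to be the page count.
        return alpha * self.n_visit + beta * len(self.pages) + gamma * self.recency


def add_page(stm, page, segments, route):
    """Append a page to STM; on overflow, move the oldest pages (FIFO) into MTM.

    `route` is a hypothetical callback mapping a page to a topic key; in the
    described system this role is played by F_score-based segment matching.
    """
    stm.append(page)
    evicted = []
    while len(stm) > STM_CAPACITY:
        old = stm.popleft()
        segment = segments.setdefault(route(old), Segment(route(old)))
        segment.pages.append(old)
        evicted.append(old)
    return evicted


def promotable(segments, tau):
    """Segments whose heat strictly exceeds the threshold tau."""
    return [s for s in segments.values() if s.heat() > tau]
```

With nine pages pushed through an empty `deque`, the two oldest pages end up in MTM and the queue holds the seven most recent ones; MTM capacity enforcement and the LLM-driven persona extraction are deliberately left out.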

3. Dialogue-Loop Integration and Algorithmic Details

The pipeline for each user-agent exchange in MemoryOS is fully specified, ensuring all flows are auditable and reproducible. The main loop performs:

Retrieval: STM context, MTM top-$m$ segments and top-$k$ pages, LPM top entries for $Q$.
Prompt Construction and LLM Call: All information concatenated and passed as prompt.
Storage and Meta-Chain: New response is meta-linked via chain summary, then enqueued to STM.
Housekeeping and Updating: STM overflows are migrated, MTM segments are maintained by heat, and hot segments are promoted to LPM.

This enables prompt-local and persistent memory integration, high retrieval efficiency, and proactive memory lifecycle management. Pseudocode capturing the canonical loop is explicitly available in the source (Kang et al., 30 May 2025).
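The four steps of the loop can be sketched as a single function. This is a simplified illustration and not the pseudocode from the source: `dialogue_turn`, `retrieve_mtm`, `retrieve_lpm`, and `llm` are hypothetical names and callables supplied by the caller, pages are reduced to (query, response) pairs without timestamps, and the chain-summary meta-linking, heat maintenance, and LPM promotion are omitted.

```python
from collections import deque


def dialogue_turn(query, stm, mtm, lpm, llm, retrieve_mtm, retrieve_lpm,
                  stm_capacity=7):
    """Run one user-agent exchange: retrieve, prompt, store, housekeep."""
    # 1. Retrieval: full STM context plus query-specific MTM and LPM entries.
    stm_ctx = list(stm)
    mtm_ctx = retrieve_mtm(query, mtm)
    lpm_ctx = retrieve_lpm(query, lpm)

    # 2. Prompt construction and LLM call: concatenate everything retrieved.
    prompt = "\n".join([
        "Persona and facts: " + "; ".join(lpm_ctx),
        "Related history: " + "; ".join(mtm_ctx),
        "Recent turns: " + "; ".join(f"{q} -> {r}" for q, r in stm_ctx),
        "User: " + query,
    ])
    response = llm(prompt)

    # 3. Storage: enqueue the new page to STM.
    stm.append((query, response))

    # 4. Housekeeping: FIFO overflow from STM into MTM.
    while len(stm) > stm_capacity:
        old_query, old_response = stm.popleft()
        mtm.append(f"{old_query} -> {old_response}")
    return response
```

Passing the retrievers and the model as callables keeps the sketch testable with stubs, for example `llm=lambda prompt: "ok"`.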

4. Empirical Evaluation, Personalization, and Ablation

MemoryOS demonstrates substantial gains under the long-horizon LoCoMo benchmark with GPT-4o-mini. Reported F1 and BLEU-1 scores are:

Category	F1 (Gain)	BLEU-1 (Gain)
Single-Hop	35.27 (+32.4%)	25.22 (+42.3%)
Multi-Hop	41.15 (+23.8%)	30.76 (+5.7%)
Temporal	20.02 (+118.8%)	16.52 (+111.5%)
Open Domain	48.62 (+18.5%)	42.99 (+25.2%)

With an average rank of 1.0 across all tasks, MemoryOS outperforms the baselines (ranks 2.2–5.0) and achieves average improvements of 49.11% in F1 and 46.18% in BLEU-1.

Efficiency is validated via reduced token and LLM call counts: 3,874 tokens and 4.9 LLM calls per turn versus baseline peak of 16,977 tokens/13.0 calls.
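As a quick check of what these figures imply, the relative savings against the most expensive baseline can be computed directly from the numbers quoted above. The variable names are illustrative, and the comparison is only against the baseline peak, not against every baseline.

```python
# Per-turn costs quoted in the text.
memoryos_tokens, baseline_peak_tokens = 3874, 16977
memoryos_calls, baseline_peak_calls = 4.9, 13.0

# Relative reduction versus the baseline peak.
token_reduction = 1 - memoryos_tokens / baseline_peak_tokens
call_reduction = 1 - memoryos_calls / baseline_peak_calls

print(f"token reduction: {token_reduction:.1%}")  # token reduction: 77.2%
print(f"call reduction: {call_reduction:.1%}")    # call reduction: 62.3%
```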

Ablation studies demonstrate the critical role of MTM segments (–15 to –20 F1 on removal), LPM persona (–8 to –12), and STM chain meta (–3 to –5). Disabling the complete MemoryOS reduces performance to near-baseline, verifying the necessity of each component for robust personalization and long-term recall.

5. Personalization, Long-Term Coherence, and Qualitative Behavior

MemoryOS's LPM, encompassing both trait vectors and factual KBs for users/agents, forms the substrate for persistent personalization. Combined with the mid-term segmentation and STM chaining, this enables the agent to:

Recall and contextually reference facts and events from weeks or months earlier.
Adapt responses based on evolving user traits, goals, and prior interactions.
Demonstrate proactive reminders tied to long-standing user objectives, as in recalling fitness goals stemming from historical events.

Such behavior is confirmed in qualitative studies—MemoryOS retrieves, synthesizes, and applies historical user context with high fidelity, a property not observed in flat or stateless RAG systems. The combination of hierarchical storage, chain-based local context, and dynamic persona updating underlies its consistency and adaptability (Kang et al., 30 May 2025).

6. Implementation, Open-Source Ecosystem, and Extensibility

MemoryOS is fully open-sourced (https://github.com/BAI-LAB/MemoryOS) with modular components:

Module	File	Functionality
Storage	`storage.py`	STM, MTM, LPM stores
Updating	`updating.py`	FIFO, heat routines
Retrieval	`retrieval.py`	MTM, LPM retrieval
Generation	`generation.py`	Prompt, LLM API call

The design is amenable to integration with advanced retrieval (embedding retrievers with HiNS (Tian et al., 21 Jan 2026)), further behavioral analytics, or application-specific persona modules. All data flows are strictly modular, allowing for evaluation, modification, and research in isolation or in composition.

7. Broader Context, Variants, and Relation to Other MemoryOS Approaches

The OS-inspired, tiered-memory concept in MemoryOS (Kang et al., 30 May 2025) for LLM agents parallels hierarchical/tiered management of physical memory in hybrid DRAM–NVM architectures (Liu et al., 2017, Garg et al., 2023), though the domains differ. Contemporary variants such as EverMemOS (Hu et al., 5 Jan 2026) introduce engram-inspired memory life-cycles and MemScenes for agentic reasoning, but retain multi-phased segmentation, consolidation, and reconstructive retrieval as organizing principles.

In all domains, the MemoryOS approach unifies the representation, scheduling, and evolution of memory resources in a formally organized fashion, transcending ad-hoc caches or stateless injection to enable robust, contextually appropriate, and scalable management of long-term agent memory and multi-modal information.

In summary, MemoryOS as exemplified in modern LLM agent research delivers a high-performance, auditable, and modular memory stack that bridges conversational and factual context, supports continual user adaptation, and operationalizes classic OS principles for the demands of long-horizon, personalized interaction (Kang et al., 30 May 2025).