MemOS: A Memory OS for AI System (2507.03724v3)

Published 4 Jul 2025 in cs.CL

Abstract: LLMs have become an essential infrastructure for AGI, yet their lack of well-defined memory management systems hinders the development of long-context reasoning, continual personalization, and knowledge consistency. Existing models mainly rely on static parameters and short-lived contextual states, limiting their ability to track user preferences or update knowledge over extended periods. While Retrieval-Augmented Generation (RAG) introduces external knowledge in plain text, it remains a stateless workaround without lifecycle control or integration with persistent representations. Recent work has modeled the training and inference cost of LLMs from a memory hierarchy perspective, showing that introducing an explicit memory layer between parameter memory and external retrieval can substantially reduce these costs by externalizing specific knowledge. Beyond computational efficiency, LLMs face broader challenges arising from how information is distributed over time and context, requiring systems capable of managing heterogeneous knowledge spanning different temporal scales and sources. To address this challenge, we propose MemOS, a memory operating system that treats memory as a manageable system resource. It unifies the representation, scheduling, and evolution of plaintext, activation-based, and parameter-level memories, enabling cost-efficient storage and retrieval. As the basic unit, a MemCube encapsulates both memory content and metadata such as provenance and versioning. MemCubes can be composed, migrated, and fused over time, enabling flexible transitions between memory types and bridging retrieval with parameter-based learning. MemOS establishes a memory-centric system framework that brings controllability, plasticity, and evolvability to LLMs, laying the foundation for continual learning and personalized modeling.

Summary

  • The paper introduces MemOS for LLMs by integrating plaintext, activation, and parameter memories to address static memory challenges.
  • It employs a layered architecture using MemCube, MemScheduler, and MemOperator to optimize memory lifecycle management and cross-task adaptability.
  • Evaluations on the LOCOMO benchmark show that MemOS achieves superior reasoning performance with low latency and robust security.

A Memory OS for AI Systems: MemOS

The paper "MemOS: A Memory OS for AI System" introduces MemOS, an operating system designed to manage memory in LLMs. MemOS serves as critical infrastructure for enhancing the reasoning capabilities and adaptive behavior of LLMs by providing a systematic mechanism for managing diverse memory types. These are segmented into three categories, namely plaintext, activation, and parameter memories, each contributing distinct functionality to a cohesive AI system.

Memory Challenges in LLMs

Modern LLMs, while powerful, are often limited by static memory management. Traditional models rely primarily on vast parametric memory housed within their weights, which makes it difficult to maintain long-range context and adapt over time: such systems lack mechanisms for managing evolving knowledge, personalizing interactions, and migrating state seamlessly across platforms. Retrieval-Augmented Generation (RAG) methods partially bridge these gaps by introducing retrieval-based memory, yet they remain stateless and offer limited dynamic control.

Figure 1: MemOS achieves state-of-the-art performance across all reasoning tasks.

Architecture and Core Design of MemOS

MemOS is structured around three core memory types:

  • Plaintext Memory: External, editable knowledge blocks leveraging dynamic retrieval.
  • Activation Memory: Inference states such as KV-caches that embody runtime semantics.
  • Parameter Memory: Embedded knowledge within model weights, optimized by lightweight training such as LoRA.

The integration of these diverse memory types is mediated by a unified framework known as MemCube, which standardizes memory lifecycle management and representation across tasks and modalities.

Figure 2: Overview of the MemOS framework.
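To make the three memory types and the MemCube abstraction concrete, here is a minimal sketch in Python. The field names (`payload`, `provenance`, `version`) and the `migrate` method are illustrative assumptions based only on the paper's description of MemCube metadata and cross-type transitions, not the actual MemOS API.

```python
from dataclasses import dataclass
from enum import Enum

class MemoryType(Enum):
    PLAINTEXT = "plaintext"    # external, editable knowledge blocks
    ACTIVATION = "activation"  # runtime states such as KV-caches
    PARAMETER = "parameter"    # weight-embedded knowledge (e.g., LoRA deltas)

@dataclass
class MemCube:
    """Encapsulates memory content plus metadata (provenance, versioning)."""
    payload: str
    mem_type: MemoryType
    provenance: str
    version: int = 1

    def migrate(self, target: MemoryType) -> "MemCube":
        # A type transition produces a new cube, bumping the version and
        # extending the provenance chain so the change stays traceable.
        return MemCube(
            payload=self.payload,
            mem_type=target,
            provenance=f"{self.provenance} -> {self.mem_type.value}",
            version=self.version + 1,
        )
```

In this sketch, migrating a cube (say, distilling a plaintext preference into parameter memory) never mutates the original, which keeps earlier versions available for rollback and audit.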

MemOS's architecture is hierarchically layered into:

  1. Interface Layer: Translates user inputs into memory calls.
  2. Operation Layer: Includes MemScheduler for planning memory paths and MemOperator for organizing memory structures.
  3. Infrastructure Layer: Ensures memory persistence, security, and network-wide access through components like MemVault and MemGovernance.

    Figure 3: Overview of MemOS architecture and memory interaction flow.
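The routing role of the Operation and Infrastructure layers can be sketched as follows. The interfaces of `MemVault` and `MemScheduler`, and the cheapest-tier-first routing policy, are assumptions for illustration; the paper names these components but this is not their actual implementation.

```python
class MemVault:
    """Minimal persistent store for plaintext memory (hypothetical interface)."""
    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)

class MemScheduler:
    """Plans which memory tier should serve an incoming memory call."""
    def __init__(self, vault):
        self.vault = vault

    def route(self, query, activation_cache):
        # Try the cheapest tier first: reuse a cached activation (a KV-cache
        # hit), then retrieve plaintext memory from the vault, and finally
        # fall back to answering purely from parameter memory (model weights).
        if query in activation_cache:
            return ("activation", activation_cache[query])
        stored = self.vault.get(query)
        if stored is not None:
            return ("plaintext", stored)
        return ("parameter", None)
```

A usage example: a repeated query hits the activation cache, a known-but-cold query is served from the vault, and anything else falls through to the model's own weights.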

Innovative Capabilities

MemOS's central contribution is a controllable, modular, and evolvable memory system for LLMs:

  • Memory as System Resource: Treats memory as a central architectural entity, akin to CPU or storage in traditional OSs, facilitating resource scheduling, lifecycle governance, and compliance.
  • Evolvability and Adaptation: Enables dynamic cross-memory type transformation, supporting lifelong learning and customization.
  • Governance and Security: Provides robust traceability, version control, and permission architectures to ensure secure, compliant operation in multi-agent environments.

    Figure 4: MemCube: A unified encapsulation structure for heterogeneous memory scheduling.
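The governance bullet above (traceability plus permissions in multi-agent settings) can be illustrated with a small access-control sketch. `MemGovernance` is named in the paper, but this ACL-plus-audit-log design is a hypothetical stand-in, not the system's actual mechanism.

```python
class MemGovernance:
    """Access control and audit trail for memory operations (hypothetical sketch)."""
    def __init__(self):
        self.acl = {}        # memory_id -> set of agents permitted to use it
        self.audit_log = []  # append-only record of every access attempt

    def grant(self, memory_id, agent):
        self.acl.setdefault(memory_id, set()).add(agent)

    def access(self, memory_id, agent, op):
        # Every attempt is logged, allowed or denied, so memory use in a
        # multi-agent deployment remains traceable after the fact.
        allowed = agent in self.acl.get(memory_id, set())
        self.audit_log.append((agent, memory_id, op, allowed))
        return allowed
```

The append-only log is the key design choice: even denied accesses leave a record, which is what makes compliance auditing possible.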

Evaluations and Results

MemOS's efficacy is validated on the LOCOMO benchmark across task categories such as single-hop reasoning and temporal reasoning. It achieves superior performance on metrics including LLM-Judge scores and semantic embedding alignment, while maintaining low latency.

Figure 5: Performance trends of MemOS across memory configurations.

Conclusion

MemOS stands poised to transform LLM technology, advancing models from static processing units into adaptive, memory-integrated intelligent agents. By addressing core architectural limitations and enhancing cross-task adaptability, MemOS marks a pivot toward more memory-centric AI systems. Future directions include cross-LLM memory sharing, self-evolving memories, and decentralized memory markets, each with significant implications for the field.


HackerNews


Reddit

  1. MemOS: A Memory OS for AI System (42 points, 16 comments)

alphaXiv

  1. MemOS: A Memory OS for AI System (92 likes, 0 questions)