
MIRIX: Multi-Agent Memory System for LLM-Based Agents (2507.07957v1)

Published 10 Jul 2025 in cs.CL and cs.AI

Abstract: Although memory capabilities of AI agents are gaining increasing attention, existing solutions remain fundamentally limited. Most rely on flat, narrowly scoped memory components, constraining their ability to personalize, abstract, and reliably recall user-specific information over time. To this end, we introduce MIRIX, a modular, multi-agent memory system that redefines the future of AI memory by solving the field's most critical challenge: enabling LLMs to truly remember. Unlike prior approaches, MIRIX transcends text to embrace rich visual and multimodal experiences, making memory genuinely useful in real-world scenarios. MIRIX consists of six distinct, carefully structured memory types: Core, Episodic, Semantic, Procedural, Resource Memory, and Knowledge Vault, coupled with a multi-agent framework that dynamically controls and coordinates updates and retrieval. This design enables agents to persist, reason over, and accurately retrieve diverse, long-term user data at scale. We validate MIRIX in two demanding settings. First, on ScreenshotVQA, a challenging multimodal benchmark comprising nearly 20,000 high-resolution computer screenshots per sequence, requiring deep contextual understanding and where no existing memory systems can be applied, MIRIX achieves 35% higher accuracy than the RAG baseline while reducing storage requirements by 99.9%. Second, on LOCOMO, a long-form conversation benchmark with single-modal textual input, MIRIX attains state-of-the-art performance of 85.4%, far surpassing existing baselines. These results show that MIRIX sets a new performance standard for memory-augmented LLM agents. To allow users to experience our memory system, we provide a packaged application powered by MIRIX. It monitors the screen in real time, builds a personalized memory base, and offers intuitive visualization and secure local storage to ensure privacy.

Summary

  • The paper demonstrates a modular multi-agent memory system that customizes and persists user data across time and modalities to enhance LLM agent performance.
  • It introduces an active retrieval mechanism in which the agent infers a topic from the input, chooses among several retrieval methods, and injects the retrieved entries into the prompt to support context-aware responses.
  • The experimental evaluation reports 35% higher accuracy than the RAG baseline on ScreenshotVQA and an 8.0% improvement over LangMem on the LOCOMO long-form conversation benchmark.

MIRIX: A Modular Multi-Agent Memory System for LLM-Based Agents

The MIRIX framework introduces a comprehensive, modular memory architecture for LLM-based agents, addressing persistent limitations in current memory-augmented systems. The design is motivated by the need for agents to persist, abstract, and reliably recall user-specific information across time and modalities, a capability that is essential for consistent personalization, long-term reasoning, and real-world usability.

System Architecture

MIRIX is structured around six specialized memory components, each managed by a dedicated agent and coordinated by a central Meta Memory Manager:

  • Core Memory: Stores persistent, high-priority information about the user and agent persona.
  • Episodic Memory: Captures time-stamped, event-based user experiences.
  • Semantic Memory: Maintains abstract, factual, and relational knowledge.
  • Procedural Memory: Encodes step-by-step instructions and workflows.
  • Resource Memory: Handles documents, files, and multimodal resources.
  • Knowledge Vault: Secures verbatim, sensitive information (e.g., credentials, contacts).

This compositional approach enables fine-grained routing and retrieval, supporting both efficient storage and accurate, context-aware recall. Each memory type is internally structured (e.g., episodic entries include event type, summary, details, actor, timestamp), facilitating targeted updates and retrievals.
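As an illustration of the structured entries described above, the following is a minimal sketch of the six memory types and an episodic entry. The field names follow the list in the text (event type, summary, details, actor, timestamp); the Python types, the `MemoryType` enum, and the `entries_between` helper are assumptions made for this example and are not taken from the MIRIX codebase.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import List


class MemoryType(Enum):
    """The six memory components named in the paper."""
    CORE = "core"
    EPISODIC = "episodic"
    SEMANTIC = "semantic"
    PROCEDURAL = "procedural"
    RESOURCE = "resource"
    KNOWLEDGE_VAULT = "knowledge_vault"


@dataclass
class EpisodicEntry:
    """One time-stamped event; field types are assumed for illustration."""
    event_type: str
    summary: str
    details: str
    actor: str
    timestamp: datetime


def entries_between(entries: List[EpisodicEntry],
                    start: datetime,
                    end: datetime) -> List[EpisodicEntry]:
    """Return entries with start <= timestamp < end, oldest first."""
    hits = [e for e in entries if start <= e.timestamp < end]
    return sorted(hits, key=lambda e: e.timestamp)
```

Keeping the timestamp as a typed field, rather than embedding it in free text, is what makes a targeted query such as "events from this morning" a simple range filter.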

A multi-agent workflow underpins the system: the Meta Memory Manager orchestrates input processing, memory updates, and retrievals, delegating to specialized Memory Managers. This modularity supports parallelism, scalability, and extensibility, and is particularly well-suited for heterogeneous, multimodal user interactions.
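The delegation pattern described above can be sketched roughly as follows. In MIRIX the routing decision is made by an LLM-based agent; here a hypothetical `keyword_router` function stands in for that decision so the example stays runnable. The class names mirror the terms used in the text, but the methods and signatures are assumptions.

```python
from typing import Callable, Dict, List


class MemoryManager:
    """Hypothetical per-component manager that simply appends items to a list."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.items: List[str] = []

    def update(self, item: str) -> None:
        self.items.append(item)


class MetaMemoryManager:
    """Routes each input to the relevant component managers."""

    def __init__(self, managers: Dict[str, MemoryManager],
                 router: Callable[[str], List[str]]) -> None:
        self.managers = managers
        self.router = router

    def process(self, item: str) -> List[str]:
        # Ignore any target the router names that has no registered manager.
        targets = [t for t in self.router(item) if t in self.managers]
        for target in targets:
            self.managers[target].update(item)
        return targets


def keyword_router(item: str) -> List[str]:
    """Stand-in for the LLM routing decision (an assumption of this sketch)."""
    text = item.lower()
    targets = []
    if "password" in text or "phone" in text:
        targets.append("knowledge_vault")
    if "step" in text or "how to" in text:
        targets.append("procedural")
    if not targets:
        targets.append("episodic")
    return targets
```

Because each manager owns its own store, the updates for a single input could in principle be dispatched in parallel, which is the property the paragraph above attributes to the modular design.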

Active Retrieval Mechanism

A key innovation is the Active Retrieval mechanism. Rather than relying on explicit user prompts to trigger memory access, the agent autonomously generates a topic from the input context, retrieves relevant entries from each memory component, and injects them into the system prompt. This ensures that responses are grounded in up-to-date, personalized, and contextually relevant information, mitigating the risk of outdated or incorrect parametric knowledge.

Multiple retrieval strategies are supported (embedding match, BM25, string match), and the agent dynamically selects the most appropriate method based on context. This flexibility enhances retrieval accuracy and efficiency across diverse query types.
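The three steps of active retrieval (infer a topic, query every memory component, inject the hits into the system prompt) can be sketched as below. This is a simplified illustration under stated assumptions: `generate_topic` is a placeholder for the LLM call, `overlap_match` is a plain term-overlap scorer used in place of BM25 or embedding similarity, and all function names are hypothetical.

```python
from typing import Dict, List


def generate_topic(user_input: str) -> str:
    """Placeholder for the LLM topic step: keep words longer than 3 characters."""
    words = [w.strip(".,?!").lower() for w in user_input.split()]
    return " ".join(w for w in words if len(w) > 3)


def string_match(topic: str, entries: List[str]) -> List[str]:
    """Return entries that contain the topic as a case-insensitive substring."""
    return [e for e in entries if topic.lower() in e.lower()]


def overlap_match(topic: str, entries: List[str], top_k: int = 2) -> List[str]:
    """Rank entries by the number of topic terms they share (not BM25)."""
    terms = set(topic.lower().split())
    scored = [(len(terms & set(e.lower().split())), e) for e in entries]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:top_k]]


def retrieve(topic: str, memories: Dict[str, List[str]],
             method: str = "overlap") -> Dict[str, List[str]]:
    """Query every memory component with the selected strategy."""
    search = string_match if method == "string" else overlap_match
    return {name: search(topic, entries) for name, entries in memories.items()}


def build_system_prompt(base: str, retrieved: Dict[str, List[str]]) -> str:
    """Append retrieved entries, tagged by component, to the base prompt."""
    lines = [base]
    for name, hits in retrieved.items():
        for hit in hits:
            lines.append(f"[{name}] {hit}")
    return "\n".join(lines)
```

The point of the sketch is the control flow: retrieval runs on every turn without the user asking for it, and the `method` argument marks where a context-dependent strategy choice would be made.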

Application and Use Cases

MIRIX is implemented as a cross-platform application (React-Electron frontend, Uvicorn backend), with real-time screen monitoring, memory visualization, and a chat interface. The system captures screenshots at 1.5-second intervals, deduplicates similar images, and streams unique screenshots for processing. Visual data is processed via the Gemini API, enabling low-latency, asynchronous uploads and retrievals.
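The text does not say how similar screenshots are detected, so the following is only one plausible sketch of the deduplication step. It assumes each frame has already been reduced to a small grayscale thumbnail given as a flat sequence of pixel values, and it compares frames with an average hash and a Hamming-distance threshold. The threshold value and all function names are assumptions; only the 1.5-second interval comes from the text.

```python
from typing import Iterable, Iterator, List, Optional, Sequence

CAPTURE_INTERVAL_S = 1.5  # capture interval reported in the text; a real loop would sleep this long


def average_hash(pixels: Sequence[int]) -> List[bool]:
    """One bit per pixel: True where the pixel is brighter than the frame mean."""
    mean = sum(pixels) / len(pixels)
    return [p > mean for p in pixels]


def hamming(a: Sequence[bool], b: Sequence[bool]) -> int:
    """Count the positions at which two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def unique_frames(frames: Iterable[Sequence[int]],
                  max_distance: int = 1) -> Iterator[Sequence[int]]:
    """Yield a frame only if it differs enough from the last frame that was kept."""
    last_hash: Optional[List[bool]] = None
    for frame in frames:
        current = average_hash(frame)
        if last_hash is None or hamming(current, last_hash) > max_distance:
            last_hash = current
            yield frame
```

Comparing against the last kept frame, rather than the immediately preceding one, avoids discarding a slow drift of small changes that together amount to a different screen.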

The architecture is also positioned for integration with wearable devices, supporting hybrid on-device/cloud memory management. Critical information can be stored locally for privacy, while large-scale resources are offloaded to the cloud. This design aligns with the constraints and requirements of edge AI applications.

The authors further propose an Agent Memory Marketplace, envisioning personal memory as a digital asset class. The marketplace would enable privacy-preserving sharing, aggregation, and monetization of structured memories, with robust encryption and fine-grained access controls.

Experimental Evaluation

ScreenshotVQA Benchmark

MIRIX is evaluated on ScreenshotVQA, a challenging multimodal benchmark involving nearly 20,000 high-resolution screenshots per user sequence. Existing memory systems are inapplicable due to scale and modality constraints. MIRIX achieves:

  • 35% higher accuracy than the RAG baseline
  • 99.9% reduction in storage requirements compared to RAG
  • 410% improvement in accuracy and 93.3% reduction in storage relative to a long-context Gemini baseline

These results are achieved by extracting and storing only salient information in a compact SQLite database, rather than retaining raw images.
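To make the storage idea concrete, here is a minimal sketch that writes extracted event records to SQLite using Python's standard `sqlite3` module. The table name, columns, and helper functions are hypothetical and chosen to mirror the episodic fields mentioned earlier; the actual MIRIX schema is not described in this summary.

```python
import sqlite3
from typing import List


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store; ':memory:' keeps this example self-contained."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS episodic ("
        "id INTEGER PRIMARY KEY, event_type TEXT, summary TEXT, "
        "details TEXT, actor TEXT, timestamp TEXT)"
    )
    return conn


def add_event(conn: sqlite3.Connection, event_type: str, summary: str,
              details: str, actor: str, timestamp: str) -> int:
    """Insert one extracted event (text only, no image bytes) and return its row id."""
    cur = conn.execute(
        "INSERT INTO episodic (event_type, summary, details, actor, timestamp) "
        "VALUES (?, ?, ?, ?, ?)",
        (event_type, summary, details, actor, timestamp),
    )
    conn.commit()
    return cur.lastrowid


def search_summaries(conn: sqlite3.Connection, keyword: str) -> List[str]:
    """Return summaries containing the keyword, ordered by timestamp."""
    rows = conn.execute(
        "SELECT summary FROM episodic WHERE summary LIKE ? ORDER BY timestamp",
        (f"%{keyword}%",),
    )
    return [row[0] for row in rows]
```

A few short text fields per event take far less space than a high-resolution image, which is the intuition behind the storage reduction reported above.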

LOCOMO Benchmark

On the LOCOMO long-form conversation benchmark, MIRIX attains:

  • State-of-the-art overall accuracy of 85.4%
  • 8.0% improvement over the strongest open-source competitor (LangMem)
  • Performance approaching the upper bound set by full-context models

MIRIX demonstrates particularly strong gains on multi-hop and temporal reasoning tasks, attributed to its hierarchical memory storage and explicit consolidation of dispersed information. The system's modular routing and retrieval are especially effective for long-range, multi-hop queries.

Implications and Future Directions

MIRIX establishes a new standard for memory-augmented LLM agents, both in terms of performance and system design. The modular, multi-agent architecture enables scalable, efficient, and contextually rich memory management, supporting real-world applications that require persistent, multimodal, and personalized memory.

The strong empirical results, especially the 99.9% reduction in storage relative to RAG and the accompanying accuracy gains, underscore the practical viability of structured, compositional memory systems over flat or text-centric approaches. The integration of active retrieval and multi-agent coordination further enhances the system's robustness and adaptability.

The proposed Agent Memory Marketplace introduces a novel paradigm for memory as a digital asset, with significant implications for privacy, personalization, and economic value in AI ecosystems.

Future research directions include:

  • Extending MIRIX to more challenging, real-world benchmarks and diverse modalities
  • Enhancing retrieval strategies with more sophisticated, context-aware methods
  • Exploring lifelong learning, memory consolidation, and forgetting mechanisms
  • Investigating the societal and ethical implications of memory sharing and monetization

MIRIX provides a robust foundation for the development of next-generation LLM agents with human-like memory capabilities, supporting persistent, adaptive, and contextually intelligent behavior in complex environments.
