MIRIX Architecture

Updated 15 July 2025
  • MIRIX architecture is a modular multi-agent memory system that organizes memory into six specialized types for efficient, context-aware retrieval.
  • It employs a hierarchical structure integrating episodic, semantic, procedural, and other memories to support multimodal inputs and dynamic updates.
  • Reported benchmarks show a 35% accuracy improvement over RAG baselines and a 99.9% reduction in storage on ScreenshotVQA, supporting its practical value in real-world applications.

MIRIX is a modular, multi-agent memory system designed to advance memory capabilities for LLM-based agents. It introduces hierarchical, multimodal, and privacy-conscious components for persisting, updating, and retrieving diverse types of information. Where prior approaches keep memory flat, MIRIX structures it into six functionally distinct types and orchestrates memory operations through a dynamic multi-agent framework. Its architecture supports both textual and multimodal inputs, with reported state-of-the-art results on the two benchmarks evaluated (ScreenshotVQA and LOCOMO), while remaining applicable to real-world assistant scenarios.

1. Modular Structure and Memory Organization

MIRIX organizes memory into six discrete and specialized modules, each responsible for different information types. This modularity departs from monolithic or unimodal designs and enables effective routing, parallel updates, and context-aware retrieval. At the core of the system is the Meta Memory Manager, whose responsibilities include the global routing and coordination of memory operations, while six dedicated Memory Managers oversee their respective memory modules. This approach supports:

  • Separate handling of user-related events, factual knowledge, procedural guides, resources, and sensitive data
  • Hierarchical structuring within modules (e.g., summarized and detailed fields)
  • Efficient parallel operation and prevention of redundancy
  • Dynamic retrieval adapted to the query context

The hierarchical structuring—such as summary/detail pairs in episodic memory—enables both abstraction for scalable storage and fine-grained recall on demand.

2. Specialized Memory Types

The six memory types in MIRIX collectively provide broad coverage of an agent’s information processing needs:

Memory Type | Information Handled | Structural Features
Core Memory | Persistent agent and user information | High-priority attributes (e.g., user name, agent persona), controlled rewrite
Episodic Memory | Time-stamped events and interactions | event_type, summary, details, actor, timestamp
Semantic Memory | Abstract, timeless factual knowledge | name/identifier, summary, detailed explanation, provenance
Procedural Memory | Task steps, guides, workflows | entry_type, goal description, instructions (often JSON-structured)
Resource Memory | Documents and multimodal files | title, summary, resource_type, segmented/full content
Knowledge Vault | Sensitive, verbatim private data | sensitivity level, secured access fields

Core Memory maintains persistent, high-priority data about both user and agent. It is designed to stay 'visible' throughout interactions, with controlled rewrites triggered above 90% capacity to ensure essential information is retained.
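The capacity-triggered rewrite can be illustrated with a minimal sketch. Everything except the 90% threshold is an assumption made for illustration: the class name `CoreMemory`, the character-based budget, the block names, and the `rewrite` callback (which in a real system would presumably be an LLM-driven condensation step rather than truncation).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CoreMemory:
    """Hypothetical container for always-visible user and agent facts."""
    capacity_chars: int = 2000       # assumed budget; the source gives no figure
    rewrite_threshold: float = 0.9   # "above 90% capacity", as stated in the text
    blocks: dict[str, str] = field(default_factory=dict)

    def usage(self) -> float:
        """Fraction of the character budget currently in use."""
        used = sum(len(text) for text in self.blocks.values())
        return used / self.capacity_chars

    def append(self, block: str, text: str, rewrite: Callable[[str], str]) -> bool:
        """Append text to a block; condense every block if usage passes the threshold.

        Returns True when a rewrite was triggered.
        """
        current = self.blocks.get(block, "")
        self.blocks[block] = (current + "\n" + text).strip()
        if self.usage() > self.rewrite_threshold:
            self.blocks = {name: rewrite(body) for name, body in self.blocks.items()}
            return True
        return False


# Stand-in for a summarizer: keep only the first 40 characters of each block.
memory = CoreMemory(capacity_chars=100)
memory.append("human", "a" * 50, rewrite=lambda s: s[:40])   # 50% used, no rewrite
memory.append("human", "b" * 45, rewrite=lambda s: s[:40])   # 96% used, rewrite runs
```

Keeping the rewrite behind a callback separates the trigger policy (the 90% rule) from the condensation strategy, which is the part the source leaves unspecified.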

Episodic Memory acts as a temporal log of user-agent events including dialog, notifications, and contextual details, facilitating chronological recall and follow-up actions.
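As an illustration of the episodic schema, the sketch below models one entry with the five fields listed in the table and a chronological recall helper. The field names come from the table; the class name `EpisodicEntry`, the helper `recall_between`, and the sample events are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class EpisodicEntry:
    event_type: str      # e.g. "user_message" or "notification" (illustrative values)
    summary: str         # short abstraction used for cheap recall
    details: str         # fine-grained content returned on demand
    actor: str           # who produced the event
    timestamp: datetime


def recall_between(entries, start, end, with_details=False):
    """Return (timestamp, text) pairs for events in [start, end], oldest first."""
    hits = sorted(
        (e for e in entries if start <= e.timestamp <= end),
        key=lambda e: e.timestamp,
    )
    return [(e.timestamp, e.details if with_details else e.summary) for e in hits]


t0 = datetime(2025, 7, 15, 9, 0)
log = [
    EpisodicEntry("user_message", "Asked about flights",
                  "User asked for flights to Oslo on Friday", "user",
                  t0 + timedelta(minutes=5)),
    EpisodicEntry("notification", "Calendar reminder",
                  "Reminder: standup at 09:30", "system", t0),
]
morning = recall_between(log, t0, t0 + timedelta(hours=1))
```

Returning summaries by default and details only on request mirrors the summary/detail pairing described for this memory type.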

Semantic Memory contains general knowledge, abstracted from temporal context, supporting reasoning over facts not tied to particular events.

Procedural Memory encodes operational knowledge as structured guides and workflows, enabling the agent to recall and reuse step-by-step solutions.

Resource Memory manages external documents, transcripts, and multimodal resources, organizing them for efficient retrieval in ongoing tasks.

Knowledge Vault ensures secure storage of sensitive data (e.g., credentials), complete with dedicated fields for sensitivity and strict access control.

3. Multi-Agent Coordination and Control

MIRIX relies on a multi-agent architecture for both memory management and retrieval:

  • Each memory type is managed by a dedicated agent (Memory Manager).
  • A Meta Memory Manager coordinates updates and enforces synchronization.
  • New inputs (such as screenshots or user queries) are first checked against existing memory. The Meta Memory Manager then selects which agents must be updated, enabling simultaneous and non-conflicting memory updates.
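The routing step described above can be sketched as follows. This is a simplified illustration, not the MIRIX implementation: the class and method names are hypothetical, the keyword-based `keyword_route` stands in for whatever decision process the Meta Memory Manager actually uses, and the initial check against existing memory is omitted.

```python
import asyncio

MEMORY_TYPES = ("core", "episodic", "semantic",
                "procedural", "resource", "knowledge_vault")


class MemoryManager:
    """Hypothetical per-type agent that owns exactly one memory store."""

    def __init__(self, memory_type):
        self.memory_type = memory_type
        self.entries = []

    async def update(self, item):
        await asyncio.sleep(0)          # placeholder for a model call or DB write
        self.entries.append(item)
        return self.memory_type


class MetaMemoryManager:
    """Hypothetical coordinator: picks target managers and updates them concurrently."""

    def __init__(self, route):
        self.route = route              # callable: item -> iterable of memory types
        self.managers = {t: MemoryManager(t) for t in MEMORY_TYPES}

    async def ingest(self, item):
        targets = [t for t in self.route(item) if t in self.managers]
        # Each manager writes only to its own store, so the updates cannot conflict.
        return await asyncio.gather(*(self.managers[t].update(item) for t in targets))


def keyword_route(item):
    """Assumed toy routing rule, for illustration only."""
    targets = ["episodic"]              # every input is logged as an event
    lowered = item.lower()
    if "password" in lowered:
        targets.append("knowledge_vault")
    if "how to" in lowered:
        targets.append("procedural")
    return targets


meta = MetaMemoryManager(keyword_route)
updated = asyncio.run(meta.ingest("User looked up how to reset a router"))
```

Giving each manager exclusive ownership of its store is one simple way to obtain the non-conflicting parallel updates the text describes.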

For interactive dialogue, a Chat Agent utilizes an Active Retrieval process. It derives a “topic” from user queries, retrieves correspondingly relevant elements from all memory components, and uses a combination of retrieval methods—embedding_match, bm25_match, string_match—selected based on query context. Results are tagged according to their memory source (e.g., <episodic_memory>…</episodic_memory>) before injection into the prompt for response generation. This approach enables the agent to leverage grounded, contextually relevant information in multi-turn, multi-modal settings.
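A minimal sketch of the tagging step in Active Retrieval is shown below. It assumes the topic has already been derived from the user query and uses a plain substring search as the retrieval function; `build_context`, `substring_search`, and `top_k` are hypothetical names, and only the source-tag format comes from the text.

```python
def substring_search(topic, entries):
    """Case-insensitive substring retrieval, standing in for any retrieval method."""
    needle = topic.lower()
    return [entry for entry in entries if needle in entry.lower()]


def build_context(topic, memories, retrieve=substring_search, top_k=3):
    """Retrieve from every memory component and wrap hits in source tags."""
    sections = []
    for source, entries in memories.items():
        hits = retrieve(topic, entries)[:top_k]
        if hits:                                   # skip components with no match
            body = "\n".join(hits)
            sections.append(f"<{source}>\n{body}\n</{source}>")
    return "\n".join(sections)


memories = {
    "episodic_memory": ["Booked flight to Oslo on Monday"],
    "semantic_memory": ["Oslo is the capital of Norway", "Paris is in France"],
    "procedural_memory": ["Steps to file expenses"],
}
context = build_context("oslo", memories)
```

The resulting string would be injected into the prompt ahead of the user query, so the model can tell which memory component each fact came from.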

4. Benchmark Performance and Empirical Results

MIRIX was empirically validated on two representative benchmarks:

ScreenshotVQA:

  • Involves nearly 20,000 high-resolution screenshots per sequence and evaluates deep contextual understanding in a multimodal setting.
  • MIRIX achieved a 35% accuracy improvement over Retrieval-Augmented Generation (RAG) baselines while reducing storage requirements by 99.9%. Unlike baselines which store gigabytes of images, MIRIX utilizes a compact sqlite database (~15–20 MB) due to information abstraction and redundancy elimination.

LOCOMO:

  • A benchmark focused on long-form, multi-turn textual conversation.
  • MIRIX attained approximately 85.4% overall accuracy, outperforming existing baselines notably in multi-hop reasoning (by 24 points in some categories) and excelling in single-hop and temporal tasks through memory consolidation and abstraction strategies.

These results indicate that MIRIX’s architecture improves accuracy and, on ScreenshotVQA, cuts storage by roughly three orders of magnitude compared with RAG-based baselines.

5. Application Deployment and Privacy

MIRIX underpins a deployed personal assistant application built with a React-Electron frontend and Uvicorn backend. The deployment offers the following functionalities:

  • Continuous real-time screen monitoring, capturing screenshots every 1.5 seconds and employing a similarity threshold (0.99) to discard redundant images.
  • Information extraction and memory base construction are triggered every 60 seconds from unique, non-redundant screenshots.
  • Interaction via a chat interface, where users can query past activities or knowledge and inspect memory visually (e.g., Semantic Memory trees).
  • Privacy is maintained by storing only processed information in a secure local sqlite database, while raw images are discarded after extraction, minimizing both risk and storage demand.
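The capture and deduplication policy above can be sketched as follows. The three constants come from the text; everything else is an assumption: the source does not say which similarity measure is used or which earlier image a new screenshot is compared against, so this sketch uses cosine similarity over feature vectors and compares each frame with the last one kept.

```python
import math

CAPTURE_INTERVAL_S = 1.5       # screenshot period, from the text
SIMILARITY_THRESHOLD = 0.99    # frames above this similarity are discarded
EXTRACTION_INTERVAL_S = 60     # memory construction period, from the text


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def deduplicate(frames, threshold=SIMILARITY_THRESHOLD):
    """Keep a frame only if it is not near-identical to the last kept frame."""
    kept = []
    for frame in frames:
        if not kept or cosine_similarity(kept[-1], frame) <= threshold:
            kept.append(frame)
    return kept


# Upper bound on screenshots captured between two extraction runs.
max_frames_per_batch = int(EXTRACTION_INTERVAL_S / CAPTURE_INTERVAL_S)

# Toy feature vectors: the 2nd and 4th are near-duplicates of their predecessors.
frames = [[1.0, 0.0], [1.0, 0.001], [0.0, 1.0], [0.0, 1.0]]
unique_frames = deduplicate(frames)
```

With these intervals, at most 40 screenshots accumulate per extraction cycle, and only those that survive deduplication would be passed on for information extraction.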

This ensures an application that is both highly personalizable and privacy-conscious, meeting the dual challenges of user-specific adaptation and secure local data management.

6. Technical Implementation and Workflow

Key implementation details for MIRIX include:

  • Memory updates are performed in parallel via dedicated agents, using structured fields per memory type (e.g., event_type, summary, details, actor, timestamp for Episodic Memory).
  • Redundant visual inputs are eliminated by comparing screenshot similarity and discarding new images above a 0.99 similarity score, optimizing storage.
  • Images are streamed to the backend immediately (via the Gemini API), reducing upload-processing latency from approximately 50 seconds (when using models such as GPT-4) to under 5 seconds.
  • Multiple retrieval functions (embedding_match, bm25_match, string_match) are employed, with the retrieval strategy determined dynamically by the nature of the query.
  • The original paper depicts the system’s memory components in its overall-structure figure.
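To make the dynamic choice among retrieval functions concrete, the sketch below pairs a small from-scratch Okapi BM25 scorer with a toy strategy selector. Only the three strategy names come from the text. The selection heuristic in `choose_strategy` is an assumption made for illustration, the BM25 parameters `k1` and `b` are conventional defaults rather than values from the source, and `embedding_match` is left unimplemented because it would require an embedding model.

```python
import math
from collections import Counter


def bm25_match(query, docs, k1=1.5, b=0.75):
    """Rank docs against the query with Okapi BM25; return matching docs, best first."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scored = []
    for doc, tokens in zip(docs, tokenized):
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            tf = term_freq[term]
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(tokens) / avg_len))
        if score > 0:
            scored.append((score, doc))
    return [doc for _, doc in sorted(scored, key=lambda pair: pair[0], reverse=True)]


def choose_strategy(query):
    """Assumed heuristic: quoted text is exact, short queries lexical, the rest semantic."""
    if len(query) > 2 and query.startswith('"') and query.endswith('"'):
        return "string_match"
    if len(query.split()) <= 3:
        return "bm25_match"
    return "embedding_match"


docs = ["the cat sat on the mat", "dogs chase cats", "the stock market fell"]
ranked = bm25_match("cat mat", docs)
```

A lexical scorer such as BM25 suits short keyword queries, whereas longer natural-language questions tend to benefit from embedding similarity, which is the intuition behind the assumed heuristic.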

The technical stack and workflow collectively support rapid multimodal data ingestion, robust structured storage, and prompt-aware retrieval in practical deployed settings.

7. Significance and Implications

MIRIX advances memory-augmented LLM architectures by addressing key limitations of flat, non-hierarchical, or single-modal memory systems. Its multi-agent, modular structure better accommodates the real-world need to integrate contextual, abstract, procedural, and sensitive knowledge within language agent frameworks. The reported gains in accuracy and storage efficiency, especially on the demanding multimodal ScreenshotVQA benchmark, support its effectiveness. The deployed assistant application indicates that practical, privacy-preserving, and user-adaptive memory for AI agents is feasible.