MIRIX Architecture

Updated 15 July 2025

MIRIX architecture is a modular multi-agent memory system that organizes memory into six specialized types for efficient, context-aware retrieval.
It employs a hierarchical structure integrating episodic, semantic, procedural, and other memories to support multimodal inputs and dynamic updates.
Empirical benchmarks show a 35% accuracy improvement over RAG baselines together with a 99.9% reduction in storage, supporting its practical value in real-world applications.

MIRIX is a modular, multi-agent memory system designed to advance memory capabilities for LLM-based agents. It introduces hierarchical, multimodal, and privacy-conscious components for persisting, updating, and retrieving diverse types of information. MIRIX goes beyond prior flat memory approaches by structuring memory into six functionally distinct types and orchestrating memory operations through a dynamic multi-agent framework. Its architecture supports both textual and multimodal inputs, enabling state-of-the-art performance on demanding benchmarks while remaining applicable to real-world assistant scenarios.

1. Modular Structure and Memory Organization

MIRIX organizes memory into six discrete and specialized modules, each responsible for different information types. This modularity departs from monolithic or unimodal designs and enables effective routing, parallel updates, and context-aware retrieval. At the core of the system is the Meta Memory Manager, whose responsibilities include the global routing and coordination of memory operations, while six dedicated Memory Managers oversee their respective memory modules. This approach supports:

Separate handling of user-related events, factual knowledge, procedural guides, resources, and sensitive data
Hierarchical structuring within modules (e.g., summarized and detailed fields)
Efficient parallel operation and prevention of redundancy
Dynamic retrieval adapted to the query context

The hierarchical structuring—such as summary/detail pairs in episodic memory—enables both abstraction for scalable storage and fine-grained recall on demand.

2. Specialized Memory Types

The six memory types in MIRIX collectively provide broad coverage of an agent’s information processing needs:

Memory Type	Information Handled	Structural Features
Core Memory	Persistent agent and user information	High-priority attributes (e.g., user name, agent persona), controlled rewrite
Episodic Memory	Time-stamped events and interactions	event_type, summary, details, actor, timestamp
Semantic Memory	Abstract, timeless factual knowledge	name/identifier, summary, detailed explanation, provenance
Procedural Memory	Task steps, guides, workflows	entry_type, goal description, instructions (often JSON-structured)
Resource Memory	Documents and multimodal files	title, summary, resource_type, segmented/full content
Knowledge Vault	Sensitive, verbatim private data	sensitivity level, secured access fields
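The structural features in the table can be pictured as typed records. The sketch below is only an illustration: the field names for Episodic Memory follow the table, while the class names, the Python types, and the exact field names chosen for Semantic and Procedural Memory are assumptions rather than the system's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class EpisodicEntry:
    # Field names taken from the table row for Episodic Memory.
    event_type: str
    summary: str        # short abstraction used for scalable storage
    details: str        # fine-grained content recalled on demand
    actor: str
    timestamp: datetime


@dataclass
class SemanticEntry:
    # Assumed names for "name/identifier, summary, detailed explanation, provenance".
    name: str
    summary: str
    details: str
    source: str


@dataclass
class ProceduralEntry:
    # Assumed names for "entry_type, goal description, instructions".
    entry_type: str
    description: str
    steps: list[str] = field(default_factory=list)


entry = EpisodicEntry(
    event_type="user_message",
    summary="User asked about flight options",
    details="User asked for direct flights and preferred a morning departure.",
    actor="user",
    timestamp=datetime(2025, 7, 15, 9, 30, tzinfo=timezone.utc),
)
print(entry.summary)
```

Keeping `summary` and `details` as separate fields is what lets a retriever scan short summaries first and load the longer text only when needed.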

Core Memory maintains persistent, high-priority data about both user and agent. It is designed to stay 'visible' throughout interactions, with controlled rewrites triggered above 90% capacity to ensure essential information is retained.
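A minimal sketch of the capacity-triggered rewrite follows. It assumes capacity is measured in characters and that the rewrite is some condensation step; the source states only the 90% trigger, so `CoreMemoryBlock`, `limit_chars`, and `keep_last_lines` are hypothetical names, and `keep_last_lines` stands in for what would presumably be an LLM-driven rewrite.

```python
from typing import Callable


class CoreMemoryBlock:
    """Always-visible memory that is condensed once it passes a usage threshold."""

    def __init__(self, limit_chars: int, rewrite: Callable[[str], str],
                 threshold: float = 0.9):
        self.limit_chars = limit_chars   # assumed unit: characters
        self.rewrite = rewrite           # condensation step (hypothetical)
        self.threshold = threshold       # 90% trigger from the text
        self.text = ""

    def usage(self) -> float:
        return len(self.text) / self.limit_chars

    def append(self, line: str) -> bool:
        """Append a line; return True if a controlled rewrite was triggered."""
        self.text = (self.text + "\n" + line).strip()
        if self.usage() > self.threshold:
            self.text = self.rewrite(self.text)
            return True
        return False


def keep_last_lines(text: str, n: int = 1) -> str:
    # Placeholder condensation: keep only the most recent n lines.
    return "\n".join(text.splitlines()[-n:])


block = CoreMemoryBlock(limit_chars=100, rewrite=keep_last_lines)
block.append("User name: Alex")
print(f"{block.usage():.0%} of capacity used")
```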

Episodic Memory acts as a temporal log of user-agent events including dialog, notifications, and contextual details, facilitating chronological recall and follow-up actions.

Semantic Memory contains general knowledge, abstracted from temporal context, supporting reasoning over facts not tied to particular events.

Procedural Memory encodes operational knowledge as structured guides and workflows, enabling the agent to recall and reuse step-by-step solutions.

Resource Memory manages external documents, transcripts, and multimodal resources, organizing them for efficient retrieval in ongoing tasks.

Knowledge Vault ensures secure storage of sensitive data (e.g., credentials), complete with dedicated fields for sensitivity and strict access control.

3. Multi-Agent Coordination and Control

MIRIX relies on a multi-agent architecture for both memory management and retrieval:

Each memory type is managed by a dedicated agent (Memory Manager).
A Meta Memory Manager coordinates updates and enforces synchronization.
New inputs (such as screenshots or user queries) are first checked against existing memory. The Meta Memory Manager then selects which agents must be updated, enabling simultaneous and non-conflicting memory updates.
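The coordination pattern above can be sketched as a router that fans an input out to the selected managers in parallel. This is an illustration under stated assumptions: `keyword_route` is a hypothetical stand-in for the routing decision (which the source attributes to the Meta Memory Manager without giving its logic), the class and method names are invented, and the initial check against existing memory is omitted.

```python
from concurrent.futures import ThreadPoolExecutor

MEMORY_TYPES = ("core", "episodic", "semantic",
                "procedural", "resource", "knowledge_vault")


class MemoryManager:
    """One manager per memory type; each owns its own store."""

    def __init__(self, memory_type):
        self.memory_type = memory_type
        self.entries = []

    def update(self, item):
        self.entries.append(item)
        return self.memory_type


class MetaMemoryManager:
    def __init__(self, route):
        self.route = route  # callable: item -> iterable of memory types
        self.managers = {t: MemoryManager(t) for t in MEMORY_TYPES}

    def ingest(self, item):
        targets = [t for t in self.route(item) if t in self.managers]
        # Each manager writes only to its own store, so the parallel
        # updates cannot conflict with one another.
        with ThreadPoolExecutor(max_workers=len(MEMORY_TYPES)) as pool:
            futures = [pool.submit(self.managers[t].update, item) for t in targets]
            return sorted(f.result() for f in futures)


def keyword_route(item):
    # Hypothetical heuristic standing in for the real routing decision.
    text = item.lower()
    targets = {"episodic"}
    if "password" in text:
        targets.add("knowledge_vault")
    if "how to" in text:
        targets.add("procedural")
    return targets


meta = MetaMemoryManager(keyword_route)
print(meta.ingest("How to reset the router password"))
```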

For interactive dialogue, a Chat Agent uses an Active Retrieval process. It derives a “topic” from the user query, retrieves the relevant entries from all memory components, and combines several retrieval methods (embedding_match, bm25_match, string_match) selected according to the query context. Results are tagged with their memory source (e.g., <episodic_memory>…</episodic_memory>) before being injected into the prompt for response generation. This lets the agent ground its responses in contextually relevant information in multi-turn, multimodal settings.
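The source-tagging step can be sketched as follows. The tag format mirrors the `<episodic_memory>` example in the text; the function name `build_context`, the dictionary layout, and the choice to skip empty memory types are assumptions made for illustration.

```python
def build_context(results: dict[str, list[str]]) -> str:
    """Wrap retrieved items in tags named after their memory source."""
    blocks = []
    for memory_type, items in results.items():
        if not items:
            continue  # assumed behavior: omit memory types with no hits
        body = "\n".join(items)
        blocks.append(f"<{memory_type}>\n{body}\n</{memory_type}>")
    return "\n".join(blocks)


retrieved = {
    "episodic_memory": ["2025-07-14: user booked a flight"],
    "semantic_memory": [],
    "core_memory": ["User name: Alex"],
}
print(build_context(retrieved))
```

The tagged string would then be placed in the prompt ahead of the user query, so the model can tell which memory component each item came from.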

4. Benchmark Performance and Empirical Results

MIRIX was empirically validated on two representative benchmarks:

ScreenshotVQA:

Involves nearly 20,000 high-resolution screenshots per sequence and evaluates deep contextual understanding in a multimodal setting.
MIRIX achieved a 35% accuracy improvement over Retrieval-Augmented Generation (RAG) baselines while reducing storage requirements by 99.9%. Unlike baselines, which store gigabytes of images, MIRIX uses a compact SQLite database (~15–20 MB), made possible by information abstraction and redundancy elimination.
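As a back-of-the-envelope consistency check (not a figure reported in the source), the 99.9% reduction and the 15–20 MB database size together imply a baseline footprint of roughly 15–20 GB, which matches the description of baselines storing gigabytes of images. The helper name `implied_baseline_mb` is hypothetical.

```python
def implied_baseline_mb(compact_mb: float, reduction: float) -> float:
    """Baseline size implied by a compact size and a fractional reduction."""
    return compact_mb / (1.0 - reduction)


for size_mb in (15, 20):
    baseline = implied_baseline_mb(size_mb, 0.999)
    print(f"{size_mb} MB after a 99.9% reduction implies about {baseline / 1000:.0f} GB before")
```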

LOCOMO:

A benchmark focused on long-form, multi-turn textual conversation.
MIRIX attained approximately 85.4% overall accuracy, outperforming existing baselines notably in multi-hop reasoning (by 24 points in some categories) and excelling in single-hop and temporal tasks through memory consolidation and abstraction strategies.

These results demonstrate that MIRIX’s architecture provides not only accuracy improvements but also dramatic storage efficiency compared to conventional or RAG-based systems.

5. Application Deployment and Privacy

MIRIX underpins a deployed personal assistant application built with a React-Electron frontend and Uvicorn backend. The deployment offers the following functionalities:

Continuous real-time screen monitoring, capturing screenshots every 1.5 seconds and employing a similarity threshold (0.99) to discard redundant images.
Information extraction and memory base construction are triggered every 60 seconds from unique, non-redundant screenshots.
Interaction via a chat interface, where users can query past activities or knowledge and inspect memory visually (e.g., Semantic Memory trees).
Privacy is maintained by storing only processed information in a secure local SQLite database; raw images are discarded after extraction, minimizing both risk and storage demand.
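The capture loop described above can be sketched with the stated constants (1.5 s capture interval, 0.99 similarity threshold, 60 s extraction interval). The source does not say which similarity measure is used or which image a new screenshot is compared against, so this sketch assumes cosine similarity over feature vectors and comparison with the most recently kept screenshot. `ScreenshotBuffer` is a hypothetical name.

```python
import math

CAPTURE_INTERVAL_S = 1.5       # from the text
SIMILARITY_THRESHOLD = 0.99    # from the text
EXTRACTION_INTERVAL_S = 60.0   # from the text


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ScreenshotBuffer:
    """Collects non-redundant screenshots and releases them in timed batches."""

    def __init__(self):
        self.pending = []
        self.last_kept = None
        self.window_start = None

    def add(self, t, features):
        """Offer a screenshot taken at time t; return a batch when one is due."""
        if self.window_start is None:
            self.window_start = t
        is_duplicate = (
            self.last_kept is not None
            and cosine_similarity(features, self.last_kept) > SIMILARITY_THRESHOLD
        )
        if not is_duplicate:
            self.pending.append((t, features))
            self.last_kept = features
        if t - self.window_start >= EXTRACTION_INTERVAL_S:
            batch, self.pending = self.pending, []
            self.window_start = t
            return batch
        return None


buf = ScreenshotBuffer()
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
for i, frame in enumerate(frames):
    buf.add(i * CAPTURE_INTERVAL_S, frame)
print(len(buf.pending), "unique screenshots waiting for extraction")
```

At one capture every 1.5 seconds, a 60-second window holds at most 40 screenshots, so deduplication bounds each extraction batch well below that on a mostly static screen.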

The resulting application is both highly personalizable and privacy-conscious, meeting the dual challenges of user-specific adaptation and secure local data management.

6. Technical Implementation and Workflow

Key implementation details for MIRIX include:

Memory updates are performed in parallel via dedicated agents, using structured fields per memory type (e.g., event_type, summary, details, actor, timestamp for Episodic Memory).
Redundant visual inputs are eliminated by comparing screenshot similarity and discarding new images above a 0.99 similarity score, optimizing storage.
Images are streamed to the backend immediately (via the Gemini API), reducing upload-processing latency from approximately 50 seconds (when using models such as GPT-4) to under 5 seconds.
Multiple retrieval functions (embedding_match, bm25_match, string_match) are employed, with the retrieval strategy determined dynamically by the nature of the query.
The overall-structure figure in the original research depicts the system’s memory components.
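To make the retrieval functions concrete, here is a minimal sketch of two of them plus a dispatcher. The function names `string_match` and `bm25_match` come from the text, but their signatures and bodies are assumptions: the BM25 scoring is a common textbook variant with default parameters, and the quoted-query rule in `retrieve` is a hypothetical heuristic, since the source does not specify how the strategy is chosen. `embedding_match` is left out because it depends on an embedding model.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def string_match(query, docs):
    """Case-insensitive substring match."""
    q = query.lower()
    return [d for d in docs if q in d.lower()]


def bm25_match(query, docs, k1=1.5, b=0.75, top_k=3):
    """Rank docs with a standard BM25 variant; drop docs with zero score."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    avg_len = sum(len(t) for t in tokenized) / n if n else 0.0
    df = Counter(term for toks in tokenized for term in set(toks))
    scored = []
    for doc, toks in zip(docs, tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def retrieve(query, docs):
    # Hypothetical rule: a quoted query asks for an exact match,
    # anything else falls back to keyword ranking.
    if len(query) >= 2 and query[0] == query[-1] == '"':
        return string_match(query[1:-1], docs)
    return bm25_match(query, docs)


docs = [
    "user booked a flight to Paris",
    "meeting notes about the budget",
    "flight delayed by two hours",
]
print(retrieve('"budget"', docs))
print(retrieve("flight Paris", docs))
```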

The technical stack and workflow collectively support rapid multimodal data ingestion, robust structured storage, and prompt-aware retrieval in practical deployed settings.

7. Significance and Implications

MIRIX represents a significant advancement in memory-augmented LLM architectures by resolving the limitations of flat, non-hierarchical, or single-modal memory systems. Its multi-agent, modular structure better accommodates the real-world need for contextual, abstract, procedural, and sensitive knowledge integration within language agent frameworks. Empirical gains in accuracy and storage efficiency, especially on previously intractable multimodal benchmarks, underline its effectiveness. The real-world deployment demonstrates the feasibility of practical, privacy-preserving, and user-adaptive memory for AI agents at scale.
