Mnemis: Hybrid Memory System for LLMs

Updated 3 July 2026

Mnemis is a dual-route memory architecture that combines fast similarity (System 1) and semantic-hierarchy (System 2) retrieval for long-term, scalable memory recall.
It constructs parallel graph structures—a fine-grained base graph and a multi-layered hierarchical graph—to capture episodic interactions and enable both precise matching and global reasoning.
In the reported evaluations, Mnemis outperforms RAG baselines on the LoCoMo and LongMemEval-S benchmarks, with gains of 15–20 points that are most pronounced on multi-hop and temporal reasoning tasks.

Mnemis is a memory architecture for LLMs that fuses fast similarity-based (System 1) and deliberate semantic-hierarchy-based (System 2) retrieval to enable long-term, scalable, and semantically comprehensive recall of historical user interactions. By maintaining parallel graph structures—a fine-grained base graph and a multi-layered hierarchical graph—Mnemis achieves state-of-the-art performance on long-term memory benchmarks and demonstrates robust coverage in scenarios requiring both precise matching and global reasoning (Tang et al., 17 Feb 2026).

1. Motivation and Conceptual Foundations

The central motivation for Mnemis is the need for LLMs to operate as persistent assistants capable of “remembering” and leveraging months or years of episodic interaction history. Traditional approaches—such as naively expanding the context window—are constrained by quadratic attention costs, finite context lengths (current limits near 128K tokens), and the risk of introducing irrelevant information. Retrieval-Augmented Generation (RAG) and variants like Graph-RAG address these constraints by encoding conversational history as retrievable “episodic memory,” often structured as graphs of entities and relations.

These similarity-centric techniques approximate human “System 1” cognition (fast, associative retrieval) but struggle with queries requiring enumeration or global semantic reasoning (e.g., "Which cities did I visit in 2023?"). Insights from cognitive science identify the complementarity of “System 2” mechanisms—deliberate reasoning over semantic hierarchies. Mnemis operationalizes this synergy by explicitly combining System 1 matching on the base graph with a System 2 “Global Selection” over a purposely constructed hierarchical graph, thus enabling both coverage and precision in long-term information retrieval (Tang et al., 17 Feb 2026).

2. Architecture and Dual-Route Retrieval

Mnemis persists memory via two coordinated graph structures:

Base Graph: Encodes historical text as a directed, attributed graph consisting of episodes (raw text spans), entities (people, places, concepts), edges (binary relations or events), and episodic-edges (links associating entities with relevant episodes). All elements are embedded in a $d$-dimensional space using encoder models (e.g., Qwen3-Embedding-0.6B, 128D). Similarity search uses cosine similarity and BM25 textual scoring, then fuses rankings via Reciprocal Rank Fusion (RRF). Type-specific retrieval budgets are used (10 episodes, 20 entities, 20 edges) (Tang et al., 17 Feb 2026).
Hierarchical Graph: Constructs multi-level categories grouping entities via LLM-guided minimum concept abstraction. Each layer $L_i$ (for $i\in[0,h]$) contains semantic categories or entities, subject to constraints enforcing minimal aggregation per category ($n$ children) and progressive abstraction ($|L_i|\leq |L_{i-1}|$ for $i\geq2$).
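
As a rough illustration of the base graph's four element types, the following is a minimal in-memory sketch. The class and field names (`BaseGraph`, `fact`, `link`, `episodes_for`) are hypothetical and chosen for readability; the source only states that the graph holds episodes, entities with name, summary, and type, time-stamped edges, and episodic-edges, and that the actual backend is Neo4j.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    id: str
    text: str          # raw text span


@dataclass
class Entity:
    id: str
    name: str
    summary: str
    type: str          # e.g. "person", "place", "concept"


@dataclass
class Edge:
    id: str
    source: str        # id of the source entity
    target: str        # id of the target entity
    fact: str          # textual description of the relation or event (assumed field)
    timestamp: str     # edges are time-stamped


@dataclass
class BaseGraph:
    episodes: dict = field(default_factory=dict)        # episode id -> Episode
    entities: dict = field(default_factory=dict)        # entity id -> Entity
    edges: dict = field(default_factory=dict)           # edge id -> Edge
    episodic_edges: dict = field(default_factory=dict)  # entity id -> set of episode ids

    def link(self, entity_id, episode_id):
        """Record an episodic-edge between an entity and an episode."""
        self.episodic_edges.setdefault(entity_id, set()).add(episode_id)

    def episodes_for(self, entity_id):
        """Return the episodes linked to an entity, ordered by episode id."""
        ids = sorted(self.episodic_edges.get(entity_id, ()))
        return [self.episodes[i] for i in ids]


# Toy usage with made-up data.
g = BaseGraph()
g.episodes["ep1"] = Episode("ep1", "I flew to Lisbon in May 2023.")
g.entities["lisbon"] = Entity("lisbon", "Lisbon", "City the user visited", "place")
g.link("lisbon", "ep1")
```

Keeping episodic-edges as a separate entity-to-episode index is what lets a retrieval route that ends at entities pull back the raw text spans that mention them.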

At retrieval time, System 1 rapidly selects the top-$k$ candidates via hybrid similarity; System 2 performs a top-down category traversal, leveraging the LLM to flag relevant semantic subtrees via iterative selection. Retrieved items from both mechanisms are unioned and re-ranked to form the final answer context.

3. Base and Hierarchical Graph Construction

Base graph construction involves:

Episode Segmentation: Raw history is chunked into episodes.
Entity and Edge Extraction: Entities and their relations are parsed from text; each entity is attributed with name, summary, and type. Edges are time-stamped.
Embedding and Similarity: Each node and edge is embedded in $\mathbb{R}^d$. Cosine similarity between the query embedding $e_q$ and an element embedding $e_x$ is computed as

$\cos(e_q, e_x) = \frac{e_q \cdot e_x}{\|e_q\| \; \|e_x\|}.$

BM25 operates on the textual fields. The two rankings are merged per

$\mathrm{RRFScore}(x) = \sum_{i \in \{\cos, \mathrm{BM25}\}} \frac{1}{k + \mathrm{rank}_i(x)}.$
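
The two formulas above can be combined in a few lines. This is a minimal sketch rather than the authors' implementation: the BM25 scores are supplied as a made-up dictionary instead of being computed, the helper names (`cosine`, `to_ranks`, `rrf`) are hypothetical, and the constant `k=60` is a commonly used RRF default, since the source does not state its value. Note that this $k$ is the RRF smoothing constant, not the top-$k$ retrieval budget.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def to_ranks(scores):
    """Map item id -> 1-based rank, best score first (ties broken by id)."""
    ordered = sorted(scores, key=lambda x: (-scores[x], x))
    return {x: r for r, x in enumerate(ordered, start=1)}


def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: sum 1 / (k + rank) over all rankings."""
    fused = {}
    for ranking in rankings:
        for x, r in ranking.items():
            fused[x] = fused.get(x, 0.0) + 1.0 / (k + r)
    return fused


# Toy data: three items with 2-D embeddings and made-up BM25 scores.
query = [1.0, 0.0]
embeddings = {"a": [1.0, 0.0], "b": [0.6, 0.8], "c": [0.0, 1.0]}
cos_scores = {x: cosine(query, e) for x, e in embeddings.items()}
bm25_scores = {"a": 0.5, "b": 2.0, "c": 1.0}

fused = rrf([to_ranks(cos_scores), to_ranks(bm25_scores)])
order = sorted(fused, key=fused.get, reverse=True)
```

In this toy case item `b` ends up first: it is second by cosine and first by BM25, which beats item `a` (first by cosine, last by BM25). Because RRF uses only ranks, the two scoring scales never need to be normalized against each other.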

Hierarchical graph construction is performed by prompting the LLM to generate minimal semantic groupings. Each category must contain at least $n$ entities; categories may overlap (many-to-many mapping). For higher layers, a compression constraint ensures $|L_i|\leq |L_{i-1}|$ for $i\geq2$. Hierarchy construction is run in batch, with future work aiming at incremental updating.
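
The two structural constraints on the hierarchy (a minimum number of children per category, and layers that do not grow from the second category layer upward) can be checked mechanically after the LLM proposes groupings. The sketch below assumes a simple representation that is not taken from the source: `layers[0]` is the set of entity ids and each higher layer is a dictionary from category name to its set of children. The function name `check_hierarchy` is hypothetical.

```python
def check_hierarchy(layers, n):
    """Return a list of constraint violations (empty if the hierarchy is valid).

    layers[0]: set of entity ids.
    layers[i], i >= 1: dict mapping category name -> set of children in layers[i-1].
    n: minimum number of children each category must have.
    """
    problems = []
    for i in range(1, len(layers)):
        below = layers[i - 1]  # set or dict; membership tests work on both
        for cat, children in sorted(layers[i].items()):
            if len(children) < n:
                problems.append(f"L{i}:{cat} has {len(children)} < {n} children")
            unknown = sorted(c for c in children if c not in below)
            if unknown:
                problems.append(f"L{i}:{cat} references unknown children {unknown}")
        # Compression constraint applies from the second category layer upward.
        if i >= 2 and len(layers[i]) > len(layers[i - 1]):
            problems.append(f"|L{i}| = {len(layers[i])} exceeds |L{i - 1}| = {len(layers[i - 1])}")
    return problems


# Toy hierarchy; "capitals" overlaps with the other categories (many-to-many).
layers = [
    {"paris", "rome", "tokyo", "kyoto"},
    {
        "europe_cities": {"paris", "rome"},
        "japan_cities": {"tokyo", "kyoto"},
        "capitals": {"paris", "rome", "tokyo"},
    },
    {"travel": {"europe_cities", "japan_cities", "capitals"}},
]

ok = check_hierarchy(layers, n=2)
too_strict = check_hierarchy(layers, n=3)
```

With `n=2` the toy hierarchy passes; raising the minimum to `n=3` flags the two two-member categories.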

4. Query-Time Operation: Global Selection and Fusion

Query handling in Mnemis proceeds in parallel along both retrieval routes:

System 1: The query is embedded and used to select the top-$k$ episodes, entities, and edges via the RRF scoring scheme.
System 2 (Global Selection): The LLM receives category names/tags at each hierarchy layer and is prompted to select categories likely to “cover” the user’s query. If “get all children” is flagged, all descendants are included. Traversal continues until entity level, at which point all episodes and edges incident on the relevant entities are aggregated.
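
The top-down traversal of Global Selection can be sketched as a layer-by-layer loop. Everything below is illustrative: the hierarchy is a plain dictionary from category to children, it is assumed to be acyclic, and `toy_select` is a hard-coded stand-in for the LLM call that, in the described system, chooses categories and sets the "get all children" flag. The function names are hypothetical.

```python
def leaf_entities(children, node):
    """All entities below a node; a node with no entry in `children` is an entity."""
    if node not in children:
        return {node}
    found = set()
    for child in children[node]:
        found |= leaf_entities(children, child)
    return found


def global_selection(query, top_layer, children, select):
    """Traverse the hierarchy top-down and return the selected entity ids.

    select(query, nodes) returns [(node, get_all_children), ...] and stands in
    for one LLM call per layer.
    """
    selected = set()
    frontier = list(top_layer)
    while frontier:
        next_frontier = []
        for node, get_all in select(query, frontier):
            if node not in children:        # reached entity level
                selected.add(node)
            elif get_all:                   # take every descendant entity
                selected |= leaf_entities(children, node)
            else:                           # descend one layer
                next_frontier.extend(children[node])
        frontier = list(dict.fromkeys(next_frontier))  # de-duplicate, keep order
    return selected


def toy_select(query, nodes):
    """Hard-coded stand-in for the LLM's category choice (illustration only)."""
    picks = []
    for node in nodes:
        if node == "travel":
            picks.append((node, False))
        elif node.endswith("_cities"):
            picks.append((node, True))
    return picks


children = {
    "travel": ["europe_cities", "japan_cities"],
    "europe_cities": ["paris", "rome"],
    "japan_cities": ["tokyo", "kyoto"],
    "work": ["project_x"],
}
entities = global_selection("Which cities did I visit?", ["travel", "work"], children, toy_select)

# Aggregate the episodes attached to the selected entities (made-up index).
episodic_edges = {"paris": ["ep1"], "tokyo": ["ep7"], "project_x": ["ep3"]}
episodes = sorted({ep for e in entities for ep in episodic_edges.get(e, [])})
```

The point of the flag is cost: once a whole subtree is clearly relevant, one decision replaces a selection call at every layer beneath it.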

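Post-retrieval, all candidates are scored by a re-ranking model (Qwen3-Reranker-8B). For each type, the top-$k$ entries from the union of the System 1 and System 2 candidate sets ($C_{\mathrm{S1}} \cup C_{\mathrm{S2}}$) are selected as context:

$\mathrm{Context} = \underset{x \,\in\, C_{\mathrm{S1}} \cup C_{\mathrm{S2}}}{\operatorname{top-}k} \; \mathrm{RerankScore}(q, x).$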
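
The union-then-rerank step can be sketched as follows. The scoring function here is a toy word-overlap count standing in for the reranking model, and the function names are hypothetical; in the described system the score would come from Qwen3-Reranker-8B and the step would run once per element type (episodes, entities, edges).

```python
def fuse_and_rerank(query, system1, system2, score, k):
    """Union both candidate lists, score each item against the query, keep the top k."""
    union = list(dict.fromkeys(list(system1) + list(system2)))  # de-duplicate, keep order
    ranked = sorted(union, key=lambda x: score(query, x), reverse=True)
    return ranked[:k]


def overlap_score(query, text):
    """Toy relevance score: number of lowercase words shared with the query."""
    return len(set(query.lower().split()) & set(text.lower().split()))


# Made-up candidates from the two routes; one item is returned by both.
system1 = ["visited Paris in 2023", "bought a laptop"]
system2 = ["visited Tokyo in 2023", "visited Paris in 2023"]

context = fuse_and_rerank("cities visited 2023", system1, system2, overlap_score, k=2)
```

De-duplicating before scoring means an item found by both routes is scored once, and the final budget `k` is spent only on distinct candidates.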

This design allows precise coverage for global, enumeration, or weakly-specified queries while maintaining speed and utility for strictly local or associative retrieval demands.

5. Implementation and Empirical Evaluation

Embedding: Qwen3-Embedding-0.6B (128-dim)
Reranking: Qwen3-Reranker-8B
Answer Generation: GPT-4.1-mini
Graph Backend: Neo4j, storing both episodic and hierarchical relations
Ingestion: Entity and edge extraction plus de-duplication at commit time; hierarchy rebuilding is periodic.

Mnemis was evaluated on two established LLM memory benchmarks:

Benchmark	RAG Baseline	Mnemis Score
LoCoMo	73.8	93.9
LongMemEval-S	72.6	91.6

Performance uplift is especially pronounced on multi-hop and temporal reasoning slices, with gains of 15–20 points over baselines (Mem0, Zep, Nemori, EMem-G, EverMemOS). Ablation studies show System 1 alone achieves 89.1, System 2 alone 87.7, and the union 93.3–93.9, indicating that the two retrieval modes are complementary and that neither alone matches the combined score. Sensitivity analysis indicates that, unlike System 1, System 2 maintains high coverage and answer quality even at low retrieval budgets (Tang et al., 17 Feb 2026).

6. Limitations, Analysis, and Future Directions

Current limitations include the monolithic, offline nature of hierarchical graph rebuilding and the reliance on LLM interventions for System 2 traversal at each layer. The system only supports textual memory; extension to multimodal (image, table, code) sources remains open. Future work is planned in several directions: (1) Incremental or streaming hierarchy construction; (2) Cost-sensitive or learned traversal policies optimizing LLM call budgets; (3) Lightweight planning subsystems to map user queries to retrieval subgoals; (4) Adaptation to richer memory modalities such as video or sensor logs.

The empirical success of Mnemis demonstrates that fusing fine-grained similarity retrieval (System 1) with structured, global semantic traversal (System 2) produces compound advantages unobtainable by either method alone, establishing Mnemis as a state-of-the-art architecture for long-term LLM memory (Tang et al., 17 Feb 2026).


References (1)

Tang et al. (17 Feb 2026). Mnemis: Dual-Route Retrieval on Hierarchical Graphs for Long-Term LLM Memory.
