- The paper introduces Memanto as a universal memory layer that replaces hybrid graph architectures with a deterministic, typed semantic memory system to overcome the ‘Memory Tax’.
- It employs maximally informative binarization to achieve 32× compression and sub-90 ms retrieval latency, yielding state-of-the-art accuracy (89.8% on LongMemEval and 87.1% on LoCoMo).
- The study shows that expanding retrieval recall is more impactful than complex graph-based methods, enabling scalable, low-overhead memory for long-horizon agent applications.
Motivation and Context
As LLM-based agents transition to persistent multi-session operation across extended reasoning trajectories, memory management has surfaced as an acute architectural bottleneck in agentic AI. The complexity of contemporary memory infrastructures, dominated by hybrid graph-vector architectures, imposes substantial compute, latency, and operational overhead—what the authors term the “Memory Tax.” These hybrid systems routinely require entity extraction, schema maintenance, and multi-query retrieval strategies, all coordinated via LLM-mediated ingestion. The paper positions Memanto as a universal memory layer, proposing that highly optimized semantic retrieval, structured typing, and conflict resolution can supersede the need for graph complexity in long-horizon agent memory.
Architectural Innovations
Memanto is architected atop Moorcheh's Information Theoretic Search (ITS) engine, diverging sharply from approximate nearest neighbor paradigms (e.g., HNSW [malkov2018hnsw]) by deploying maximally informative binarization (MIB) and information-theoretic scoring. MIB achieves a 32× compression rate while maintaining retrieval-relevant signal integrity, thus driving sub-90 ms deterministic retrieval latency with zero ingestion overhead. The deterministic ITS metric evaluates chunk relevance in terms of uncertainty reduction, enforcing reproducible context delivery critical for agent stability.
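The 32× figure follows directly from collapsing each 32-bit float dimension to a single bit. The sketch below is a hypothetical illustration of this binarization-plus-Hamming-distance pattern, not the actual MIB algorithm, whose details the paper does not expose here; the per-dimension median threshold is our assumption, chosen because a ~50/50 split maximizes the entropy (informativeness) of each bit.

```python
# Hypothetical sketch of binarization-style compression: each float32
# dimension (32 bits) collapses to one bit, hence the 32x ratio cited.
# The real MIB procedure is not specified in the summary; median
# thresholds and Hamming scoring here are illustrative assumptions.

def median_thresholds(corpus):
    """Per-dimension medians: a ~50/50 split gives each bit ~1 bit of entropy."""
    dims = len(corpus[0])
    thresholds = []
    for d in range(dims):
        col = sorted(v[d] for v in corpus)
        thresholds.append(col[len(col) // 2])
    return thresholds

def binarize(embedding, thresholds):
    """Map each dimension to 1 bit: above its threshold -> 1, else 0."""
    bits = 0
    for i, (x, t) in enumerate(zip(embedding, thresholds)):
        if x > t:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Bit-level distance between two binary codes (deterministic, exact)."""
    return bin(a ^ b).count("1")

corpus = [[0.9, -0.2, 0.4, 0.1], [-0.5, 0.8, -0.3, 0.2],
          [0.7, -0.1, 0.5, -0.4], [-0.6, 0.6, -0.2, 0.3]]
ths = median_thresholds(corpus)
codes = [binarize(v, ths) for v in corpus]
query = binarize([0.8, -0.3, 0.6, 0.0], ths)
nearest = min(range(len(codes)), key=lambda i: hamming(query, codes[i]))
```

Note that exhaustive Hamming scan over binary codes is exact and fully deterministic, unlike graph-based ANN traversal (e.g. HNSW), which matches the reproducibility property the paper emphasizes.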
Memanto’s memory schema encompasses thirteen semantically typed categories—fact, preference, decision, commitment, goal, event, instruction, relationship, context, learning, observation, error, artifact—enabling granular retrieval filtering and priority signals. Each memory commit is atomically tagged, temporally versioned, and routed through an integrated conflict resolution module. Contradictory commits are flagged and require explicit agent-side resolution (supersede, retain, annotate), preventing constraint drift and ensuring world model coherence.
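The typed-commit and conflict-resolution flow described above can be sketched as follows. This is a minimal illustration assuming a simple in-memory store; the class and method names (`MemoryStore`, `Resolution`, etc.) are our inventions, not Memanto's actual API.

```python
# Illustrative sketch of typed commits with explicit conflict resolution.
# Names (MemoryStore, Resolution) are assumptions, not Memanto's API.
from dataclasses import dataclass
from enum import Enum

CATEGORIES = {"fact", "preference", "decision", "commitment", "goal",
              "event", "instruction", "relationship", "context",
              "learning", "observation", "error", "artifact"}

class Resolution(Enum):
    SUPERSEDE = "supersede"  # new commit replaces the old one
    RETAIN = "retain"        # keep the old commit, discard the new
    ANNOTATE = "annotate"    # keep both, recorded as a noted contradiction

@dataclass
class Commit:
    key: str
    value: str
    category: str
    version: int = 1
    superseded: bool = False

class MemoryStore:
    def __init__(self):
        self.log = []  # append-only, temporally ordered commit history

    def current(self, key):
        hits = [c for c in self.log if c.key == key and not c.superseded]
        return hits[-1] if hits else None

    def commit(self, key, value, category, resolve=None):
        assert category in CATEGORIES, f"unknown category: {category}"
        existing = self.current(key)
        if existing and existing.value != value:
            # Contradiction: require explicit agent-side resolution.
            if resolve is None:
                raise ValueError(f"conflict on {key!r}: resolution required")
            if resolve is Resolution.RETAIN:
                return existing
            if resolve is Resolution.SUPERSEDE:
                existing.superseded = True
            # ANNOTATE: both commits stay visible in the log.
        new = Commit(key, value, category,
                     version=existing.version + 1 if existing else 1)
        self.log.append(new)
        return new
```

Forcing the agent to pass an explicit `Resolution` on contradiction is the point of the sketch: silent last-write-wins is exactly the constraint-drift failure mode the paper warns against.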
Temporal versioning supports as-of queries, changed-since retrievals, and current-only filtering. Namespaces partition memory per agent, further organizing session boundaries without restricting persistent cross-session recall. Daily intelligence artifacts synthesize summaries, contradiction reports, and audit trails for production readiness.
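The temporal query surface (as-of, changed-since, current-only) and per-agent namespacing can be sketched as below. Field and method names are assumptions for illustration; the paper does not specify Memanto's query API.

```python
# Illustrative sketch of the temporal query surface: as-of, changed-since,
# and current-only reads over namespaced, timestamped commits. All names
# here are assumptions, not Memanto's actual interface.
from dataclasses import dataclass

@dataclass
class Versioned:
    key: str
    value: str
    ts: int          # commit timestamp (e.g. epoch seconds)
    namespace: str   # per-agent partition

class TemporalMemory:
    def __init__(self):
        self.rows = []

    def put(self, key, value, ts, namespace="default"):
        self.rows.append(Versioned(key, value, ts, namespace))

    def as_of(self, key, ts, namespace="default"):
        """Value of `key` as it stood at time `ts`."""
        hits = [r for r in self.rows
                if r.key == key and r.namespace == namespace and r.ts <= ts]
        return max(hits, key=lambda r: r.ts).value if hits else None

    def changed_since(self, ts, namespace="default"):
        """Keys with at least one commit strictly after `ts`."""
        return sorted({r.key for r in self.rows
                       if r.namespace == namespace and r.ts > ts})

    def current(self, key, namespace="default"):
        """Current-only read: the latest committed value."""
        return self.as_of(key, float("inf"), namespace)
```

Because rows are never mutated in place, the same history that answers as-of queries doubles as the audit trail the daily intelligence artifacts are built from.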
Empirical Evaluation and Results
Systematic benchmarking utilized LongMemEval (Wu et al., 2024) and LoCoMo (Maharana et al., 2024), both recognized standards in agentic memory evaluation. A progressive five-stage ablation, isolating the contribution of recall expansion, prompt design, threshold calibration, and inference model selection, demonstrated that retrieval recall—and not architectural complexity—is the dominant lever for accuracy. Memanto achieved state-of-the-art accuracy on LongMemEval (89.8%) and LoCoMo (87.1%), outperforming all vector-only baselines and equaling or surpassing hybrid competitors at a fraction of their complexity and cost.
The ablation highlighted that expanding the retrieval limit (k) from 10 to 100 yielded a 28.4-percentage-point improvement on LongMemEval, while prompt optimization contributed only marginal gains (<2.2 percentage points). This recall-over-precision principle directly counters the prevailing paradigm of precise multi-hop graph traversal and entity resolution.
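A toy illustration (not the benchmark setup) of why widening k matters: an answer-bearing chunk with low similarity to the query can rank outside a narrow top-k, and the downstream LLM filters a broad context better than a tight cutoff can. Scores and chunks below are fabricated for demonstration.

```python
# Toy demonstration of recall-over-precision: the answer-bearing chunk
# scores poorly against the query, so a small k misses it while a large
# k recovers it. Chunks and scores are invented for illustration.
def top_k(scored_chunks, k):
    """Return the k highest-scoring (score, text) pairs."""
    return sorted(scored_chunks, key=lambda c: -c[0])[:k]

chunks = [(0.9, "user likes green tea"),
          (0.8, "standup meeting moved to 3pm"),
          (0.7, "the project is written in Rust"),
          (0.3, "user's sister is named Mia")]  # the needed fact, low score

answer = "user's sister is named Mia"
narrow = [text for _, text in top_k(chunks, 2)]   # tight cutoff: misses it
broad = [text for _, text in top_k(chunks, 4)]    # wide cutoff: includes it
```

At k=2 the needed chunk is cut; at k=4 it reaches the model, which can discard the irrelevant chunks in-context, which is the mechanism the ablation's 28.4-point jump suggests.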
Category-specific analysis showed peak performance in single-session assistant/user preference queries (>95%), with lowest accuracy in multi-session scenarios (81.2%), reflecting distributed chunk fragmentation and the inherent challenge of synthesizing long-horizon factual recall.
Comparative results (Table~\ref{tab:comparison}) positioned Memanto as the highest-performing vector-only architecture, requiring only a single retrieval query and exhibiting zero ingestion latency. Hybrid systems such as Hindsight (Latimer et al., 14 Dec 2025) surpassed Memanto by <2 percentage points, and only at their maximum complexity setting (multi-query plus reflection), incurring substantial operational overhead.
Operational analysis quantified ingestion cost, retrieval latency, infrastructure complexity, and idle compute. Memanto eliminates LLM calls at ingestion and scales to zero during idle periods, contrasting with graph-based architectures (Mem0g, Zep) that incur multi-second ingestion latency and fixed compute overhead, yielding significant annual savings per agent (>\$600).
Theoretical Implications
Memanto empirically refutes the necessity of knowledge graph complexity for high-fidelity memory in agentic systems. The evidence indicates that modern LLMs, when provided with broad, semantically relevant context, can perform in-context reasoning and filtering superior to pre-computed graph structures. The ITS engine’s deterministic retrieval is critical for agent stability, eliminating the non-determinism typical of ANN-based search. This architecture achieves maximum performance in the ideal complexity-accuracy quadrant (Fig.~\ref{fig:scatter}), substantiating that retrieval tuning and schema typing are more consequential than graph expressiveness.
Conflict resolution is highlighted as a production necessity, directly addressing the constraint drift phenomenon and operationalizing MemoryAgentBench (Hu et al., 7 Jul 2025)'s findings on multi-hop contradiction. Temporal versioning and provenance tagging ensure compliance and auditability.
Practical Implications and Future Directions
Memanto positions itself for scalable, production-ready deployment in agentic systems. Its operational simplicity, deterministic retrieval, and elimination of memory tax enable rapid iteration and debugging for real-time workflows. The design principles—structured typing, recall tuning, minimal ingestion overhead, explicit conflict handling, temporal awareness, and provenance tracking—are transferable to architectures seeking scalable long-horizon agent memory.
Benchmark saturation and label quality now define the upper bounds of current evaluations; as accuracy approaches these practical ceilings, new evaluation frameworks targeting conflict resolution, multi-agent memory sharing, and non-conversational workflows are needed. Inference model upgrades continue to contribute marginal improvements, but retrieval architecture remains the primary determinant of empirical performance. Memanto’s namespace-based isolation is foundational for multi-agent coordination, and extension to shared-memory protocols is underway.
Conclusion
Memanto operationalizes a principled exchange: architectural complexity is traded for operational determinism, zero-latency ingestion, and structured semantic typing. The empirical data validate this exchange; Memanto achieves state-of-the-art accuracy across agentic memory benchmarks via a vector-only approach, eliminating the memory tax imposed by hybrid graph architectures. The recall-over-precision principle, deterministic retrieval, and daily operational artifacts define a robust template for scalable agentic AI memory.