MemIntelli: Memory Intelligence & IMC Framework
- MemIntelli is a unified framework that combines memristive in-memory computing co-simulation with self-evolving memory intelligence for adaptive AI systems.
- It employs a three-part Manager–Planner–Executor architecture and hybrid memory representations (graph, attributes, embeddings) to optimize retrieval and continual learning.
- Reported evaluations show gains in multimodal task accuracy (up to +31 percentage points over no-memory baselines) and system efficiency, while the hardware/software co-design exposes NumPy and PyTorch interfaces for integration with standard ML pipelines.
MemIntelli refers to both: (1) a hardware/software co-simulation framework for memristive in-memory computing (IMC) systems, and (2) a set of architectural and algorithmic principles that underpin state-of-the-art “memory intelligence” in large agentic and multimodal AI systems. Across recent literature, MemIntelli encapsulates the drive towards unified, adaptive, and structured memory systems that integrate flexible precision, cross-layer modeling, autonomous augmentation, and continual self-evolution. This article surveys the major lines of research that concretize the MemIntelli concept, with a focus on agent architectures, algorithmic pipelines, device–circuit–software stacks, and the empirical impact of such designs.
1. Architectural Frameworks: Manager–Planner–Executor and Beyond
MemIntelli, as recently instantiated in the Memory Intelligence Agent (MIA) framework, employs a three-part manager–planner–executor architecture for LLM-based agents (Qiao et al., 6 Apr 2026). The Memory Manager maintains a non-parametric store of compressed, structured memory units (“workflows”), supporting query-adaptive retrieval with hybrid semantic and history-aware scoring. The Planner, a parametric (learnable) LLM, produces search plans and incorporates an explicit reflection mechanism to revise those plans in response to execution feedback. The Executor (which may be an LLM or LMM) operationalizes the plan via a ReAct loop (alternating thought, tool call, and observation).
A critical loop is established: the Manager retrieves few-shot cases to prompt the Planner, the Executor carries out planned actions (possibly in a multimodal or tool-augmented setting), and the outcome (trajectory, answer, plan quality) is compressed and (if suitable) re-inserted into the memory buffer. This design enables continual adaptation and streaming, non-parametric learning while maintaining stable parametric updates via alternating RL.
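The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MIA implementation: `Workflow`, `MemoryManager`, `run_episode`, and the stub planner and executor are hypothetical names, retrieval is reduced to word overlap, every outcome is stored (the original stores outcomes only if suitable), and the alternating RL updates of the Planner are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Workflow:
    """Hypothetical compressed memory unit: question, final plan, outcome."""
    question: str
    plan: str
    success: bool


@dataclass
class MemoryManager:
    """Non-parametric store; relevance here is a placeholder word overlap."""
    store: List[Workflow] = field(default_factory=list)

    def retrieve(self, question: str, k: int = 2) -> List[Workflow]:
        q = set(question.lower().split())
        ranked = sorted(
            self.store,
            key=lambda w: len(q & set(w.question.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def insert(self, wf: Workflow) -> None:
        self.store.append(wf)


Planner = Callable[[str, List[Workflow], Optional[str]], str]
Executor = Callable[[str], Tuple[str, bool]]


def run_episode(question: str, manager: MemoryManager, planner: Planner,
                executor: Executor, max_reflections: int = 1) -> Tuple[str, bool]:
    cases = manager.retrieve(question)          # Manager: few-shot cases
    plan = planner(question, cases, None)       # Planner: initial plan
    answer, ok = executor(plan)                 # Executor: carry out the plan
    attempts = 0
    while not ok and attempts < max_reflections:
        # Reflection: revise the plan using execution feedback.
        plan = planner(question, cases, f"previous plan failed: {plan}")
        answer, ok = executor(plan)
        attempts += 1
    manager.insert(Workflow(question, plan, ok))  # write the outcome back
    return answer, ok


# Stubs standing in for the LLM planner and the ReAct executor.
def stub_planner(question, cases, feedback):
    return "search" if feedback is None else "search then verify"


def stub_executor(plan):
    ok = "verify" in plan
    return ("answer" if ok else "unknown"), ok


manager = MemoryManager()
answer, ok = run_episode("who wrote the report", manager, stub_planner, stub_executor)
```

In this toy run the first plan fails, the reflection step produces a revised plan, and the revised trajectory is what ends up in memory for later retrieval.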
2. Memory Representation: Compression, Retrieval, and Bi-Directionality
Each memory unit in MemIntelli encodes richly structured metadata: question and image caption embeddings, usage and success counts, and correctness labels (Qiao et al., 6 Apr 2026). Memory retrieval employs a weighted sum of min–max normalized semantic similarity, historical success ratio, and recency-based frequency terms.
Parametric and non-parametric memories form a bidirectional loop. The manager selects memories to inform the planner’s context; as the planner and executor generate new search trajectories, these are summarized by the manager and potentially replace or augment existing units. The retrieval mechanism employs cosine similarity between query and memory embeddings, with dynamic weighting to optimize relevance, quality, and diversity.
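As a rough illustration of the retrieval score described above, the following sketch combines cosine similarity, success ratio, and a usage-count term in a weighted sum. The weights, the choice to min–max normalize all three terms, and the use of the raw usage count as the frequency term are assumptions made for the example; the cited work may define these differently. The function names are hypothetical.

```python
import numpy as np


def min_max(x):
    """Min-max normalize to [0, 1]; a constant vector maps to zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros_like(x)
    return (x - x.min()) / span


def score_memories(query_emb, mem_embs, successes, uses, weights=(0.6, 0.3, 0.1)):
    """Weighted sum of normalized similarity, success ratio, and usage count."""
    q = np.asarray(query_emb, dtype=float)
    m = np.asarray(mem_embs, dtype=float)
    successes = np.asarray(successes, dtype=float)
    uses = np.asarray(uses, dtype=float)

    # Cosine similarity between the query and every memory embedding.
    sims = m @ q / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
    # Historical success ratio, guarding against unused memories.
    success_ratio = successes / np.maximum(uses, 1.0)

    w_sim, w_succ, w_freq = weights  # assumed values, not from the source
    return (w_sim * min_max(sims)
            + w_succ * min_max(success_ratio)
            + w_freq * min_max(uses))


scores = score_memories(
    query_emb=[1.0, 0.0],
    mem_embs=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    successes=[1, 0, 2],
    uses=[2, 2, 2],
)
best = int(np.argmax(scores))
```

Normalizing each term before weighting keeps one signal (for example, raw usage counts) from dominating the others purely because of its scale.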
A similar bi-directional paradigm is observed in frameworks such as U-Mem, where episodic and procedural memories are balanced using Gaussian utility posteriors and semantic-aware Thompson sampling (Wu et al., 25 Feb 2026). This streaming consolidation approach, supporting append/merge/prune operations, keeps memory both dense and adaptive to evolving agent needs.
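To make the Thompson sampling idea concrete, here is a minimal sketch with one Gaussian utility posterior per memory. It uses the standard normal-normal conjugate update with a known noise variance. How U-Mem makes the sampling "semantic-aware" is not specified here, so the additive similarity bonus (`beta * similarity`) is purely an assumption for illustration, as are the class and function names.

```python
import numpy as np


class GaussianArm:
    """Gaussian posterior over a memory's utility (known noise variance)."""

    def __init__(self, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
        self.mean = prior_mean
        self.var = prior_var
        self.noise_var = noise_var

    def update(self, reward):
        # Normal-normal conjugate update: precisions add, means are
        # combined in proportion to their precisions.
        precision = 1.0 / self.var + 1.0 / self.noise_var
        self.mean = (self.mean / self.var + reward / self.noise_var) / precision
        self.var = 1.0 / precision

    def sample(self, rng):
        return rng.normal(self.mean, np.sqrt(self.var))


def thompson_select(arms, similarities, rng, beta=1.0):
    """Pick the memory whose sampled utility plus similarity bonus is largest.

    The additive bonus is an assumed stand-in for semantic awareness.
    """
    draws = [arm.sample(rng) + beta * sim for arm, sim in zip(arms, similarities)]
    return int(np.argmax(draws))


rng = np.random.default_rng(0)
arms = [GaussianArm(), GaussianArm()]
choice = thompson_select(arms, similarities=[0.8, 0.2], rng=rng)
arms[choice].update(reward=1.0)  # feed back the observed utility
```

Because each selection draws from the posterior rather than using its mean, rarely used memories with wide posteriors still get tried occasionally, which is the exploration behavior Thompson sampling is chosen for.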
3. Self-Evolution: Test-Time Learning, Reflection, and Judgment
A distinctive feature of MemIntelli is support for continual test-time learning (TTL). During deployment, the planner continues to update its parameters, guided by feedback from either gold-standard correctness, internal LLM critiques (“judgers”), or unsupervised reviewer committees simulating peer review for logical, factual, and validity checks (Qiao et al., 6 Apr 2026). When unsuccessful reasoning paths are encountered, a reflection step is triggered, generating new candidate plans or sub-plans for the Executor to attempt.
Memory insertion is contrastive: “positive paradigms” (successful, minimal trajectories) and “negative paradigms” (representative failures) are both compressed and stored, enabling the agent to not only maintain best practices, but also to avoid common pitfalls.
This continual updating process occurs alongside inference, yielding seamless self-evolution under both supervised and unsupervised regimes. Empirical results demonstrate that unsupervised TTL regimes nearly match the performance of supervised adaptation, with steady accuracy gains across repeated epochs (Qiao et al., 6 Apr 2026).
4. Structured Memory: Graphs, Attributes, and Retrieval Strategies
Hybrid memory structures in MemIntelli-inspired systems tightly couple symbolic, attribute-labeled nodes with dense, continuous embeddings to support both fast look-up and generalization. For instance, GUI agents utilizing Hybrid Self-evolving Structured Memory (HyMEM) represent workflows as graph nodes combining high-level textual strategies, mid-level discrete attributes, and low-level trajectory encodings (Zhu et al., 11 Mar 2026). Edges connect semantically or attributively related workflows, allowing multi-hop expansion and covering both visually and conceptually diverse regions of experience space.
Retrieval is staged: first, a fast nearest-neighbor search extracts seed workflows via cosine similarity in embedding space; then, graph expansion (hopping along shared attributes) diversifies the memory context, counteracting the tight locality constraints of purely metric-based retrieval.
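The two stages can be sketched as follows. This is an illustrative simplification, not the HyMEM implementation: edges are assumed to exist between any two workflows that share at least one attribute (semantic edges are not modeled), the nearest-neighbor search is a brute-force cosine ranking, and `staged_retrieve` is a hypothetical name.

```python
import numpy as np


def staged_retrieve(query_emb, embs, attrs, k_seed=1, hops=1):
    """Stage 1: top-k cosine seeds. Stage 2: expand along shared attributes."""
    q = np.asarray(query_emb, dtype=float)
    e = np.asarray(embs, dtype=float)
    sims = e @ q / (np.linalg.norm(e, axis=1) * np.linalg.norm(q))

    seeds = [int(i) for i in np.argsort(-sims)[:k_seed]]
    selected = list(seeds)
    frontier = set(seeds)

    for _ in range(hops):
        reached = set()
        for i in frontier:
            for j in range(len(attrs)):
                # An implicit edge exists when two workflows share an attribute.
                if j not in selected and attrs[i] & attrs[j]:
                    reached.add(j)
        selected.extend(sorted(reached))
        frontier = reached
    return selected


embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
attrs = [{"login"}, {"search"}, {"login", "form"}, {"form"}]

one_hop = staged_retrieve([1.0, 0.0], embs, attrs, k_seed=1, hops=1)
two_hop = staged_retrieve([1.0, 0.0], embs, attrs, k_seed=1, hops=2)
```

In this toy example the expansion reaches workflows 2 and 3, which are far from the query in embedding space, while workflow 1 (the second-nearest neighbor) is never added because it shares no attribute with the seed. That is the diversification effect the staged design aims for.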
Structured annotation is also critical in textual systems. MemInsight systematically mines turn- or session-level attributes using strong LLMs (Claude-3-Sonnet, Llama 3, Mistral), sorting attributes by salience in “Priority Augmentation” and enabling both precise attribute-based filtering and robust embedding-based retrieval (Salama et al., 27 Mar 2025). Experiments show that priority embedding retrieval yields +35% overall recall over baseline dense passage retrieval and up to +14% gains in user-rated persuasiveness for recommendations.
5. Device–Algorithm Co-Design: Memristor IMC Simulation
The MemIntelli hardware/software stack (Zhou et al., 21 Nov 2025) advances co-simulation for memristive IMC, unifying device-level physics, array-level circuits, variable-precision architecture, and application-layer APIs. MemIntelli models each memristor’s conductance as a log-normal random variable, capturing process-level variation and dynamic noise. Kirchhoff-law–constrained circuit solves translate device-level physics into blockwise analog computation, incorporating wire resistance, parasitic capacitance, DAC/ADC quantization, and crossbar nonidealities.
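A heavily simplified sketch of the device-level idea is shown below: weights are mapped to conductances, each conductance is perturbed by a log-normal factor, and the matrix-vector product is read out as summed currents. This does not use or reproduce the MemIntelli API. The differential-pair mapping, the conductance range, and the function name are assumptions for illustration, and wire resistance, parasitics, and DAC/ADC quantization are deliberately left out.

```python
import numpy as np


def noisy_crossbar_matvec(weights, x, g_min=1e-6, g_max=1e-4, sigma=0.05, rng=None):
    """Matrix-vector product through an idealized crossbar with
    log-normally distributed conductances (assumed differential mapping)."""
    rng = np.random.default_rng() if rng is None else rng
    w = np.asarray(weights, dtype=float)
    x = np.asarray(x, dtype=float)

    w_max = np.abs(w).max() or 1.0          # avoid dividing by zero
    scale = (g_max - g_min) / w_max         # siemens per unit weight

    # Positive and negative weight parts go to two separate devices.
    g_pos = g_min + scale * np.clip(w, 0.0, None)
    g_neg = g_min + scale * np.clip(-w, 0.0, None)

    # Log-normal variation: multiply each target by exp(N(0, sigma^2)).
    g_pos = g_pos * rng.lognormal(mean=0.0, sigma=sigma, size=w.shape)
    g_neg = g_neg * rng.lognormal(mean=0.0, sigma=sigma, size=w.shape)

    # Column currents follow Ohm's and Kirchhoff's laws in the ideal case.
    currents = (g_pos - g_neg) @ x
    return currents / scale                 # map back to weight units


w = np.array([[0.5, -1.0], [2.0, 0.25]])
x = np.array([1.0, 2.0])
ideal = w @ x
noisy = noisy_crossbar_matvec(w, x, sigma=0.05, rng=np.random.default_rng(0))
```

Setting `sigma=0` recovers the exact product, which gives a convenient baseline when sweeping the variation level to see how output error grows.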
Architecturally, matrices are block-quantized and bit-sliced at integer or floating-point precision, and the bit-slices to use are selected at runtime, which implements dynamic precision control. This supports both INT and FP operations, shared-exponent FP alignment, and per-layer mixed precision. These mechanisms are exposed to high-level applications via NumPy and custom PyTorch layers (e.g., LinearMem, ConvMem), supporting integration with standard ML training/inference pipelines.
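The bit-slicing idea can be illustrated with plain NumPy integers. The sketch below is not the MemIntelli implementation: it handles only unsigned integer matrices, ignores block quantization and shared-exponent FP alignment, and the function names and the 8-bit/2-bit split are assumptions chosen for the example. Each slice would correspond to one crossbar array; dropping the least significant slices at runtime trades accuracy for fewer array reads.

```python
import numpy as np


def slice_unsigned(m, total_bits=8, slice_bits=2):
    """Split an unsigned integer matrix into slices, most significant first."""
    n = total_bits // slice_bits
    mask = (1 << slice_bits) - 1
    return [(m >> (slice_bits * (n - 1 - i))) & mask for i in range(n)]


def sliced_matvec(m, x, total_bits=8, slice_bits=2, used_slices=None):
    """Matrix-vector product assembled from per-slice partial products.

    used_slices selects how many of the most significant slices to keep,
    which is the runtime precision knob in this sketch.
    """
    slices = slice_unsigned(m, total_bits, slice_bits)
    n = len(slices)
    used = n if used_slices is None else used_slices
    out = np.zeros(m.shape[0], dtype=np.int64)
    for i in range(used):
        shift = slice_bits * (n - 1 - i)
        # Each partial product is weighted by the slice's bit position.
        out += (slices[i] @ x) * (1 << shift)
    return out


m = np.array([[200, 3], [17, 255]], dtype=np.int64)
x = np.array([2, -1], dtype=np.int64)

full = sliced_matvec(m, x)                    # all four 2-bit slices
coarse = sliced_matvec(m, x, used_slices=2)   # top 4 bits only
```

Using all slices reproduces `m @ x` exactly, while `used_slices=2` is equivalent to truncating every matrix entry to its top four bits before multiplying.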
Benchmarking across solvers, wavelet transforms, clustering, and diverse DNNs indicates that MemIntelli stays within roughly a 1–3% accuracy drop relative to full-precision software at ≥5 bits of precision, with throughput that scales efficiently on CPUs and GPUs (Zhou et al., 21 Nov 2025).
6. Empirical Impact and Limitations
MemIntelli-derived systems deliver quantifiable improvements across key benchmarks:
- Manager–Planner–Executor agents achieve +31 percentage points in multimodal task accuracy over no-memory baselines on Qwen2.5-VL-7B, outperforming larger models and closed-source systems (Qiao et al., 6 Apr 2026).
- Hybrid graph+embedding memories such as HyMEM deliver a +22.5% boost over text-only memory, outperforming GPT-4o and Gemini2.5-Pro-Vision on complex GUI tasks (Zhu et al., 11 Mar 2026).
- Memory retrieval systems with cost-aware knowledge-extraction cascades and semantic Thompson sampling surpass RL-based optimizers in multi-hop QA by up to +14.6 points (Wu et al., 25 Feb 2026).
- IMC simulation via MemIntelli enables detailed co-design of precision, block size, and error corrections to bound output drift and optimize compute–memory trade-offs (Zhou et al., 21 Nov 2025).
Limitations remain: current heuristics for self-evolving memory and working-memory refresh rely on LLM prompts rather than learned meta-policies. Device models in hardware MemIntelli omit long-term effects (e.g., retention, drift). Attribute mining and memory augmentation depend on LLM-generated signals, which are vulnerable to hallucination. Evaluation metrics may not fully capture retrieval diversity or long-term utility.
7. Extensions and Future Directions
Several research directions follow from this body of work:
- Integration of dynamic consolidation and decay in memory scoring, incorporating recency–frequency weighting and automated forgetting (Salama et al., 27 Mar 2025).
- Explicit fusion of symbolic knowledge graphs with dense embedding-based retrieval, supporting multi-modal agents (text, vision, tabular data) (Wu et al., 25 Feb 2026).
- Learned cascade controllers that dynamically choose verification policies and retrieval trade-offs, leveraging meta-RL or bandit algorithms (Wu et al., 25 Feb 2026).
- Expansion of simulation frameworks to cover full device physics (long-term drift, retention), richer layer types (RNN, attention), and automated design-space exploration (Zhou et al., 21 Nov 2025).
- Application of hybrid memory primitives to embodied learning, multi-agent coordination, and scalable lifelong learning, with further improvements in compression and summarization (Zhu et al., 11 Mar 2026).
Taken together, these lines of work suggest that the “MemIntelli” paradigm, initially rooted in hardware/software co-design, is converging with advances in agentic memory organization, retrieval optimization, and continual adaptation, laying the foundation for unified, self-improving memory systems across the stack.