LatentGraphMem: Scalable Graph Memory

Updated 24 April 2026
  • LatentGraphMem is a hybrid memory architecture that combines latent graph embeddings with symbolic subgraph retrieval to support interpretable, low-latency long-horizon reasoning.
  • It segments text into overlapping chunks and embeds the extracted relational edges as latent vectors, supporting stable streaming updates and predictable inference times.
  • Empirical evaluations show higher accuracy than explicit-graph and latent-memory baselines; open directions include better extraction accuracy and adaptive subgraph budgets.

LatentGraphMem is a memory framework for LLMs designed to enable efficient, interpretable, and robust long-horizon reasoning across vast contexts in question-answering tasks. It combines an implicit graph-structured memory in latent space with an explicit subgraph retrieval interface, addressing the bottlenecks faced by both explicit graph memories (which provide interpretability but are brittle at scale) and latent memory schemes (which are efficient but lack transparency). LatentGraphMem builds and stores the knowledge graph as latent embeddings and retrieves only a fixed-budget, symbolic subgraph relevant to the input query. This architecture supports scalable streaming updates, interpretable memory for downstream reasoning, and parameter-efficient adaptation while maintaining predictable inference latency regardless of the input context length (Zhang et al., 6 Jan 2026).

1. Architectural Paradigm and Motivation

LatentGraphMem was motivated by persistent challenges in long-context question answering, where evidence is sparse and distributed across extended texts. Prior memory approaches fall into two major paradigms: explicit graph-based memories (such as entity-relation stores) and latent vector memories (such as soft token or embedding tables). The former are interpretable and externally inspectable but degrade sharply on long documents due to structure induction and retrieval failures. The latter remain robust over lengthy contexts but forgo interpretability and controllability.

LatentGraphMem reconciles these trade-offs by maintaining all knowledge as a graph in latent embedding space for efficient, stable storage, while exposing an explicit, symbolic subgraph—selected by a retrieval module—to the downstream LLM reasoner. This compact subgraph can be inspected by humans under a fixed evidence budget.

2. Latent Graph Memory Construction

Input documents are segmented into overlapping chunks $x^{(1)},\dots,x^{(C)}$ of at most $L$ tokens with overlap $O$. A graph builder module $B_\phi$ extracts relational triples $(h, r, t)$ from each chunk, incrementally constructing a full explicit graph $\mathcal{G}^{(c)}=(\mathcal{V}, \mathcal{E})$ up to a global capacity $M$. The explicit graph is merged, canonicalized, filtered for schema compliance, and capped in size.
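The chunking step can be illustrated with a minimal sketch. It assumes the document is already tokenized into a list and that consecutive windows advance by a stride of L minus O tokens; the function name `chunk_tokens` and its parameters are illustrative and do not come from the paper.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[str], max_len: int, overlap: int) -> List[List[str]]:
    """Split tokens into windows of at most max_len tokens.

    Consecutive windows share `overlap` tokens (assumed stride: max_len - overlap).
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    stride = max_len - overlap
    chunks: List[List[str]] = []
    start = 0
    while start < len(tokens):
        chunks.append(list(tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break  # the last window already reaches the end of the document
        start += stride
    return chunks


tokens = [f"t{i}" for i in range(10)]
chunks = chunk_tokens(tokens, max_len=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

With ten tokens, a window of four, and an overlap of one, each window begins with the token that closed the previous one, so a triple that straddles a chunk boundary still has a chance of appearing whole in one chunk.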

Each retained edge $e_i=(h_i, r_i, t_i)$ is embedded into a $d$-dimensional latent vector $u_i = \mathrm{Embed}_\phi(h_i, r_i, t_i)\in\mathbb{R}^d$. The latent memory is thus a matrix of all edge embeddings, supporting stable, streaming memory updates. Although pairwise edge scores and Laplacian-based regularization are possible, the current implementation uses only the budget cap for implicit regularization.
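A capped, streaming edge memory of this kind might look like the sketch below. Several pieces are assumptions made for illustration only: `toy_embed` is a hash-based stand-in for the learned embedding function, canonicalization is reduced to trimming and lower-casing, and edges arriving after the capacity is reached are simply dropped. The paper may use different policies for each of these.

```python
import hashlib
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def toy_embed(triple: Triple, dim: int) -> List[float]:
    """Stand-in for the learned edge embedding: hash the triple into `dim` floats in [0, 1)."""
    digest = hashlib.sha256("|".join(triple).encode("utf-8")).digest()
    return [digest[i % len(digest)] / 256.0 for i in range(dim)]


class LatentEdgeMemory:
    """Keeps at most `capacity` unique edges, with one embedding row per edge."""

    def __init__(self, capacity: int, dim: int) -> None:
        self.capacity = capacity
        self.dim = dim
        self.rows: Dict[Triple, List[float]] = {}

    @staticmethod
    def canonicalize(triple: Triple) -> Triple:
        # Assumed canonical form: trimmed, lower-cased strings.
        h, r, t = (part.strip().lower() for part in triple)
        return (h, r, t)

    def add_chunk(self, triples: List[Triple]) -> int:
        """Merge the triples extracted from one chunk; return how many new edges were stored."""
        added = 0
        for triple in triples:
            key = self.canonicalize(triple)
            if key in self.rows or len(self.rows) >= self.capacity:
                continue  # duplicate edge, or memory already at capacity
            self.rows[key] = toy_embed(key, self.dim)
            added += 1
        return added


memory = LatentEdgeMemory(capacity=3, dim=4)
first = memory.add_chunk([("Paris", "capital_of", "France"),
                          ("paris ", "capital_of", "france")])
second = memory.add_chunk([("Seine", "flows_through", "Paris"),
                           ("France", "part_of", "Europe"),
                           ("Louvre", "located_in", "Paris")])
print(first, second, len(memory.rows))
```

Because each chunk only appends rows, the memory can be updated as the document streams in, and its size never exceeds the configured capacity.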

3. Task-Specific Subgraph Retrieval

At inference, a subgraph retriever encodes the query into a vector. Each edge embedding $u_i$ is scored against the query vector using a bilinear form with a learned weight matrix, and the top-$K$ scoring edges are selected under a fixed retrieval budget $K$. During backpropagation, a softmax-based straight-through estimator enables gradient flow. The selected subgraph is serialized into a compact symbolic format (e.g., "Relevant Knowledge: [h|r|t] …") and is the only externalized content passed to the frozen LLM reasoner.
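The scoring, selection, and serialization steps can be sketched in plain Python. The function names, the identity weight matrix, and the toy vectors below are illustrative assumptions; in the actual system the query vector, edge embeddings, and bilinear weight are learned.

```python
from typing import List, Sequence, Tuple

Triple = Tuple[str, str, str]
Vector = Sequence[float]


def bilinear_score(query_vec: Vector, weight: Sequence[Vector], edge_vec: Vector) -> float:
    """Compute query_vec^T * weight * edge_vec."""
    projected = [sum(w * e for w, e in zip(row, edge_vec)) for row in weight]
    return sum(q * p for q, p in zip(query_vec, projected))


def retrieve_top_k(query_vec: Vector, weight: Sequence[Vector],
                   edges: List[Triple], edge_vecs: List[Vector], k: int) -> List[Triple]:
    """Return the k edges with the highest bilinear score, best first."""
    scores = [bilinear_score(query_vec, weight, vec) for vec in edge_vecs]
    ranked = sorted(range(len(edges)), key=lambda i: scores[i], reverse=True)
    return [edges[i] for i in ranked[:k]]


def serialize(subgraph: List[Triple]) -> str:
    """Render the selected edges in the [h|r|t] format described in the text."""
    body = " ".join(f"[{h}|{r}|{t}]" for h, r, t in subgraph)
    return f"Relevant Knowledge: {body}"


edges = [("paris", "capital_of", "france"),
         ("seine", "flows_through", "paris"),
         ("france", "part_of", "europe")]
edge_vecs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]   # toy embeddings
weight = [[1.0, 0.0], [0.0, 1.0]]                   # identity stands in for the learned matrix
query_vec = [1.0, 0.0]

selected = retrieve_top_k(query_vec, weight, edges, edge_vecs, k=2)
prompt_block = serialize(selected)
print(prompt_block)
```

Only the serialized string reaches the reasoner, so a reader can see exactly which triples the answer was conditioned on.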

4. Training Regimen

LatentGraphMem is trained in three stages with the reasoner held frozen:

  • Stage I (Full-Graph Construction): The builder module extracts and serializes the explicit full graph. Supervision comes from the cross-entropy loss between the LLM-generated answer and the ground truth, given the entire graph and the query.
  • Stage II (Latent Subgraph Retrieval): With the graph extraction weights frozen, the retriever is trained to select subgraphs of size $K$ (the retrieval budget), minimizing the same loss but with only the retrieved subgraph provided.
  • Stage III (Joint Fine-Tuning): Training alternates between full-graph builder steps and joint builder-retriever steps, balancing extraction quality and retrieval efficacy.

All training utilizes QA-style cross-entropy loss routed through the frozen LLM reasoner, with gradients backpropagated via the straight-through TopK operator.
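One common way to write a softmax-based straight-through top-K selection is given below. This is a standard formulation offered as an assumption, not necessarily the exact estimator used in the paper; the symbols $s_i$ (edge scores), $\tau$ (softmax temperature), and $\operatorname{sg}$ (stop-gradient) are notation introduced here.

```latex
\begin{aligned}
p_i &= \frac{\exp(s_i/\tau)}{\sum_{j}\exp(s_j/\tau)}, \\
m_i &= \mathbb{1}\!\left[\, i \in \operatorname{TopK}(s, K) \,\right], \\
\tilde{m}_i &= m_i + p_i - \operatorname{sg}(p_i).
\end{aligned}
```

In the forward pass $\tilde{m}_i$ equals the hard mask $m_i$, so exactly $K$ edges are selected; in the backward pass the gradient with respect to the scores is that of the soft distribution $p_i$, since the hard mask and the stop-gradient term contribute none.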

5. Inference Pipeline and Computational Complexity

At inference, the builder parses the document and forms the latent edge memory. The retriever encodes the query, scores the stored edges, selects the top $K$ (where $K$ is the fixed retrieval budget), and serializes the explicit subgraph for the reasoner, which generates the answer. Because only the $K$ selected edges are included in the prompt, inference time depends on $K$, not on the document length. This yields stable, low-latency inference even for very long contexts.

Complexity per module:

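  • Builder: scales with document length, processed as a stream of chunks.
  • Retriever: scales with the number of stored edges per query, which is capped by the graph capacity $M$.
  • LLM Reasoner: scales with the retrieval budget $K$, independent of document length.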
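The fixed-budget property can be checked with a small sketch: however many edges the memory holds, the number that reaches the prompt is bounded by the retrieval budget. The random scores and the function name `prompt_edge_count` are placeholders for illustration, not part of the described system.

```python
import heapq
import random


def prompt_edge_count(num_edges: int, k: int, seed: int = 0) -> int:
    """Score `num_edges` hypothetical edges at random and count how many reach the prompt."""
    rng = random.Random(seed)
    scores = [rng.random() for _ in range(num_edges)]
    # The retriever touches every stored edge once, but keeps only the k best.
    top = heapq.nlargest(k, range(num_edges), key=scores.__getitem__)
    return len(top)


counts = {m: prompt_edge_count(m, k=8) for m in (100, 1_000, 10_000)}
print(counts)
```

Scoring work grows with the memory size, while the reasoner's input stays at eight edges in every case, which is the reason the reasoner's cost does not track document length.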

6. Empirical Evaluation

LatentGraphMem's effectiveness is demonstrated on a diverse suite of long-context QA benchmarks, with training on TriviaQA, QASPER, and QuALITY (about 20K instances), and evaluation on HotpotQA (1K), NarrativeQA (800), and WikiHop (800). Qwen2.5-1.5B, SmolLM3-3B, and Qwen3-8B serve as frozen LLM reasoners. Baselines include retrieval-augmented generation (RAG), explicit-graph models (THEANINE, PREMem, Mem0, A-Mem), and the latent memory MemGen.

Main results (average accuracy over the three evaluation tasks):

Backbone          Reasoner   MemGen   LatentGraphMem
1.5B parameters   52.1       44.0     56.1
3B parameters     50.7       49.6     58.6
8B parameters     54.7       54.6     63.3

LatentGraphMem outperforms both explicit-graph and latent-memory baselines at all scales, with the largest gains on multi-hop (HotpotQA) and wide-coverage (WikiHop) benchmarks. Ablation studies show that removing latent retrieval, or substituting heuristic BFS retrieval over the explicit graph, each costs 3–7 points of accuracy. Varying the graph capacity $M$ highlights dataset-dependent trade-offs, with accuracy saturating beyond a certain number of edges.

Inference latency is nearly flat with respect to context length. With the 1.5B backbone, LatentGraphMem takes 12.5s at 6k tokens of context (vs. 10.6s for MemGen and 20.0s for A-Mem) and 13.8s at 10k tokens (vs. 41.9s for A-Mem).

7. Limitations and Prospects

LatentGraphMem's strengths include robust scaling to extremely long contexts, fixed-budget explicit evidence for interpretability, and parameter-efficient LoRA-based adaptation for various LLM reasoners. However, the system depends on the quality of graph extraction: extraction errors propagate to downstream QA performance. The capacity and retrieval budgets require per-task tuning. The model is presently text-only and does not support multi-modal or interactive settings.

Potential future directions include incorporation of node embeddings and adjacency prediction, regularization of the latent space (e.g., using Laplacian or contrastive losses), expansion to dialog, multi-agent, or multi-modal documents (including vision+text), and exploration of dynamic subgraph budgets conditioned on question complexity. A plausible implication is that further integration of graph-regularization and adaptivity could yield additional improvements in interpretability and performance (Zhang et al., 6 Jan 2026).

In summary, LatentGraphMem operationalizes the paradigm “store latent, retrieve explicit”, combining efficient, stable memory management with controlled, interpretable reasoning evidence for long-horizon LLM applications.
