Memory Bank: Claim–Evidence Graph

Updated 27 January 2026

Memory Bank (Claim–Evidence Graph) is a structured, persistent graph that encodes claims and evidence with explicit, weighted relationships for scalable, explainable reasoning.
It employs advanced pipelines including NER, LLM-based claim extraction, evidence retrieval, and dynamic graph assembly to ensure fine-grained traceability and auditability.
These systems underpin fact verification, agent memory, and retrieval-augmented generation by aligning evidence and claims for robust AI decision-making.

A Memory Bank as a Claim–Evidence Graph is a formally structured, persistent, and queryable data architecture that encodes the relationships between discrete claims or propositions and the explicit evidence supporting or refuting them. This paradigm underlies modern explainable verification, agentic memory, and iterative evidence aggregation pipelines across fact-checking, knowledge-driven reasoning, and retrieval-augmented generation (RAG) settings. Adopting precise bipartite (or heterogeneous) graph abstractions, the memory bank facilitates fine-grained traceability, dynamic evidence audit, and scalable multi-hop reasoning.

1. Formal Definitions and Graph Structure

The core abstraction is a directed graph—typically bipartite, sometimes augmented to heterogeneous or multiplex graphs—where nodes represent atomic claims and explicit evidence units, and edges encode supporting, refuting, or ambiguous relationships, optionally weighted by attribution, confidence, or semantic alignment.

Generalized Claim–Evidence Graph

Let $C=\{c_1,\dots,c_n\}$ denote the set of claim nodes (e.g., atomic statements, extracted sub-claims, or belief statements), and $E=\{e_1,\dots,e_m\}$ the set of evidence nodes (e.g., KG triplets, document spans, knowledge snippets). Concrete systems instantiate this structure differently:

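In ClaimVer (Dammu et al., 2024), $G=(V, E_G)$ where $V=C \cup E$ and the edge set $E_G$ links each $c_i$ to every $e_j$ in its retrieved evidence subset, i.e., $E_G=\{(c_i, e_j) \mid e_j\in\mathrm{ret}(c_i)\}$. The edge set is written $E_G$ to keep it distinct from the evidence node set $E$.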
In ADORE (You et al., 26 Jan 2026), a three-layer structure $V = S \cup C \cup E$ is used, with $S$ representing report sections, $C$ claims, and $E$ admissible evidence excerpts. Edges are strictly between sections and claims, and between evidence and claims, ensuring modular admissibility checks per section.

Graph edges can be weighted (e.g., ClaimVer’s $\alpha(c_i, e_j)$) or carry labels denoting the nature of support (Attributable, Extrapolatory, Contradictory). Advanced systems further enrich node types ($V=F\cup E_x\cup S\cup B$ in Hindsight (Latimer et al., 14 Dec 2025)) and edge types ($E_{time}, E_{sem}, E_{ent}, E_{cau}, E_{ev}$) to support temporal, semantic, entity, causal, and evidence-belief linking.
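As an illustration of the abstraction above, the following is a minimal in-memory sketch of a bipartite claim-evidence graph with labeled, weighted edges. The class and method names (`ClaimEvidenceGraph`, `Edge`, `Relation`, `link`, `evidence_for`) are hypothetical and are not taken from ClaimVer, ADORE, or Hindsight; the default weight of 1.0 is likewise an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class Relation(Enum):
    """Edge labels named in the text; the enum itself is illustrative."""
    ATTRIBUTABLE = "attributable"
    EXTRAPOLATORY = "extrapolatory"
    CONTRADICTORY = "contradictory"


@dataclass(frozen=True)
class Edge:
    claim_id: str
    evidence_id: str
    relation: Relation
    weight: float = 1.0  # assumed default; stands in for a weight such as alpha(c_i, e_j)


@dataclass
class ClaimEvidenceGraph:
    claims: dict = field(default_factory=dict)    # claim id -> claim text
    evidence: dict = field(default_factory=dict)  # evidence id -> evidence text
    edges: list = field(default_factory=list)     # list of Edge

    def add_claim(self, claim_id: str, text: str) -> None:
        self.claims[claim_id] = text

    def add_evidence(self, evidence_id: str, text: str) -> None:
        self.evidence[evidence_id] = text

    def link(self, claim_id: str, evidence_id: str,
             relation: Relation, weight: float = 1.0) -> None:
        # Keep the graph bipartite: an edge must join a known claim to known evidence.
        if claim_id not in self.claims or evidence_id not in self.evidence:
            raise KeyError("both endpoints must be added before linking")
        self.edges.append(Edge(claim_id, evidence_id, relation, weight))

    def evidence_for(self, claim_id: str) -> list:
        """Return every edge attached to the given claim."""
        return [e for e in self.edges if e.claim_id == claim_id]


graph = ClaimEvidenceGraph()
graph.add_claim("c1", "Paris is the capital of France.")
graph.add_evidence("e1", "(Paris, capital_of, France)")
graph.link("c1", "e1", Relation.ATTRIBUTABLE, weight=0.9)
```

A layered variant such as the section-claim-evidence structure described for ADORE could be approximated by adding a third node dictionary and a second edge list, though that is not shown here.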

2. Memory Bank Construction Pipelines

Memory Bank instantiation follows modular multi-stage pipelines, encompassing preprocessing, graph assembly, and dynamic updates.

Canonical Pipeline (ClaimVer)

Preprocessing: Apply NER/entity linking and coreference resolution over text $T$.
Claim Extraction: Use an LLM or purpose-trained model to map $T$ to $C$ via $f_C(T)$, yielding spans $(s_i, e_i)$.
Evidence Retrieval: For each $c_i$, execute graph search or IR/bi-encoder retrieval $f_E(c_i, \mathcal{G})$ to obtain candidate $e_j$ (triplets, passages, or documents).
Verification and Attribution: LLM-based verification assigns labels $y_i$ (Attributable, Extrapolatory, Contradictory), identifies relevant triplets, and generates rationales.
Score Assignment: Compute claim scores $cs(y_i)$ and match scores $\mathrm{TMS}_i$; aggregate them into a global score $S$, which is mapped through an asymmetric sigmoid to give the KG Attribution Score (KAS).
Graph Assembly: Instantiate $G$ with edges, optionally annotated with weights proportional to evidence contribution.
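The control flow of the stages above can be sketched as a small driver function. This is an assumption-laden outline, not ClaimVer's implementation: `build_memory_bank` and the `toy_*` helpers are hypothetical names, the toy helpers use substring matching in place of an LLM or retriever, and the preprocessing and scoring stages are omitted for brevity.

```python
def build_memory_bank(text, extract_claims, retrieve, verify):
    """Claim extraction -> evidence retrieval -> verification -> graph assembly.

    The three callables are injected so that real components (LLM extractor,
    retriever, verifier) could replace the toy stand-ins below.
    """
    bank = {"claims": {}, "edges": []}
    for i, claim in enumerate(extract_claims(text)):
        claim_id = f"c{i}"
        candidates = retrieve(claim)
        label, relevant = verify(claim, candidates)
        bank["claims"][claim_id] = {"text": claim, "label": label}
        # Only evidence the verifier marked as relevant becomes an edge.
        for evidence in relevant:
            bank["edges"].append((claim_id, evidence))
    return bank


# Toy knowledge graph and stand-in components (all hypothetical).
KB = [("Paris", "capital_of", "France"), ("Berlin", "capital_of", "Germany")]


def toy_extract(text):
    # Stand-in for f_C: one claim per sentence.
    return [s.strip() for s in text.split(".") if s.strip()]


def toy_retrieve(claim):
    # Stand-in for f_E: triplets whose subject or object appears in the claim.
    return [t for t in KB if t[0] in claim or t[2] in claim]


def toy_verify(claim, candidates):
    # Stand-in for LLM verification: require both subject and object to appear.
    relevant = [t for t in candidates if t[0] in claim and t[2] in claim]
    return ("Attributable", relevant) if relevant else ("Extrapolatory", [])


bank = build_memory_bank(
    "Paris is the capital of France. Berlin is large.",
    toy_extract, toy_retrieve, toy_verify,
)
```

Passing the stages in as callables keeps the assembly step independent of any particular model or retriever, which is one plausible way to realize the modularity the pipeline description implies.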

In VeGraph (Pham et al., 29 May 2025), complex or ambiguous claims are represented as triplet graphs with placeholders for unresolved entities. The system iteratively resolves these entities via LLM-driven KB queries, updates the memory bank after each resolution step, and logs the supporting evidence for every disambiguation and verification action.

In entity-graph approaches (Mongiovì et al., 2021), a precomputed corpus-scale entity graph serves as the persistent memory bank. Given a new claim, a subgraph $G_c$ is extracted using entity linking and bounded-hop traversal. Each subgraph acts as a claim-specific working memory for aggregating and ranking contextually relevant evidence frames and feeding them to downstream inferential modules.
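The bounded-hop extraction of $G_c$ can be illustrated with a breadth-first traversal from the entities linked in the claim. The sketch below assumes an undirected adjacency-map representation and that entity linking has already produced the seed entities; `extract_subgraph` and its parameters are hypothetical names, not an API from the cited work.

```python
from collections import deque


def extract_subgraph(adjacency, seeds, max_hops):
    """Return (nodes, edges) within max_hops of any seed entity.

    adjacency: dict mapping each entity to the set of its neighbours
               (assumed symmetric, i.e. an undirected graph).
    seeds:     entities linked from the claim; unknown seeds are ignored.
    """
    dist = {s: 0 for s in seeds if s in adjacency}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # hop budget exhausted; do not expand further
        for neighbour in adjacency.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    nodes = set(dist)
    # Keep only edges whose endpoints both fall inside the extracted node set.
    edges = {
        frozenset((u, v))
        for u in nodes
        for v in adjacency.get(u, ())
        if v in nodes
    }
    return nodes, edges


# Toy chain A - B - C - D; a 2-hop neighbourhood of A excludes D.
toy_graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
nodes, edges = extract_subgraph(toy_graph, ["A"], max_hops=2)
```

The hop bound is what keeps the working memory claim-specific: raising `max_hops` trades a larger candidate evidence set against more noise for downstream ranking.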

3. Scoring, Attribution, and Evidence Metrics

Memory banks incorporate explicit mechanisms for quantifying the strength and sufficiency of evidence.

ClaimVer assigns:
- $cs(y_i)$: a categorical claim score determined by the label the LLM predicts.
- $\mathrm{TMS}_i$: a match score combining semantic similarity (cosine similarity of embeddings) and entity overlap between claim and evidence.
- Global KAS: $S = \sum_i \mathrm{TMS}_i\cdot cs(y_i)$, mapped through an asymmetric sigmoid to penalize negative aggregate evidence.
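The ClaimVer-style scores listed above can be sketched numerically as follows. Several details are assumptions made only for illustration, since the text does not specify them: the label-to-score mapping in `CLAIM_SCORE`, the equal weighting of cosine similarity and Jaccard entity overlap inside `tms`, and the particular asymmetric sigmoid in `kas` (a logistic curve with a steeper slope for negative $S$). None of these should be read as the published definitions.

```python
import math

# Assumed label scores; the text only says cs depends on the predicted label.
CLAIM_SCORE = {"Attributable": 1.0, "Extrapolatory": 0.0, "Contradictory": -1.0}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Entity overlap measured as Jaccard similarity of two entity sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tms(claim_vec, evidence_vec, claim_entities, evidence_entities, w_sem=0.5):
    """Match score: assumed convex combination of semantic and entity terms."""
    return (w_sem * cosine(claim_vec, evidence_vec)
            + (1.0 - w_sem) * jaccard(claim_entities, evidence_entities))


def kas(scored_claims, k_pos=1.0, k_neg=2.0):
    """Return (S, KAS) for an iterable of (TMS_i, label_i) pairs.

    S = sum_i TMS_i * cs(y_i). The sigmoid uses slope k_neg when S < 0 so that
    negative aggregate evidence is penalized more sharply (assumed form).
    """
    s = sum(t * CLAIM_SCORE[label] for t, label in scored_claims)
    k = k_pos if s >= 0 else k_neg
    return s, 1.0 / (1.0 + math.exp(-k * s))


s_total, score = kas([(0.8, "Attributable"), (0.5, "Contradictory")])
```

With these assumed settings, a single contradicted claim pulls the score further below 0.5 than an equally matched attributable claim lifts it above, which is the qualitative behavior the asymmetric mapping is meant to produce.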

ADORE computes per-section and global coverage:

$\mathrm{coverage}(s) = \frac{|\{c: \mathrm{sect}(c) = s,\, \exists\, e\ (e,c)\in E_G\}|}{|\{c: \mathrm{sect}(c) = s\}|}$

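The memory bank enforces a stopping rule based on global coverage, defined as the worst per-section value, $\mathrm{Coverage}(G) = \min_{s\in S} \mathrm{coverage}(s)$: iteration stops once $\mathrm{Coverage}(G) \ge \tau$ for a threshold $\tau$.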
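The coverage formula and the stopping rule translate directly into code. The sketch below assumes claims are given as a mapping from claim id to section id and edges as (evidence id, claim id) pairs; the function names are hypothetical, and treating "meets a threshold" as $\ge \tau$ is an interpretation. Sections with no claims have an undefined ratio under the formula and simply do not appear in the result.

```python
def section_coverage(claim_sections, edges):
    """Fraction of claims per section that have at least one evidence edge.

    claim_sections: dict mapping claim id -> section id.
    edges:          iterable of (evidence id, claim id) pairs.
    """
    supported = {claim for _, claim in edges}
    totals, hits = {}, {}
    for claim, section in claim_sections.items():
        totals[section] = totals.get(section, 0) + 1
        hits[section] = hits.get(section, 0) + (1 if claim in supported else 0)
    return {section: hits[section] / totals[section] for section in totals}


def should_stop(claim_sections, edges, tau):
    """Stop once the worst-covered section reaches the threshold tau."""
    coverage = section_coverage(claim_sections, edges)
    return bool(coverage) and min(coverage.values()) >= tau


# Toy report: two claims in "intro" (one supported), one in "risks" (supported).
claim_sections = {"c1": "intro", "c2": "intro", "c3": "risks"}
edges = [("e1", "c1"), ("e2", "c3")]
coverage = section_coverage(claim_sections, edges)
```

Taking the minimum over sections means a single poorly evidenced section blocks termination, even when the average coverage is high.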

In all settings, edge and node weights operationalize explainability; claims can be precisely traced to specific evidence with explicit quantitative linkage for downstream audit or report generation.

4. Persistent Memory Bank, Indexing, and Traceability

Unlike ephemeral retrieval pipelines, Memory Banks are by design auditable, persistent, and queryable.

Each evidence item is indexed both by content and by its links to claims (and, if present, to higher-level contexts or sections). For example, in Hindsight (Latimer et al., 14 Dec 2025), every belief node tracks its support set $\mathrm{Supp}(b)$, along with edge weights and timestamps, so that reflection and belief revision steps are traceable.
All LLM queries, rationales, evidence documents, and entity resolutions are logged (VeGraph), forming an enduring explanation artifact.
Memory bank graphs are stored in high-throughput or sharded key–value stores and dynamically updated as new documents are ingested or new claims are posed (Mongiovì et al., 2021).
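The dual indexing and logging described above can be sketched with plain dictionaries standing in for a key-value store. Everything named here is hypothetical (`EvidenceIndex`, `put`, `support`, `dump_log`); using a SHA-256 content hash as the evidence key and JSON lines as the log format are assumptions chosen for illustration, and a production system would replace the dictionaries with a persistent or sharded store.

```python
import hashlib
import json
import time


class EvidenceIndex:
    """In-memory stand-in for a key-value backed memory bank index."""

    def __init__(self):
        self.by_content = {}  # content hash -> evidence text
        self.by_claim = {}    # claim id -> set of content hashes
        self.log = []         # append-only audit records

    def put(self, claim_id, evidence_text, weight=1.0, timestamp=None):
        """Store evidence once by content and link it to the given claim."""
        key = hashlib.sha256(evidence_text.encode("utf-8")).hexdigest()
        self.by_content[key] = evidence_text
        self.by_claim.setdefault(claim_id, set()).add(key)
        self.log.append({
            "claim": claim_id,
            "evidence": key,
            "weight": weight,
            "ts": time.time() if timestamp is None else timestamp,
        })
        return key

    def support(self, claim_id):
        """Return the evidence texts linked to a claim, sorted for stability."""
        return sorted(self.by_content[k] for k in self.by_claim.get(claim_id, ()))

    def dump_log(self):
        """Serialize the audit trail as JSON lines."""
        return "\n".join(json.dumps(record, sort_keys=True) for record in self.log)


index = EvidenceIndex()
key_a = index.put("c1", "Paris is the capital of France.", weight=0.9, timestamp=0)
key_b = index.put("c2", "Paris is the capital of France.", weight=0.4, timestamp=1)
```

Keying evidence by content means the same passage linked to two claims is stored once, while the log still records each link with its own weight and timestamp.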

5. Applications in Fact Verification, Agent Memory, and RAG

Memory Banks instantiated as Claim–Evidence Graphs address critical requirements in multiple domains:

Fact Checking and Claim Verification: ClaimVer, VeGraph, and entity-graph approaches make evidence structure explicit, support fine-grained attributions, and expose model reasoning and evidence coverage gaps directly to end users (Dammu et al., 2024, Pham et al., 29 May 2025, Mongiovì et al., 2021).
Enterprise and Multi-Agent Report Generation: ADORE orchestrates specialized agents around a locked memory bank, enforcing traceable and auditable synthesis with systematic evidence completeness checking, especially under long-context and document overflow regimes (You et al., 26 Jan 2026).
Agent Long-Term Memory: Hindsight demonstrates that structured memory banks provide epistemic clarity, dynamic belief revision, and drastically improved long-horizon accuracy compared to flat vector/RAG stores (Latimer et al., 14 Dec 2025).

Memory banks are thus essential both as transient working graphs in verification loops and as persistent stores underpinning explainable, extensible, and scalable AI behaviors.

6. Limitations and Extensions

Current memory bank paradigms exhibit limitations:

Semantic Role Ignorance: Pure co-occurrence graphs may dilute relational semantics, motivating typed/multi-relation extensions (Mongiovì et al., 2021).
Entity Disambiguation: Effective reasoning requires robust, explainable entity resolution, as demonstrated by systematic, LLM-in-the-loop disambiguation protocols (VeGraph).
Scalability: Memory banks built over massive corpora necessitate distributed graph stores or aggressive sparsification.
Attribution Fuzziness: Many frameworks treat claim granularity and evidence granularity as independent of each other; more sophisticated segment alignment or edge-type distinctions (support, contradiction, extrapolation) offer opportunities for increased precision.

These limitations suggest that ongoing research is focused on richer relation typing, improved scaling, cross-granular evidence-claim alignment, and high-precision, fully auditable reasoning.