LiCoMemory: Hierarchical Memory for LLMs
- LiCoMemory is an end-to-end memory framework that uses a hierarchical CogniGraph to enhance LLM reasoning and maintain multi-session consistency.
- It employs a real-time agentic memory controller to efficiently update and retrieve contextual data for improved dialogue and reasoning benchmarks.
- Empirical results show LiCoMemory outperforms traditional memory systems with higher accuracy and reduced latency across multiple evaluation benchmarks.
LiCoMemory is an end-to-end agentic memory framework designed to address the persistent memory limitations in LLM agents. By interposing a lightweight, hierarchical external memory (CogniGraph) and real-time controller between users and LLMs, LiCoMemory enables efficient long-term reasoning, multi-session consistency, and improved retrieval accuracy. This architecture outperforms existing graph-based and flat memory systems on key dialogue and reasoning benchmarks while reducing retrieval and update latencies.
1. System Architecture
LiCoMemory is structured as a modular agentic memory system for LLM-based agents, comprising two primary components:
- CogniGraph: A lightweight hierarchical graph index serving as external agentic memory.
- Agentic Memory Controller: Orchestrates the flow between user queries, CogniGraph retrievals/updates, prompt assembly, and LLM invocation.
On every user-agent interaction, the controller extracts entities and temporal cues from the user input, requests contextual retrievals from CogniGraph (session summaries, triples, and chunks), assembles the context-rich prompt for the LLM, and subsequently updates CogniGraph with the dialogue chunk, maintaining both summary and semantic layers. Updates and retrievals operate on a unified in-memory index, allowing for the immediate integration of new information.
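The per-turn control flow described above can be sketched as follows. This is a minimal illustration, not LiCoMemory's actual API: the stub class, helper names, and the crude whitespace "entity extraction" are all placeholders.

```python
# Minimal sketch of the agentic memory controller loop; all names
# (CogniGraphStub, handle_turn) are illustrative, not LiCoMemory's API.

class CogniGraphStub:
    """Stand-in for the hierarchical index: stores dialogue chunks."""
    def __init__(self):
        self.chunks = []  # (session_id, timestamp, text)

    def retrieve(self, entities, now):
        # Return every stored chunk mentioning a query entity.
        return [text for (_, _, text) in self.chunks
                if any(e in text for e in entities)]

    def update(self, chunk_text, session_id, timestamp):
        # Unified in-memory index: new info is queryable immediately.
        self.chunks.append((session_id, timestamp, chunk_text))

def handle_turn(graph, llm, user_input, session_id, now):
    entities = user_input.split()               # 1. entity extraction (crude)
    context = graph.retrieve(entities, now)     # 2. contextual retrieval
    prompt = "\n".join(context + [user_input])  # 3. prompt assembly + LLM call
    reply = llm(prompt)
    graph.update(user_input + " " + reply, session_id, now)  # 4. write-back
    return reply
```

Because retrieval and update share one in-memory store, a fact written in one turn is retrievable in the very next turn, which is the property the controller relies on.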
2. Hierarchical CogniGraph Representation
CogniGraph models external memory as a three-layer, directed graph G = (V_session ∪ V_triple ∪ V_chunk, E), where:
- Session Nodes s_j: Store a summary, distilled keywords K_j, and a timestamp τ_j.
- Triple Nodes t_i: Represent (head entity, relation, tail entity) triples (h_i, r_i, e_i), each hyperlinked to session nodes and chunk nodes.
- Chunk Nodes c_k: Store raw dialogue text and a timestamp τ_k.
Edges connect session nodes to triple nodes (extracted for a session) and triple nodes to chunk nodes (extracted from dialogue chunks). No intra-layer edges are present, minimizing index redundancy and promoting efficient retrieval.
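The three node types and their cross-layer edges might be represented as follows; this is a sketch, and the field names are assumptions rather than the paper's schema.

```python
from dataclasses import dataclass

@dataclass
class SessionNode:        # summary layer
    session_id: str
    summary: str
    keywords: set
    timestamp: float

@dataclass
class TripleNode:         # semantic layer: (head, relation, tail)
    head: str
    relation: str
    tail: str
    timestamp: float

@dataclass
class ChunkNode:          # raw-text layer
    text: str
    timestamp: float

# Cross-layer edges only: session -> triple and triple -> chunk.
# No intra-layer edges, which keeps the index sparse.
edges: list = []          # list of (parent_node, child_node) pairs
```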
Formal Construction
Let V = V_session ∪ V_triple ∪ V_chunk, with session nodes s_j = (summary_j, K_j, τ_j), triple nodes t_i = ((h_i, r_i, e_i), τ_i), and chunk nodes c_k = (text_k, τ_k).
Triple nodes carry timestamps and maintain cross-layer links: E ⊆ (V_session × V_triple) ∪ (V_triple × V_chunk).
Update Pseudocode
```python
# Initialize CogniGraph G = (V_session, V_triple, V_chunk, E)
V_session, V_triple, V_chunk, E = {}, set(), set(), set()

def process_chunk(chunk_text, session_id, timestamp):
    # 1. Session-level summary: update in place or create a new session node
    if session_id in V_session:
        s = V_session[session_id]
        s.summary = update_summary(s.summary, chunk_text)
        s.keys = update_keywords(s.keys, chunk_text)
        s.timestamp = timestamp
    else:
        s = SessionNode(id=session_id, summary=make_summary(chunk_text),
                        keys=extract_keywords(chunk_text), timestamp=timestamp)
        V_session[session_id] = s

    # 2. Triple extraction and deduplication
    c_node = ChunkNode(chunk_text, timestamp)   # one chunk node per chunk
    V_chunk.add(c_node)
    for tri in extract_triples(chunk_text):
        if not exists_similar_triple(tri):
            t_node = TripleNode(tri, timestamp)
            V_triple.add(t_node)
            E.add((s, t_node))          # session -> triple
            E.add((t_node, c_node))     # triple -> chunk
        else:
            t_exist = find_similar_triple(tri)
            E.add((s, t_exist))         # link session to the existing triple
            E.add((t_exist, c_node))    # ground the duplicate fact in the new chunk
```
Triple deduplication leverages type-aware and semantic similarity matching, ensuring memory compactness and reducing redundant fact storage.
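One way to realize such type-aware semantic deduplication is cosine similarity over triple embeddings, restricting candidates to triples with a matching relation type. This is a sketch under assumptions: the index layout, the precomputed embedding vectors, and the 0.9 threshold are illustrative, not the paper's values.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def find_similar_triple(new_triple, new_vec, index, threshold=0.9):
    """Return an existing near-duplicate triple, or None.

    index maps relation type -> list of (triple, embedding); searching
    only within the same relation type is the 'type-aware' part.
    """
    for triple, vec in index.get(new_triple[1], []):
        if cosine(new_vec, vec) >= threshold:
            return triple
    return None
```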
3. Real-Time Update and Hierarchical Retrieval Mechanisms
Update Complexity
For each incoming chunk:
- Session node update: O(1) lookup/insert (hash table)
- Triple extraction/deduplication: O(m) operations (m: triples per chunk)
- Similarity checks: sublinear via approximate nearest-neighbor indexing over existing triples
Empirically, single-chunk update latency is on the order of seconds on A100 GPUs, with per-session token consumption on the order of thousands of tokens.
Retrieval Workflow
LiCoMemory's retrieval pipeline is hierarchy- and temporal-aware:
- Entity Extraction: Extract the query entity set Q from the user query.
- Session-Level Ranking: Compute semantic similarity S_s(j) = sim(K_j, Q) for each session; select the top-k sessions.
- Triple-Level Scoring: For each candidate session s_j and each neighboring triple t_i, compute S_t(i) = sim(t_i, Q).
- Integrated Relevance Scoring: Combine both levels with a harmonic mean, S_sem = 2 · S_s · S_t / (S_s + S_t); apply temporal weighting w = exp(−(Δτ / τ̂)^k), where Δτ is the triple's age relative to the query and τ̂ is the median age among candidates; the final score is R = S_sem · w.
- Reranking and Prompt Construction: Sort triples by R, fetch linked session summaries and chunks, and select the top-N units for the LLM prompt.
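The harmonic-mean and temporal-decay combination can be checked with a small numeric sketch; the similarity values, ages, and decay exponent k below are illustrative, not taken from the paper.

```python
import math

def relevance(s_session, s_triple, delta_tau, tau_hat, k=2.0):
    """Harmonic-mean semantic score, damped by temporal decay."""
    s_sem = 2 * s_session * s_triple / (s_session + s_triple)
    w = math.exp(-((delta_tau / tau_hat) ** k))
    return s_sem * w

# A recent triple (age well below the median) keeps most of its
# semantic score; an old one (twice the median age) is heavily damped.
recent = relevance(0.8, 0.6, delta_tau=0.1, tau_hat=1.0)
old = relevance(0.8, 0.6, delta_tau=2.0, tau_hat=1.0)
```

The harmonic mean penalizes triples that score well at only one level: a triple in a highly relevant session still ranks low if the triple itself barely matches the query, and vice versa.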
Reranking Algorithm Pseudocode
```python
def retrieve_and_rerank(query, now, k=2.0):
    Q_ents = extract_entities(query)

    # Session-level ranking: similarity of session keywords to query entities
    S_s = {s: semantic_sim(s.keys, Q_ents) for s in V_session.values()}
    top_sessions = topK_sessions(S_s)

    # Triple-level scoring within the selected sessions
    candidates = []
    for s in top_sessions:
        for t in neighbors(s):
            S_t = semantic_sim(triple_repr(t), Q_ents)
            delta_tau = abs(now - t.timestamp)
            candidates.append((s, t, S_s[s], S_t, delta_tau))

    # Temporal normalization: median age among candidates
    tau_hat = median(dt for _, _, _, _, dt in candidates)

    # Harmonic-mean fusion with temporal decay
    scored = []
    for s, t, S_s_, S_t_, delta_tau in candidates:
        S_sem = 2 * S_s_ * S_t_ / (S_s_ + S_t_)
        w = exp(-((delta_tau / tau_hat) ** k))
        scored.append((S_sem * w, t, s))

    best = topN_by_score(scored)
    return assemble_prompt(best)
```
4. Empirical Evaluation
LiCoMemory was quantitatively assessed on the LongMemEval and LoCoMo benchmarks, testing multi-session, single-hop, multi-hop, temporal, open-domain, and adversarial reasoning.
Performance Comparison (T_R: average retrieval latency)
| Method | LongMemEval Acc. | Rec.@15 | T_R | LoCoMo Acc. | Rec.@15 | T_R |
|---|---|---|---|---|---|---|
| LoCoMo (RAG) | 17.6 % | 22.0 % | 4.9s | 23.6 % | 25.5 % | 4.9s |
| Mem0 | 56.8 % | 61.2 % | 2.7s | 53.2 % | 57.1 % | 3.3s |
| A-MEM | 57.4 % | 62.2 % | 3.0s | 43.8 % | 49.2 % | 3.1s |
| Zep | 60.2 % | 62.7 % | 2.8s | 40.3 % | 51.1 % | 3.5s |
| LiCoMemory | 69.2 % | 72.4 % | 2.6s | 63.0 % | 64.5 % | 2.4s |
LiCoMemory achieves an absolute accuracy improvement of +9.0 pp on LongMemEval over the second-best baseline (Zep) and exhibits the lowest observed query latency on both benchmarks.
Ablation Insights
- Removing hierarchical retrieval leads to a –22 pp accuracy reduction.
- Omitting temporal weighting yields –22 pp on temporal QA.
- Excluding session-level summaries causes –12 pp overall accuracy drop.
Real-Time Simulation
Chunk-wise insertion on LongMemEval demonstrates:
- Update latency: on the order of seconds per session
- Retrieval latency: on the order of seconds per query
- Token usage: on the order of thousands of tokens per session and per query
- QA accuracy is maintained at 67.4% (only 1.8 pp less relative to static full-history context).
A plausible implication is that LiCoMemory maintains robust performance even under incremental, streaming update scenarios.
5. Applications and Limitations
LiCoMemory functions as a cognitive scaffold for LLM agents, supporting:
- Multi-session consistency via session-level summaries.
- Temporal query handling through decay weighting.
- Grounding specific facts with hyperlinks to raw source chunks.
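Grounding via triple-to-chunk hyperlinks amounts to a simple edge traversal; a minimal sketch, where the `Chunk` class and edge-list layout are assumptions:

```python
class Chunk:
    def __init__(self, text):
        self.text = text

def ground(triple_id, edges):
    # Follow triple -> chunk hyperlinks to recover raw evidence text.
    return [child.text for parent, child in edges if parent == triple_id]

# Hypothetical edge list: two chunks support triple "t1", one supports "t2".
edges = [("t1", Chunk("Alice joined Acme in May")),
         ("t1", Chunk("Alice still works at Acme")),
         ("t2", Chunk("Bob moved to Paris"))]
```

This lets the agent quote the original dialogue supporting a retrieved fact rather than relying on the extracted triple alone.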
In practical deployment, LiCoMemory enables agents to adaptively retrieve and update persistent knowledge, yielding coherent long-term dialogues and improving agentic reasoning capacity.
Limitations include linear memory growth with session volume and the single-agent design of CogniGraph. Future work may address multi-agent knowledge-graph sharing, adaptive compression of historical data, and learned summary abstraction. Extraction and summarization fidelity remain bottlenecks for maximal retrieval precision.
6. Significance in Agentic Long-Term Reasoning
LiCoMemory demonstrates that a semantically indexed, lightweight hierarchical graph—in combination with temporal and hierarchy-aware retrieval mechanisms—substantially surpasses previous external memory architectures in both efficiency and reasoning accuracy. These design principles suggest a path forward for scalable, persistent, agentic memory solutions compatible with real-time LLM deployments.