LiCoMemory: Hierarchical Memory for LLMs
- LiCoMemory is an end-to-end memory framework that uses a hierarchical CogniGraph to enhance LLM reasoning and maintain multi-session consistency.
- It employs a real-time agentic memory controller to efficiently update and retrieve contextual data for improved dialogue and reasoning benchmarks.
- Empirical results show LiCoMemory outperforms traditional memory systems with higher accuracy and reduced latency across multiple evaluation benchmarks.
LiCoMemory is an end-to-end agentic memory framework designed to address the persistent memory limitations in LLM agents. By interposing a lightweight, hierarchical external memory (CogniGraph) and real-time controller between users and LLMs, LiCoMemory enables efficient long-term reasoning, multi-session consistency, and improved retrieval accuracy. This architecture outperforms existing graph-based and flat memory systems on key dialogue and reasoning benchmarks while reducing retrieval and update latencies.
1. System Architecture
LiCoMemory is structured as a modular agentic memory system for LLM-based agents, comprising two primary components:
- CogniGraph: A lightweight hierarchical graph index serving as external agentic memory.
- Agentic Memory Controller: Orchestrates the flow between user queries, CogniGraph retrievals/updates, prompt assembly, and LLM invocation.
On every user-agent interaction, the controller extracts entities and temporal cues from the user input, requests contextual retrievals from CogniGraph (session summaries, triples, and chunks), assembles the context-rich prompt for the LLM, and subsequently updates CogniGraph with the dialogue chunk, maintaining both summary and semantic layers. Updates and retrievals operate on a unified in-memory index, allowing for the immediate integration of new information.
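The per-turn control flow described above can be sketched as follows. This is a minimal illustration, not LiCoMemory's actual API: the stub class, helper names, and the crude whitespace "entity extraction" are all placeholders.

```python
# Minimal sketch of the agentic memory controller loop; all names
# (CogniGraphStub, handle_turn) are illustrative, not LiCoMemory's API.

class CogniGraphStub:
    """Stand-in for the hierarchical index: stores dialogue chunks."""
    def __init__(self):
        self.chunks = []  # (session_id, timestamp, text)

    def retrieve(self, entities, now):
        # Return every stored chunk mentioning a query entity.
        return [text for (_, _, text) in self.chunks
                if any(e in text for e in entities)]

    def update(self, chunk_text, session_id, timestamp):
        # Unified in-memory index: new info is queryable immediately.
        self.chunks.append((session_id, timestamp, chunk_text))

def handle_turn(graph, llm, user_input, session_id, now):
    entities = user_input.split()               # 1. entity extraction (crude)
    context = graph.retrieve(entities, now)     # 2. contextual retrieval
    prompt = "\n".join(context + [user_input])  # 3. prompt assembly + LLM call
    reply = llm(prompt)
    graph.update(user_input + " " + reply, session_id, now)  # 4. write-back
    return reply
```

Because retrieval and update share one in-memory store, a fact written in one turn is retrievable in the very next turn, which is the property the controller relies on.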
2. Hierarchical CogniGraph Representation
CogniGraph models external memory as a three-layer, directed graph G = (V_session ∪ V_triple ∪ V_chunk, E), where:
- Session Nodes s_j: Store a summary, distilled keywords K_j, and a timestamp τ_j.
- Triple Nodes t_i: Represent (head entity, relation, tail entity) triples (h_i, r_i, e_i), each hyperlinked to session nodes and chunk nodes.
- Chunk Nodes c_k: Store raw dialogue text and a timestamp τ_k.
Edges connect session nodes to triple nodes (extracted for a session) and triple nodes to chunk nodes (extracted from dialogue chunks). No intra-layer edges are present, minimizing index redundancy and promoting efficient retrieval.
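The three node types and their cross-layer edges might be represented as follows; this is a sketch, and the field names are assumptions rather than the paper's schema.

```python
from dataclasses import dataclass

@dataclass
class SessionNode:        # summary layer
    session_id: str
    summary: str
    keywords: set
    timestamp: float

@dataclass
class TripleNode:         # semantic layer: (head, relation, tail)
    head: str
    relation: str
    tail: str
    timestamp: float

@dataclass
class ChunkNode:          # raw-text layer
    text: str
    timestamp: float

# Cross-layer edges only: session -> triple and triple -> chunk.
# No intra-layer edges, which keeps the index sparse.
edges: list = []          # list of (parent_node, child_node) pairs
```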
Formal Construction
Let V = V_session ∪ V_triple ∪ V_chunk, with session nodes s_j = (summary_j, K_j, τ_j), triple nodes t_i = ((h_i, r_i, e_i), τ_i), and chunk nodes c_k = (text_k, τ_k).
Triple nodes carry timestamps and maintain cross-layer links: E ⊆ (V_session × V_triple) ∪ (V_triple × V_chunk).
Update Pseudocode
```python
# Initialize CogniGraph G = (V_session, V_triple, V_chunk, E)
V_session, V_triple, V_chunk, E = {}, set(), set(), set()

def process_chunk(chunk_text, session_id, timestamp):
    # 1. Session-level summary: update in place or create a new session node
    if session_id in V_session:
        s = V_session[session_id]
        s.summary = update_summary(s.summary, chunk_text)
        s.keys = update_keywords(s.keys, chunk_text)
        s.timestamp = timestamp
    else:
        s = SessionNode(id=session_id, summary=make_summary(chunk_text),
                        keys=extract_keywords(chunk_text), timestamp=timestamp)
        V_session[session_id] = s

    # 2. Triple extraction and deduplication
    c_node = ChunkNode(chunk_text, timestamp)   # one chunk node per chunk
    V_chunk.add(c_node)
    for tri in extract_triples(chunk_text):
        if not exists_similar_triple(tri):
            t_node = TripleNode(tri, timestamp)
            V_triple.add(t_node)
            E.add((s, t_node))          # session -> triple
            E.add((t_node, c_node))     # triple -> chunk
        else:
            t_exist = find_similar_triple(tri)
            E.add((s, t_exist))         # link session to the existing triple
            E.add((t_exist, c_node))    # ground the duplicate fact in the new chunk
```
Triple deduplication leverages type-aware and semantic similarity matching, ensuring memory compactness and reducing redundant fact storage.
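One way to realize such type-aware semantic deduplication is cosine similarity over triple embeddings, restricting candidates to triples with a matching relation type. This is a sketch under assumptions: the index layout, the precomputed embedding vectors, and the 0.9 threshold are illustrative, not the paper's values.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def find_similar_triple(new_triple, new_vec, index, threshold=0.9):
    """Return an existing near-duplicate triple, or None.

    index maps relation type -> list of (triple, embedding); searching
    only within the same relation type is the 'type-aware' part.
    """
    for triple, vec in index.get(new_triple[1], []):
        if cosine(new_vec, vec) >= threshold:
            return triple
    return None
```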
3. Real-Time Update and Hierarchical Retrieval Mechanisms
Update Complexity
For each incoming chunk:
- Session node update: O(1) lookup/insert (hash table)
- Triple extraction/deduplication: O(m) operations (m: triples per chunk)
- Similarity checks: sublinear via approximate nearest-neighbor indexing over existing triples
Empirically, single-chunk update latency is on the order of seconds on A100 GPUs, with per-session token consumption on the order of thousands of tokens.
Retrieval Workflow
LiCoMemory's retrieval pipeline is hierarchy- and temporal-aware:
- Entity Extraction: Extract the query entity set Q from the user query.
- Session-Level Ranking: Compute semantic similarity S_s(j) = sim(K_j, Q) for each session; select the top-k sessions.
- Triple-Level Scoring: For each candidate session s_j and each neighboring triple t_i, compute S_t(i) = sim(t_i, Q).
- Integrated Relevance Scoring: Combine both levels with a harmonic mean, S_sem = 2 · S_s · S_t / (S_s + S_t); apply temporal weighting w = exp(−(Δτ / τ̂)^k), where Δτ is the triple's age relative to the query and τ̂ is the median age among candidates; the final score is R = S_sem · w.
- Reranking and Prompt Construction: Sort triples by R, fetch linked session summaries and chunks, and select the top-N units for the LLM prompt.
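The harmonic-mean and temporal-decay combination can be checked with a small numeric sketch; the similarity values, ages, and decay exponent k below are illustrative, not taken from the paper.

```python
import math

def relevance(s_session, s_triple, delta_tau, tau_hat, k=2.0):
    """Harmonic-mean semantic score, damped by temporal decay."""
    s_sem = 2 * s_session * s_triple / (s_session + s_triple)
    w = math.exp(-((delta_tau / tau_hat) ** k))
    return s_sem * w

# A recent triple (age well below the median) keeps most of its
# semantic score; an old one (twice the median age) is heavily damped.
recent = relevance(0.8, 0.6, delta_tau=0.1, tau_hat=1.0)
old = relevance(0.8, 0.6, delta_tau=2.0, tau_hat=1.0)
```

The harmonic mean penalizes triples that score well at only one level: a triple in a highly relevant session still ranks low if the triple itself barely matches the query, and vice versa.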
Reranking Algorithm Pseudocode
```python
def retrieve_and_rerank(query, now, k=2.0):
    Q_ents = extract_entities(query)

    # Session-level ranking: similarity of session keywords to query entities
    S_s = {s: semantic_sim(s.keys, Q_ents) for s in V_session.values()}
    top_sessions = topK_sessions(S_s)

    # Triple-level scoring within the selected sessions
    candidates = []
    for s in top_sessions:
        for t in neighbors(s):
            S_t = semantic_sim(triple_repr(t), Q_ents)
            delta_tau = abs(now - t.timestamp)
            candidates.append((s, t, S_s[s], S_t, delta_tau))

    # Temporal normalization: median age among candidates
    tau_hat = median(dt for _, _, _, _, dt in candidates)

    # Harmonic-mean fusion with temporal decay
    scored = []
    for s, t, S_s_, S_t_, delta_tau in candidates:
        S_sem = 2 * S_s_ * S_t_ / (S_s_ + S_t_)
        w = exp(-((delta_tau / tau_hat) ** k))
        scored.append((S_sem * w, t, s))

    best = topN_by_score(scored)
    return assemble_prompt(best)
```
4. Empirical Evaluation
LiCoMemory was quantitatively assessed on the LongMemEval and LoCoMo benchmarks, testing multi-session, single-hop, multi-hop, temporal, open-domain, and adversarial reasoning.
Performance Comparison (T_R: average retrieval latency)
| Method | LongMemEval Acc. | Rec.@15 | T_R | LoCoMo Acc. | Rec.@15 | T_R |
|---|---|---|---|---|---|---|
| LoCoMo (RAG) | 17.6 % | 22.0 % | 4.9s | 23.6 % | 25.5 % | 4.9s |
| Mem0 | 56.8 % | 61.2 % | 2.7s | 53.2 % | 57.1 % | 3.3s |
| A-MEM | 57.4 % | 62.2 % | 3.0s | 43.8 % | 49.2 % | 3.1s |
| Zep | 60.2 % | 62.7 % | 2.8s | 40.3 % | 51.1 % | 3.5s |
| LiCoMemory | 69.2 % | 72.4 % | 2.6s | 63.0 % | 64.5 % | 2.4s |
LiCoMemory achieves an absolute accuracy improvement of +9.0 pp on LongMemEval over the second-best baseline (Zep) and exhibits the lowest observed query latency on both benchmarks.
Ablation Insights
- Removing hierarchical retrieval leads to a –22 pp accuracy reduction.
- Omitting temporal weighting yields –22 pp on temporal QA.
- Excluding session-level summaries causes –12 pp overall accuracy drop.
Real-Time Simulation
Chunk-wise insertion on LongMemEval demonstrates:
- Update latency: on the order of seconds per session
- Retrieval latency: on the order of seconds per query
- Token usage: on the order of thousands of tokens per session and per query
- QA accuracy is maintained at 67.4% (only 1.8 pp less relative to static full-history context).
A plausible implication is that LiCoMemory maintains robust performance even under incremental, streaming update scenarios.
5. Applications and Limitations
LiCoMemory functions as a cognitive scaffold for LLM agents, supporting:
- Multi-session consistency via session-level summaries.
- Temporal query handling through decay weighting.
- Grounding specific facts with hyperlinks to raw source chunks.
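Grounding via triple-to-chunk hyperlinks amounts to a simple edge traversal; a minimal sketch, where the `Chunk` class and edge-list layout are assumptions:

```python
class Chunk:
    def __init__(self, text):
        self.text = text

def ground(triple_id, edges):
    # Follow triple -> chunk hyperlinks to recover raw evidence text.
    return [child.text for parent, child in edges if parent == triple_id]

# Hypothetical edge list: two chunks support triple "t1", one supports "t2".
edges = [("t1", Chunk("Alice joined Acme in May")),
         ("t1", Chunk("Alice still works at Acme")),
         ("t2", Chunk("Bob moved to Paris"))]
```

This lets the agent quote the original dialogue supporting a retrieved fact rather than relying on the extracted triple alone.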
In practical deployment, LiCoMemory enables agents to adaptively retrieve and update persistent knowledge, yielding coherent long-term dialogues and improving agentic reasoning capacity.
Limitations include linear memory growth with session volume and the single-agent design of CogniGraph. Future work may address multi-agent knowledge-graph sharing, adaptive compression of historical data, and learned summary abstraction. Extraction and summarization fidelity remain bottlenecks for maximal retrieval precision.
6. Significance in Agentic Long-Term Reasoning
LiCoMemory demonstrates that a semantically indexed, lightweight hierarchical graph—in combination with temporal and hierarchy-aware retrieval mechanisms—substantially surpasses previous external memory architectures in both efficiency and reasoning accuracy. These design principles suggest a path forward for scalable, persistent, agentic memory solutions compatible with real-time LLM deployments.