
LiCoMemory: Hierarchical Memory for LLMs

Updated 10 November 2025
  • LiCoMemory is an end-to-end memory framework that uses a hierarchical CogniGraph to enhance LLM reasoning and maintain multi-session consistency.
  • It employs a real-time agentic memory controller to efficiently update and retrieve contextual data for improved dialogue and reasoning benchmarks.
  • Empirical results show LiCoMemory outperforms traditional memory systems with higher accuracy and reduced latency across multiple evaluation benchmarks.

LiCoMemory is an end-to-end agentic memory framework designed to address the persistent memory limitations in LLM agents. By interposing a lightweight, hierarchical external memory (CogniGraph) and real-time controller between users and LLMs, LiCoMemory enables efficient long-term reasoning, multi-session consistency, and improved retrieval accuracy. This architecture outperforms existing graph-based and flat memory systems on key dialogue and reasoning benchmarks while reducing retrieval and update latencies.

1. System Architecture

LiCoMemory is structured as a modular agentic memory system for LLM-based agents, comprising two primary components:

  1. CogniGraph: A lightweight hierarchical graph index serving as external agentic memory.
  2. Agentic Memory Controller: Orchestrates the flow between user queries, CogniGraph retrievals/updates, prompt assembly, and LLM invocation.

On every user-agent interaction, the controller extracts entities and temporal cues from the user input, requests contextual retrievals from CogniGraph (session summaries, triples, and chunks), assembles the context-rich prompt for the LLM, and subsequently updates CogniGraph with the dialogue chunk, maintaining both summary and semantic layers. Updates and retrievals operate on a unified in-memory index, allowing for the immediate integration of new information.
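The per-interaction control flow above can be sketched as a single loop. This is a minimal sketch: the helper callables (`extract_entities`, `retrieve_context`, `call_llm`, `update_graph`) are hypothetical stand-ins passed in as parameters, not the paper's actual API.

```python
# Hypothetical sketch of one controller turn: retrieve context, assemble
# the prompt, invoke the LLM, then write the new dialogue chunk back.
def handle_turn(graph, user_input, session_id, timestamp,
                extract_entities, retrieve_context, call_llm, update_graph):
    """One user-agent interaction against the external memory."""
    entities = extract_entities(user_input)                 # entity/temporal cues
    context = retrieve_context(graph, entities, timestamp)  # summaries, triples, chunks
    prompt = f"Context:\n{context}\n\nUser: {user_input}"
    reply = call_llm(prompt)
    # Immediately integrate the new dialogue chunk into the memory index
    update_graph(graph, user_input + "\n" + reply, session_id, timestamp)
    return reply
```

Because updates and retrievals share one in-memory index, the write-back at the end of each turn is visible to the very next retrieval.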

2. Hierarchical CogniGraph Representation

CogniGraph models external memory as a three-layer directed graph G = (V, E), where:

  • Session Nodes V_session: store a summary s_j, distilled keywords K_j, and a timestamp τ_j.
  • Triple Nodes V_entity: represent (head entity, relation, tail entity) triples t_i = (e_h, r, e_t), each hyperlinked to session nodes and chunk nodes.
  • Chunk Nodes V_chunk: store raw dialogue text c_k and a timestamp τ_k.

Edges E connect session nodes to the triple nodes extracted for that session, and triple nodes to the dialogue chunks they were extracted from. No intra-layer edges are present, minimizing index redundancy and promoting efficient retrieval.
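Assuming plain Python dataclasses, the three node layers might be modeled as follows. The field layout is an illustrative assumption, not the paper's implementation; `eq=False` keeps nodes identity-hashable so they can sit inside edge sets.

```python
from dataclasses import dataclass

# Sketch of the three CogniGraph node types; cross-layer links live as
# (source, target) pairs in an edge set E rather than on the nodes.

@dataclass(eq=False)
class SessionNode:          # summary layer
    id: str
    summary: str            # s_j
    keys: list              # distilled keywords K_j
    timestamp: float        # tau_j

@dataclass(eq=False)
class TripleNode:           # semantic layer
    triple: tuple           # (head entity, relation, tail entity)
    timestamp: float        # tau_i

@dataclass(eq=False)
class ChunkNode:            # raw-dialogue layer
    text: str               # c_k
    timestamp: float        # tau_k
```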

Formal Construction

Let:

  • V_session = {s_j | j = 1 … N_sessions}
  • V_entity = {t_i | i = 1 … N_triples}
  • V_chunk = {c_k | k = 1 … N_chunks}

Triple nodes t_i carry timestamps τ_i and maintain cross-layer links:

  • link_session: V_entity → V_session
  • link_chunk: V_entity → 𝒫(V_chunk) (the power set of V_chunk, since one triple may link to several chunks)

Update Pseudocode

# CogniGraph state. Helper functions (make_summary, update_summary,
# extract_keywords, update_keywords, extract_triples, find_similar_triple)
# are LLM- or embedding-backed and left abstract here.
V_session, V_entity, V_chunk, E = {}, [], [], set()

def process_chunk(chunk_text, session_id, timestamp):
    # 1. Session-level summary: update the existing node or create one
    if session_id in V_session:
        s = V_session[session_id]
        s.summary = update_summary(s.summary, chunk_text)
        s.keys = update_keywords(s.keys, chunk_text)
        s.timestamp = timestamp
    else:
        s = SessionNode(id=session_id, summary=make_summary(chunk_text),
                        keys=extract_keywords(chunk_text), timestamp=timestamp)
        V_session[session_id] = s

    # 2. One chunk node per dialogue chunk (shared by all its triples)
    c_node = ChunkNode(chunk_text, timestamp)
    V_chunk.append(c_node)

    # 3. Triple extraction and deduplication
    for tri in extract_triples(chunk_text):
        t_node = find_similar_triple(tri)      # type-aware semantic match, or None
        if t_node is None:                     # genuinely new fact
            t_node = TripleNode(tri, timestamp)
            V_entity.append(t_node)
            E.add((s, t_node))                 # session -> triple edge
        E.add((t_node, c_node))                # triple -> chunk edge

Triple deduplication leverages type-aware and semantic similarity matching, ensuring memory compactness and reducing redundant fact storage.
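As a toy illustration of the semantic-similarity half of this matching, a new triple can be merged into an existing one when the cosine similarity of naive bag-of-words embeddings exceeds a threshold. The embedding choice and the 0.9 threshold are assumptions for illustration; the paper's type-aware matcher is not reproduced here.

```python
import math

def embed(triple):
    """Naive bag-of-words count vector over the triple's tokens."""
    words = " ".join(triple).lower().split()
    return {w: words.count(w) for w in set(words)}

def cosine(a, b):
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in set(a) | set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def find_similar_triple(new_triple, existing, threshold=0.9):
    """Return the first stored triple close enough to new_triple, else None."""
    e_new = embed(new_triple)
    for tri in existing:
        if cosine(e_new, embed(tri)) >= threshold:
            return tri
    return None
```

A duplicate fact then reuses the stored triple node (gaining only a new chunk link), which is what keeps the semantic layer compact.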

3. Real-Time Update and Hierarchical Retrieval Mechanisms

Update Complexity

For each incoming chunk:

  • Session node update: O(1) lookup/insert (hash table)
  • Triple extraction/deduplication: O(M) operations (M: triples per chunk)
  • Similarity checks: O(M · log N) (approximate nearest-neighbor indexing over the N stored triples)

Empirically, single-chunk update latency is approximately T_G ≈ 5.2 s on A100 GPUs, with per-session token consumption K_G ≈ 13.5k.

Retrieval Workflow

LiCoMemory's retrieval pipeline is hierarchy- and temporal-aware:

  1. Entity Extraction: extract Q_entities = {e^q_1, …, e^q_m} from the query.
  2. Session-Level Ranking: compute the semantic similarity S_s(j) = sim_emb(K_j, Q_entities) and select the top-K_s sessions.
  3. Triple-Level Scoring: for each candidate session s_j and each neighboring triple t_i, compute S_t(i) = sim_emb((e_h, r, e_t)_i, Q_entities).
  4. Integrated Relevance Scoring: combine the two via their harmonic mean

     S_sem(i) = 2 · S_s(j) · S_t(i) / (S_s(j) + S_t(i))

     apply temporal weighting

     w(Δτ_i) = exp[−(Δτ_i / τ̂)^k],  0 < k < 1

     and take the final score

     R(i) = S_sem(i) · w(Δτ_i)

  5. Reranking and Prompt Construction: sort triples by R(i), fetch linked session summaries and chunks, and select the top-N units for the LLM prompt.
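A small worked example of this scoring chain, with illustrative numbers that are not from the paper:

```python
import math

def relevance(S_s, S_t, delta_tau, tau_hat, k=0.5):
    """Integrated relevance R(i) = S_sem(i) * w(delta_tau_i)."""
    S_sem = 2 * S_s * S_t / (S_s + S_t)            # harmonic mean of the two similarities
    w = math.exp(-((delta_tau / tau_hat) ** k))    # stretched-exponential decay, 0 < k < 1
    return S_sem * w

# A recent triple (delta_tau << tau_hat) keeps most of its semantic score,
# while an old one (delta_tau >> tau_hat) is strongly damped:
recent = relevance(S_s=0.8, S_t=0.6, delta_tau=1.0,   tau_hat=100.0)
old    = relevance(S_s=0.8, S_t=0.6, delta_tau=400.0, tau_hat=100.0)
```

The harmonic mean penalizes a triple whose session context is weakly relevant even when the triple itself matches well, and the sublinear exponent k keeps old-but-relevant facts retrievable rather than zeroing them out.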

Reranking Algorithm Pseudocode

# Helpers (extract_entities, semantic_sim, triple_repr, neighbors, topK,
# median, now, assemble_prompt) are left abstract, as in the update pseudocode.
def retrieve_and_rerank(query, k=0.5, K_s=5, N=10):
    # 1. Entity extraction and session-level ranking
    Q_ents = extract_entities(query)
    S_s = {s: semantic_sim(s.keys, Q_ents) for s in V_session}  # session nodes
    top_sessions = topK(S_s, K_s)

    # 2. Triple-level scoring over the selected sessions' neighbors
    candidates = []
    for s in top_sessions:
        for t in neighbors(s):
            S_t = semantic_sim(triple_repr(t), Q_ents)
            delta_tau = abs(now() - t.timestamp)
            candidates.append((s, t, S_s[s], S_t, delta_tau))

    # 3. Integrated relevance: harmonic mean damped by temporal decay
    tau_hat = median([dt for _, _, _, _, dt in candidates])
    scored = []
    for s, t, S_s_, S_t_, delta_tau in candidates:
        S_sem = 2 * S_s_ * S_t_ / (S_s_ + S_t_)
        w = exp(-((delta_tau / tau_hat) ** k))
        scored.append((S_sem * w, t, s))

    # 4. Keep the top-N scored units and build the LLM prompt
    best = sorted(scored, key=lambda x: x[0], reverse=True)[:N]
    return assemble_prompt(best)

4. Empirical Evaluation

LiCoMemory was quantitatively assessed on the LongMemEval and LoCoMo benchmarks, testing multi-session, single-hop, multi-hop, temporal, open-domain, and adversarial reasoning.

Performance Comparison Table

| Method       | LongMemEval Acc. | Rec.@15 | T_R   | LoCoMo Acc. | Rec.@15 | T_R   |
|--------------|------------------|---------|-------|-------------|---------|-------|
| LoCoMo (RAG) | 17.6 %           | 22.0 %  | 4.9 s | 23.6 %      | 25.5 %  | 4.9 s |
| Mem0         | 56.8 %           | 61.2 %  | 2.7 s | 53.2 %      | 57.1 %  | 3.3 s |
| A-MEM        | 57.4 %           | 62.2 %  | 3.0 s | 43.8 %      | 49.2 %  | 3.1 s |
| Zep          | 60.2 %           | 62.7 %  | 2.8 s | 40.3 %      | 51.1 %  | 3.5 s |
| LiCoMemory   | 69.2 %           | 72.4 %  | 2.6 s | 63.0 %      | 64.5 %  | 2.4 s |

LiCoMemory achieves an absolute accuracy improvement of +9.0 pp on LongMemEval over the second-best baseline (Zep) and exhibits the lowest observed query latency on both benchmarks.

Ablation Insights

  • Removing hierarchical retrieval leads to a −22 pp accuracy reduction.
  • Omitting temporal weighting yields a −22 pp drop on temporal QA.
  • Excluding session-level summaries causes a −12 pp overall accuracy drop.

Real-Time Simulation

Chunk-wise insertion on LongMemEval demonstrates:

  • Update latency T_G ≈ 5.21 s/session
  • Retrieval latency T_R ≈ 2.63 s/query
  • Token usage: K_G ≈ 13.52k/session, K_R ≈ 1.22k/query
  • QA accuracy is maintained at 67.4 % (only 1.8 pp below the static full-history baseline).

A plausible implication is that LiCoMemory maintains robust performance even under incremental, streaming update scenarios.

5. Applications and Limitations

LiCoMemory functions as a cognitive scaffold for LLM agents, supporting:

  • Multi-session consistency via session-level summaries.
  • Temporal query handling through decay weighting.
  • Grounding specific facts with hyperlinks to raw source chunks.

In practical deployment, LiCoMemory enables agents to adaptively retrieve and update persistent knowledge, yielding coherent long-term dialogues and improving agentic reasoning capacity.

Limitations include linear memory growth with session volume and CogniGraph's single-agent design. Future work may address multi-agent knowledge-graph sharing, adaptive compression of historical data, and improved learned summary abstraction. Extraction and summarization fidelity remain bottlenecks for maximal retrieval precision.

6. Significance in Agentic Long-Term Reasoning

LiCoMemory demonstrates that a semantically indexed, lightweight hierarchical graph—in combination with temporal and hierarchy-aware retrieval mechanisms—substantially surpasses previous external memory architectures in both efficiency and reasoning accuracy. These design principles suggest a path forward for scalable, persistent, agentic memory solutions compatible with real-time LLM deployments.
