
LiCoMemory: Hierarchical Memory for LLMs

Updated 10 November 2025
  • LiCoMemory is an end-to-end memory framework that uses a hierarchical CogniGraph to enhance LLM reasoning and maintain multi-session consistency.
  • It employs a real-time agentic memory controller to efficiently update and retrieve contextual data for improved dialogue and reasoning benchmarks.
  • Empirical results show LiCoMemory outperforms traditional memory systems with higher accuracy and reduced latency across multiple evaluation benchmarks.

LiCoMemory is an end-to-end agentic memory framework designed to address the persistent memory limitations of LLM agents. By interposing a lightweight hierarchical external memory (CogniGraph) and a real-time controller between users and LLMs, LiCoMemory enables efficient long-term reasoning, multi-session consistency, and improved retrieval accuracy. This architecture outperforms existing graph-based and flat memory systems on key dialogue and reasoning benchmarks while reducing retrieval and update latencies.

1. System Architecture

LiCoMemory is structured as a modular agentic memory system for LLM-based agents, comprising two primary components:

  1. CogniGraph: A lightweight hierarchical graph index serving as external agentic memory.
  2. Agentic Memory Controller: Orchestrates the flow between user queries, CogniGraph retrievals/updates, prompt assembly, and LLM invocation.

On every user-agent interaction, the controller extracts entities and temporal cues from the user input, requests contextual retrievals from CogniGraph (session summaries, triples, and chunks), assembles the context-rich prompt for the LLM, and subsequently updates CogniGraph with the dialogue chunk, maintaining both summary and semantic layers. Updates and retrievals operate on a unified in-memory index, allowing for the immediate integration of new information.
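The controller's per-turn loop can be pictured as a thin orchestration layer around the graph. The following minimal Python sketch illustrates that loop; the injected component names (extract, retrieve, llm, update) are our own illustrative assumptions, not the paper's API.

from dataclasses import dataclass
from typing import Callable

# Minimal sketch of the agentic memory controller turn loop.
# All four callables are assumed injection points for the components
# described above; none of these names come from the paper.
@dataclass
class MemoryController:
    extract: Callable[[str], list[str]]        # entity + temporal-cue extraction
    retrieve: Callable[[list[str]], str]       # CogniGraph lookup -> context text
    llm: Callable[[str], str]                  # LLM invocation
    update: Callable[[str, str, float], None]  # CogniGraph write-back

    def handle_turn(self, user_input: str, session_id: str, ts: float) -> str:
        cues = self.extract(user_input)                          # 1. parse the query
        context = self.retrieve(cues)                            # 2. summaries, triples, chunks
        reply = self.llm(f"{context}\n\nUser: {user_input}")     # 3. context-rich prompt
        self.update(user_input + "\n" + reply, session_id, ts)   # 4. immediate index update
        return reply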

2. Hierarchical CogniGraph Representation

CogniGraph models external memory as a three-layer directed graph $G = (V, E)$, where:

  • Session Nodes $V_{\text{session}}$: Store a summary $s_j$, distilled keywords $K_j$, and a timestamp $\tau_j$.
  • Triple Nodes $V_{\text{entity}}$: Represent (head entity, relation, tail entity) triples $t_i = (e_h, r, e_t)$, each hyperlinked to session nodes and chunk nodes.
  • Chunk Nodes $V_{\text{chunk}}$: Store raw dialogue text $c_k$ and a timestamp $\tau_k$.

Edges $E$ connect session nodes to triple nodes (extracted for a session) and triple nodes to chunk nodes (extracted from dialogue chunks). No intra-layer edges are present, minimizing index redundancy and promoting efficient retrieval.

Formal Construction

Let:

  • $V_{\text{session}} = \{s_j \mid j = 1 \dots N_{\text{sessions}}\}$
  • $V_{\text{entity}} = \{t_i \mid i = 1 \dots N_{\text{triples}}\}$
  • $V_{\text{chunk}} = \{c_k \mid k = 1 \dots N_{\text{chunks}}\}$

Triple nodes $t_i$ carry timestamps $\tau_i$ and maintain cross-layer links:

  • $\text{link}_{\text{session}}: V_{\text{entity}} \rightarrow V_{\text{session}}$
  • $\text{link}_{\text{chunk}}: V_{\text{entity}} \rightarrow \mathcal{P}(V_{\text{chunk}})$
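
To make the three layers concrete, the node types and cross-layer links can be encoded as plain Python dataclasses, as in the sketch below; field names mirror the notation above, while the types and defaults are illustrative assumptions.

from dataclasses import dataclass, field

# eq=False keeps default identity hashing so nodes can live in sets.
@dataclass(eq=False)
class SessionNode:              # summary layer
    id: str
    summary: str                # s_j
    keywords: list[str]         # K_j
    timestamp: float            # tau_j

@dataclass(eq=False)
class ChunkNode:                # raw-dialogue layer
    text: str                   # c_k
    timestamp: float            # tau_k

@dataclass(eq=False)
class TripleNode:               # semantic layer: t_i = (e_h, r, e_t)
    head: str
    relation: str
    tail: str
    timestamp: float            # tau_i
    session: SessionNode | None = None                      # link_session: one session
    chunks: list[ChunkNode] = field(default_factory=list)   # link_chunk: set of chunks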

Update Pseudocode

# CogniGraph state: session index, triple layer, chunk layer, cross-layer edges
V_session = {}      # session_id -> SessionNode (hash table: O(1) lookup)
V_entity  = set()   # TripleNode instances
V_chunk   = set()   # ChunkNode instances
E         = set()   # directed edges: (session, triple) and (triple, chunk)

def process_chunk(chunk_text, session_id, timestamp):
    # 1. Create or refresh the session-level summary node
    if session_id in V_session:
        s = V_session[session_id]
        s.summary   = update_summary(s.summary, chunk_text)
        s.keywords  = update_keywords(s.keywords, chunk_text)
        s.timestamp = timestamp
    else:
        s = SessionNode(id=session_id, summary=make_summary(chunk_text),
                        keywords=extract_keywords(chunk_text), timestamp=timestamp)
        V_session[session_id] = s

    # 2. One chunk node per dialogue chunk, shared by all triples below
    c_node = ChunkNode(chunk_text, timestamp)
    V_chunk.add(c_node)

    # 3. Triple extraction and deduplication
    for tri in extract_triples(chunk_text):        # tri = (head, relation, tail)
        t_node = find_similar_triple(tri)          # type-aware + semantic match, or None
        if t_node is None:
            t_node = TripleNode(*tri, timestamp)
            V_entity.add(t_node)
            E.add((s, t_node))                     # session -> triple edge
        E.add((t_node, c_node))                    # triple -> chunk grounding edge

Triple deduplication leverages type-aware and semantic similarity matching, ensuring memory compactness and reducing redundant fact storage.
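
A hedged sketch of what such matching could look like; embed is a toy stand-in for the system's actual embedding model, and the relation-type gate and 0.9 cosine threshold are illustrative assumptions rather than values from the paper.

import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy character-trigram hashing vector; swap in a real sentence encoder.
    v = np.zeros(256)
    for i in range(max(len(text) - 2, 0)):
        v[hash(text[i:i + 3]) % 256] += 1.0
    return v

def is_duplicate(new, existing, threshold=0.9):
    # Type-aware gate: only triples sharing the relation type are compared.
    candidates = [t for t in existing if t.relation == new.relation]
    q = embed(f"{new.head} {new.relation} {new.tail}")
    for t in candidates:
        v = embed(f"{t.head} {t.relation} {t.tail}")
        cos = float(q @ v) / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-9)
        if cos >= threshold:   # semantically near-identical fact: do not re-insert
            return True
    return False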

3. Real-Time Update and Hierarchical Retrieval Mechanisms

Update Complexity

For each incoming chunk:

  • Session node update: $O(1)$ lookup/insert (hash table)
  • Triple extraction/deduplication: $O(M)$ operations ($M$: triples per chunk)
  • Similarity checks: $O(M \log N)$ via approximate nearest-neighbor indexing among the $N$ stored triples (see the sketch below)

Empirically, single-chunk update latency is approximately $T_G \approx 5.2$ seconds on A100 GPUs, with token consumption per session of $K_G \approx 13.5$k.
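
The sub-linear similarity check presumes an approximate nearest-neighbor index over triple embeddings. A minimal sketch using FAISS's HNSW index follows; the backend choice, dimensions, and threshold are our assumptions, not details stated in the paper.

import numpy as np
import faiss  # pip install faiss-cpu

d = 384                                     # embedding dimension (illustrative)
index = faiss.IndexHNSWFlat(d, 32)          # HNSW: roughly logarithmic search per query
stored = np.random.rand(10_000, d).astype("float32")   # stand-in for N triple embeddings
index.add(stored)

new_vecs = np.random.rand(5, d).astype("float32")      # M triples from one new chunk
dists, ids = index.search(new_vecs, 1)      # nearest stored triple each: ~O(M log N)
is_dup = dists[:, 0] < 0.1                  # illustrative L2-distance threshold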

Retrieval Workflow

LiCoMemory's retrieval pipeline is hierarchy- and temporal-aware:

  1. Entity Extraction: $Q_{\text{entities}} = \{e^q_1, \dots, e^q_m\}$ from the query.
  2. Session-Level Ranking: compute semantic similarity $S_s(j) = \text{sim}_{\text{emb}}(K_j, Q_{\text{entities}})$; select the top-$K_s$ sessions.
  3. Triple-Level Scoring: for each candidate session $s_j$ and each neighbor triple $t_i$, $S_t(i) = \text{sim}_{\text{emb}}((e_h, e_t, r)_i, Q_{\text{entities}})$.
  4. Integrated Relevance Scoring: combine session and triple similarity by the harmonic mean

$$S_{\text{sem}}(i) = \frac{2\, S_s(j)\, S_t(i)}{S_s(j) + S_t(i)}$$

apply temporal weighting

$$w(\Delta\tau_i) = \exp\left[-\left(\frac{\Delta\tau_i}{\hat{\tau}}\right)^{k}\right], \quad 0 < k < 1$$

and compute the final score

$$R(i) = S_{\text{sem}}(i)\, w(\Delta\tau_i)$$

  5. Reranking and Prompt Construction: sort triples by $R(i)$, fetch linked session summaries and chunks, and select the top-$N$ units for the LLM prompt (a worked numerical example follows below).
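
As a quick sanity check on the scoring formulas, here is a worked example with made-up numbers: session similarity 0.8, triple similarity 0.5, a 10-day-old triple, $\hat{\tau} = 30$ days, and $k = 0.5$.

import math

S_s, S_t = 0.8, 0.5
S_sem = 2 * S_s * S_t / (S_s + S_t)           # harmonic mean = 0.8 / 1.3 ≈ 0.615
delta_tau, tau_hat, k = 10.0, 30.0, 0.5
w = math.exp(-((delta_tau / tau_hat) ** k))   # exp(-(1/3)**0.5) ≈ 0.561
R = S_sem * w                                 # final relevance ≈ 0.345
print(round(S_sem, 3), round(w, 3), round(R, 3))  # 0.615 0.561 0.345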

Reranking Algorithm Pseudocode

import math
from statistics import median
from time import time

def retrieve_and_rerank(query, K_s=5, N=10, k=0.5):
    Q_ents = extract_entities(query)
    now = time()

    # 1. Session-level ranking by keyword similarity
    S_s = {s: semantic_sim(s.keywords, Q_ents) for s in V_session.values()}
    top_sessions = topK_sessions(S_s, K_s)

    # 2. Triple-level scoring within the selected sessions
    candidates = []
    for s in top_sessions:
        for t in neighbors(s):                        # triples linked to session s
            S_t = semantic_sim(triple_repr(t), Q_ents)
            delta_tau = abs(now - t.timestamp)
            candidates.append((s, t, S_s[s], S_t, delta_tau))

    # 3. Integrated relevance (harmonic mean) with temporal decay
    tau_hat = median(dt for *_, dt in candidates)
    scored = []
    for (s, t, s_sim, t_sim, delta_tau) in candidates:
        S_sem = 2 * s_sim * t_sim / (s_sim + t_sim)
        w = math.exp(-((delta_tau / tau_hat) ** k))
        scored.append((S_sem * w, t, s))              # R(i) = S_sem * w

    # 4. Rerank and assemble the top-N units into the prompt
    best = topN_by_score(scored, N)
    return assemble_prompt(best)

4. Empirical Evaluation

LiCoMemory was quantitatively assessed on the LongMemEval and LoCoMo benchmarks, testing multi-session, single-hop, multi-hop, temporal, open-domain, and adversarial reasoning.

Performance Comparison Table

Method         LongMemEval Acc.  Rec.@15  T_R    LoCoMo Acc.  Rec.@15  T_R
LoCoMo (RAG)   17.6 %            22.0 %   4.9 s  23.6 %       25.5 %   4.9 s
Mem0           56.8 %            61.2 %   2.7 s  53.2 %       57.1 %   3.3 s
A-MEM          57.4 %            62.2 %   3.0 s  43.8 %       49.2 %   3.1 s
Zep            60.2 %            62.7 %   2.8 s  40.3 %       51.1 %   3.5 s
LiCoMemory     69.2 %            72.4 %   2.6 s  63.0 %       64.5 %   2.4 s

(T_R denotes per-query retrieval latency.)

LiCoMemory achieves an absolute improvement of +9.0 pp accuracy on LongMemEval over the strongest baseline (Zep) and exhibits the lowest observed query latency on both benchmarks.

Ablation Insights

  • Removing hierarchical retrieval leads to a –22 pp accuracy reduction.
  • Omitting temporal weighting yields –22 pp on temporal QA.
  • Excluding session-level summaries causes –12 pp overall accuracy drop.

Real-Time Simulation

Chunk-wise insertion on LongMemEval demonstrates:

  • Update latency $T_G \approx 5.21$ s/session
  • Retrieval latency $T_R \approx 2.63$ s/query
  • Token usage: $K_G \approx 13.52$k/session, $K_R \approx 1.22$k/query
  • QA accuracy is maintained at 67.4 % (only 1.8 pp below the static full-history context).

A plausible implication is that LiCoMemory maintains robust performance even under incremental, streaming update scenarios.

5. Applications and Limitations

LiCoMemory functions as a cognitive scaffold for LLM agents, supporting:

  • Multi-session consistency via session-level summaries.
  • Temporal query handling through decay weighting.
  • Grounding specific facts with hyperlinks to raw source chunks.

In practical deployment, LiCoMemory enables agents to adaptively retrieve and update persistent knowledge, yielding coherent long-term dialogues and improving agentic reasoning capacity.

Limitations include linear memory growth with session volume and the single-agent design of CogniGraph. Future work may address multi-agent knowledge graph sharing, adaptive compression of historical data, and improved learned summary abstraction. Extraction and summarization fidelity remain as bottlenecks for maximal retrieval precision.

6. Significance in Agentic Long-Term Reasoning

LiCoMemory demonstrates that a semantically indexed, lightweight hierarchical graph—in combination with temporal and hierarchy-aware retrieval mechanisms—substantially surpasses previous external memory architectures in both efficiency and reasoning accuracy. These design principles suggest a path forward for scalable, persistent, agentic memory solutions compatible with real-time LLM deployments.
