CogniGraph: Hierarchical Graph Modeling

Updated 10 November 2025

CogniGraph is the name of two hierarchical graph frameworks that encode spatial and semantic relationships, one for neurocognitive assessment and one for agentic memory systems.
In neurocognitive applications, it transforms narrative transcripts into spatio-semantic graphs and computes metrics such as total path distance to distinguish cognitive states.
In agentic memory, its three-layer indexing architecture supports efficient retrieval, temporal reasoning, and deduplication over conversational data.

CogniGraph refers to two distinct hierarchical graph modeling frameworks in recent literature. The first automates the extraction and quantitative analysis of spatio-semantic narrative paths for neurocognitive assessment (Ng et al., 2 Feb 2025). The second, deployed in agentic long-term memory systems, is a three-level entity- and relation-indexing architecture for efficient semantic retrieval and updating (Huang et al., 3 Nov 2025). Both use graph constructs (nodes, edges, and semantic or spatial attributes) to encode, traverse, and quantify relationships in complex, temporally or spatially extended data, but they differ in design goals, formal structure, and application domain.

1. CogniGraph in Cognitive-Linguistic Assessment

Pipeline Overview

In neurocognitive analysis, CogniGraph designates an automated computational pipeline that builds spatio-semantic graphs from transcribed picture descriptions, abstracting the narrative "visual path" traced by a speaker over structured image content (Ng et al., 2 Feb 2025). The pipeline is composed of the following principal steps:

Automatic CIU Tagging: Preprocess CHAT-formatted transcripts by removing punctuation and lemmatizing tokens with spaCy (v3), then map each lemma to a CIU (content information unit) using a fixed dictionary of 23 labels (e.g., "Boy," "Water overflowing"). Ambiguous tokens are resolved by first-match priority; unmatched tokens are dropped.
Graph Construction: Assign each CIU to a predefined 2-D coordinate on the stimulus image (e.g., the Cookie Theft picture, $546\times290$ pixels). Construct a directed path graph over the mentioned CIUs $V$, each with coordinate $(x_v, y_v)$, whose edges $E$ connect consecutive mentions $v_i \to v_{i+1}$ in the narrative sequence $v_1, \dots, v_N$; the edge order thus traces the narrative path.
Edge Weights: Define edge weight $w(e_i)$ as the Euclidean distance between successive CIUs:

$w(e_i) = \sqrt{(x_{v_i} - x_{v_{i+1}})^2 + (y_{v_i} - y_{v_{i+1}})^2}$

A variant maps each CIU to one of four image quadrants ($Q_1, \dots, Q_4$), writing $q_i$ for the quadrant of $v_i$, and records unweighted transitions between quadrants.

Feature Extraction: Compute 12 interpretable graph metrics (see Table 1 below), quantifying traversal efficiency, path structure, and repetition.
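The tagging, graph-construction, and edge-weight steps can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: tokens are taken as already lemmatized and stripped of punctuation (the pipeline itself uses spaCy v3, which is not called here), and the lemma dictionary, CIU subset, pixel coordinates, and quadrant numbering are hypothetical placeholders rather than the 23-label dictionary and coordinates used by Ng et al.

```python
import math

# Hypothetical lemma -> CIU dictionary (the real pipeline uses 23 fixed labels).
# Insertion order encodes first-match priority for lemmas listed under several CIUs.
CIU_DICT = {
    "Boy": {"boy", "son", "brother"},
    "Cookie": {"cookie", "biscuit"},
    "Mother": {"mother", "mom", "woman"},
    "Water overflowing": {"overflow", "spill"},
}

# Hypothetical (x, y) pixel coordinates on a 546 x 290 image (y grows downward).
CIU_COORDS = {
    "Boy": (120.0, 90.0),
    "Cookie": (150.0, 50.0),
    "Mother": (390.0, 120.0),
    "Water overflowing": (300.0, 240.0),
}
IMAGE_W, IMAGE_H = 546, 290


def tag_cius(lemmas):
    """Map each lemma to the first matching CIU; unmatched lemmas are dropped."""
    tagged = []
    for lemma in lemmas:
        match = next((ciu for ciu, forms in CIU_DICT.items() if lemma in forms), None)
        if match is not None:
            tagged.append(match)
    return tagged


def build_path_graph(cius):
    """Return the node sequence and weighted edges (v_i, v_{i+1}, w(e_i))."""
    edges = []
    for a, b in zip(cius, cius[1:]):
        (xa, ya), (xb, yb) = CIU_COORDS[a], CIU_COORDS[b]
        edges.append((a, b, math.hypot(xa - xb, ya - yb)))  # Euclidean distance
    return list(cius), edges


def quadrant(x, y):
    """Quadrant variant. Assumed numbering: 1 top-left, 2 top-right,
    3 bottom-left, 4 bottom-right."""
    return 1 + int(x >= IMAGE_W / 2) + 2 * int(y >= IMAGE_H / 2)


lemmas = ["the", "boy", "take", "cookie", "mother", "spill"]
cius = tag_cius(lemmas)
nodes, edges = build_path_graph(cius)
quads = [quadrant(*CIU_COORDS[c]) for c in nodes]
print(cius)                                        # ['Boy', 'Cookie', 'Mother', 'Water overflowing']
print([round(w, 1) for *_, w in edges], quads)     # [50.0, 250.0, 150.0] [1, 1, 2, 4]
```

Dictionary insertion order implements first-match priority here: a lemma listed under two CIUs would be assigned to whichever CIU appears first.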

Table 1: CogniGraph Spatio-Semantic Metrics

Feature	Mathematical Definition	Clinical Relevance
Avg. X / Avg. Y	$\frac{1}{|V|} \sum_{v \in V} x_v$ / analogous for $y_v$	Visual focus/localization
Std. X / Std. Y	Std. dev. over $x_v$ / $y_v$	Dispersion/coverage
Total path distance ($D_{\rm tot}$)	$\sum_{e \in E} w(e)$	Narrative traversability
Unique nodes	$|V|$	Content richness
Total path / Unique nodes	$D_{\rm tot}/|V|$	Efficiency
Nodes (length)	$N$	Verbal output
Self cycles / cycles	$|\{i : v_i = v_{i+1}\}|$ / $|\{(i, j) : i < j,\ v_i = v_j\}|$	Repetitiveness
Quadrant transitions	Counted in the quadrant graph variant	Spatial transition pattern
Cross ratio (quadrants)	$\frac{\#(q_i \neq q_{i+1})}{\#(q_i = q_{i+1})}$	Cross-region switching
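Given a tagged CIU sequence, the Table 1 features can be computed directly. The sketch below reads the table literally: averages and standard deviations run over the unique nodes $V$, and cycles count all index pairs $i<j$ with $v_i = v_j$. The population standard deviation, the quadrant numbering, representing quadrant transitions as counts of $(q_i, q_{i+1})$ pairs, and returning infinity when no same-quadrant steps occur are all assumptions, as are the coordinates in the usage lines.

```python
import math
import statistics
from collections import Counter
from itertools import combinations

IMAGE_W, IMAGE_H = 546, 290


def quadrant(x, y):
    # Assumed numbering: 1 top-left, 2 top-right, 3 bottom-left, 4 bottom-right.
    return 1 + int(x >= IMAGE_W / 2) + 2 * int(y >= IMAGE_H / 2)


def spatio_semantic_metrics(seq, coords):
    """Table 1 features for a non-empty CIU sequence v_1..v_N.

    seq: CIU labels in narrative order; coords: CIU -> (x, y) pixel position.
    """
    unique = list(dict.fromkeys(seq))             # V, in order of first mention
    xs = [coords[v][0] for v in unique]
    ys = [coords[v][1] for v in unique]
    pts = [coords[v] for v in seq]
    d_tot = sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))
    quads = [quadrant(x, y) for x, y in pts]
    steps = list(zip(quads, quads[1:]))           # transitions in quadrant space
    n_cross = sum(a != b for a, b in steps)
    n_stay = len(steps) - n_cross
    return {
        "avg_x": statistics.mean(xs),
        "avg_y": statistics.mean(ys),
        "std_x": statistics.pstdev(xs),           # population std. dev. (assumption)
        "std_y": statistics.pstdev(ys),
        "total_path": d_tot,
        "unique_nodes": len(unique),
        "path_per_unique": d_tot / len(unique),
        "length": len(seq),
        "self_cycles": sum(a == b for a, b in zip(seq, seq[1:])),
        "cycles": sum(a == b for a, b in combinations(seq, 2)),
        "quadrant_transitions": Counter(steps),
        "cross_ratio": n_cross / n_stay if n_stay else math.inf,
    }


coords = {"Boy": (120, 90), "Cookie": (150, 50),
          "Mother": (390, 120), "Water overflowing": (300, 240)}
seq = ["Boy", "Cookie", "Cookie", "Mother", "Water overflowing", "Mother"]
m = spatio_semantic_metrics(seq, coords)
print(round(m["total_path"], 6), m["self_cycles"], m["cycles"], m["cross_ratio"])  # 600.0 1 2 1.5
```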

Statistical analysis via ANCOVA on these features, using clinical group (cognitively unimpaired vs. impaired) as the between-groups factor and age, education, gender, and unique nodes as covariates, shows that automated CogniGraph features significantly differentiate impaired from unimpaired speakers. Notably, automated CIU tagging yields larger group-wise F-values than manual scoring on path-length and cycle metrics (e.g., total path: $F=39.5$ automated vs. $F=19.6$ manual; cycles: $F=45.6$ vs. $F=30.7$).
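One way to obtain the group F-value in an ANCOVA of this kind is the extra-sum-of-squares test between two nested least-squares models: covariates only, versus covariates plus the group indicator. The sketch below uses NumPy and synthetic data (not the study's data) purely to show the computation; coding group and gender as 0/1 and the effect sizes in the synthetic outcome are assumptions.

```python
import numpy as np


def ancova_group_f(y, group, covariates):
    """F-statistic for a binary group factor, adjusting for covariates.

    Compares a reduced model (intercept + covariates) with a full model
    (intercept + covariates + group) via the extra-sum-of-squares F-test.
    """
    n = len(y)
    x_reduced = np.column_stack([np.ones(n), covariates])
    x_full = np.column_stack([x_reduced, group])

    def rss(x):
        beta, *_ = np.linalg.lstsq(x, y, rcond=None)
        resid = y - x @ beta
        return float(resid @ resid)

    rss_reduced, rss_full = rss(x_reduced), rss(x_full)
    df_num = x_full.shape[1] - x_reduced.shape[1]   # 1 for a two-level factor
    df_den = n - x_full.shape[1]
    return ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)


# Synthetic demo data: 0 = cognitively unimpaired, 1 = impaired.
rng = np.random.default_rng(0)
n = 80
group = rng.integers(0, 2, n)
age = rng.normal(65, 8, n)
education = rng.normal(15, 2, n)
gender = rng.integers(0, 2, n)
unique_nodes = rng.integers(8, 20, n)
covariates = np.column_stack([age, education, gender, unique_nodes])
# Synthetic outcome with a built-in group effect on total path distance.
total_path = 900 + 300 * group + 2 * age + 20 * unique_nodes + rng.normal(0, 50, n)
print(round(ancova_group_f(total_path, group, covariates), 1))
```

Because the group term has one degree of freedom, this F equals the squared t-statistic of the group coefficient in the full model.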

2. Hierarchical Graph Design in Agentic Memory Systems

In long-term agentic memory, CogniGraph is formalized as a three-layer, directed, hyperlink-based semantic indexing graph that structures information along session, entity–relation, and chunk axes (Huang et al., 3 Nov 2025). Its core motivation is to decouple semantic granularity from hierarchical organization, enabling efficient retrieval, deduplication, and temporally aware reasoning in LLM agents.

Layered Structure

Session layer: Nodes represent entire conversation sessions, annotated with concise summaries, keyword sets, and time spans.
Entity–relation layer: A lightweight knowledge graph of entity nodes and directed relation edges stored as triples $(h, r, t)$; each triple has a unique ID and timestamp and is linked to its session(s) and originating chunk(s).
Chunk layer: The most granular level, with nodes for individual dialogue chunks (verbatim text), each linked to the triples it expresses.

Formally, $G=(S \cup E \cup C,\ R \cup H)$, where $S$ is the set of sessions, $E$ the entities, $C$ the chunks, $R$ the relation triples, and $H$ the hyperlinks across layers. (Here $E$ denotes entities, not the path edges of Section 1.)

The data structures mirror this design: Python-style SessionNode, TripleNode, and ChunkNode classes plus a CogniGraph manager, each holding cross-links for fast navigation and updating.
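The layered design can be mirrored with Python dataclasses. Field names, ID types, and helper methods here are hypothetical; they follow the layer descriptions in this section rather than the LiCoMemory source code. Entities are kept implicit as the head and tail strings of triples, and the hyperlinks $H$ are stored as ID collections on both ends so navigation works in either direction.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SessionNode:
    session_id: str
    summary: str = ""
    keys: set[str] = field(default_factory=set)
    time_span: tuple[float, float] | None = None
    chunk_ids: list[str] = field(default_factory=list)   # links down to chunk layer
    triple_ids: set[str] = field(default_factory=set)    # links to entity-relation layer


@dataclass
class TripleNode:
    triple_id: str
    head: str
    relation: str
    tail: str
    timestamp: float
    session_ids: set[str] = field(default_factory=set)   # sessions that mention it
    chunk_ids: set[str] = field(default_factory=set)     # chunks that express it


@dataclass
class ChunkNode:
    chunk_id: str
    text: str            # verbatim dialogue text
    timestamp: float
    session_id: str
    triple_ids: set[str] = field(default_factory=set)


@dataclass
class CogniGraph:
    """Manager holding the three layers, keyed by node ID."""
    sessions: dict[str, SessionNode] = field(default_factory=dict)
    triples: dict[str, TripleNode] = field(default_factory=dict)
    chunks: dict[str, ChunkNode] = field(default_factory=dict)

    def entities(self) -> set[str]:
        """E is implicit: every head or tail that appears in some triple."""
        return {x for t in self.triples.values() for x in (t.head, t.tail)}

    def triples_in_session(self, session_id: str) -> list[TripleNode]:
        return [self.triples[t] for t in sorted(self.sessions[session_id].triple_ids)]

    def chunks_for_triple(self, triple_id: str) -> list[ChunkNode]:
        """Follow hyperlinks from a triple down to its source chunks."""
        return [self.chunks[c] for c in sorted(self.triples[triple_id].chunk_ids)]


g = CogniGraph()
g.sessions["s1"] = SessionNode("s1", summary="Talked about SocialX", keys={"SocialX"},
                               chunk_ids=["c1"], triple_ids={"t1"})
g.chunks["c1"] = ChunkNode("c1", "I joined SocialX today.", 1.0, "s1", triple_ids={"t1"})
g.triples["t1"] = TripleNode("t1", "user", "joined", "SocialX", 1.0,
                             session_ids={"s1"}, chunk_ids={"c1"})
print(sorted(g.entities()))              # ['SocialX', 'user']
print(g.chunks_for_triple("t1")[0].text) # I joined SocialX today.
```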

3. Insertion, Deduplication, and Update Algorithms

Automatic maintenance of CogniGraph proceeds through incremental, chunk-wise updates:

Session Update: On detecting a new session, create a new summary and key set; otherwise, refine the existing summary and update its keys (typically via LLM summarization and key-phrase extraction).
Chunk Insertion: Each new dialogue chunk is inserted as a ChunkNode, timestamped and linked to its session.
Triple Extraction: Open Information Extraction (OpenIE) or LLMs produce a set of candidate triples per chunk.
Triple Deduplication: Each triple is hashed; if the hash already exists, the existing TripleNode is linked to the additional session and chunk, otherwise a new TripleNode is instantiated.
Cross-Linking: All new or existing triple IDs are linked to the current session and chunk, which enables rapid semantic traversal and context-based pruning.

Typical insertion/update cost per chunk is $O(p \log |E| + L_{\mathrm{summ}} + L_{\mathrm{text}})$, where $p$ is the number of triples extracted.
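The chunk-wise update steps can be sketched as a single method. In this sketch, `extract` and `summarize` are hypothetical callables standing in for the OpenIE/LLM and summarization calls; the SHA-1 content hash, the lowercase/strip normalization before hashing, and keeping a triple's first-seen timestamp on repeat mentions are assumptions, not details confirmed by the source.

```python
import hashlib
from dataclasses import dataclass, field


def triple_hash(h, r, t):
    """Dedup key; lowercasing/stripping before hashing is an assumed normalization."""
    raw = "\x1f".join(part.strip().lower() for part in (h, r, t))
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


@dataclass
class MemoryGraph:
    sessions: dict = field(default_factory=dict)  # id -> {"summary", "keys", "chunk_ids", "triple_ids"}
    chunks: dict = field(default_factory=dict)    # id -> {"text", "timestamp", "session_id", "triple_ids"}
    triples: dict = field(default_factory=dict)   # hash -> {"triple", "timestamp", "session_ids", "chunk_ids"}

    def insert_chunk(self, session_id, chunk_id, text, timestamp, extract, summarize):
        """Incremental update for one chunk.

        extract(text) -> list of (h, r, t); summarize(old_summary, text) -> (summary, keys).
        """
        # Session update: create a node for a new session, else refine the existing one.
        session = self.sessions.setdefault(
            session_id, {"summary": "", "keys": set(), "chunk_ids": [], "triple_ids": set()})
        session["summary"], keys = summarize(session["summary"], text)
        session["keys"] |= keys
        # Chunk insertion: timestamped and linked to its session.
        chunk = {"text": text, "timestamp": timestamp,
                 "session_id": session_id, "triple_ids": set()}
        self.chunks[chunk_id] = chunk
        session["chunk_ids"].append(chunk_id)
        # Triple extraction, deduplication by hash, and cross-linking.
        for h, r, t in extract(text):
            tid = triple_hash(h, r, t)
            node = self.triples.get(tid)
            if node is None:  # unseen triple: instantiate a new node
                node = {"triple": (h, r, t), "timestamp": timestamp,
                        "session_ids": set(), "chunk_ids": set()}
                self.triples[tid] = node
            node["session_ids"].add(session_id)
            node["chunk_ids"].add(chunk_id)
            chunk["triple_ids"].add(tid)
            session["triple_ids"].add(tid)
        return chunk["triple_ids"]


def summarize(old, text):
    # Hypothetical stand-in: append text; treat capitalized words as keys.
    return (old + " " + text).strip(), {w.strip(".?!") for w in text.split() if w[0].isupper()}


g = MemoryGraph()
g.insert_chunk("s1", "c1", "I joined SocialX.", 1.0,
               lambda _: [("user", "joined", "SocialX")], summarize)
g.insert_chunk("s2", "c2", "I am on SocialX.", 5.0,
               lambda _: [("User", "joined", "socialx ")], summarize)
print(len(g.triples))  # 1: the second mention is deduplicated
```

Python dicts give expected constant-time lookup of a triple hash, so each repeated mention only adds links rather than a new node.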

4. Retrieval and Reranking: Temporal and Hierarchy-Aware Search

CogniGraph retrieval is structured as a hierarchical, semantically aware search combined with temporal decay reranking.

Retrieval Stages

A. Query entity extraction: Derive entity set $Q_E$ from the natural-language query.

B. Session-level ranking: Compute semantic similarity $S_s = \mathrm{sim}_{\mathrm{sem}}(Q_E, s.\mathrm{keys})$ for all sessions; select top-ranked sessions.

C. Triple-level ranking and temporal weighting: For each candidate triple, compute $S_t = \mathrm{sim}_{\mathrm{sem}}(Q_E, \{h,r,t\})$. The aggregated relevance is the harmonic mean of $S_s$ and $S_t$:

$S_{\mathrm{sem}} = \frac{2\,S_s\,S_t}{S_s + S_t}$

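Temporal decay is then applied with a Weibull function, where $\Delta\tau$ is the time elapsed since the triple's timestamp, $\hat{\tau}$ is a scale parameter, and $k$ is a shape parameter:

$w(\Delta\tau) = \exp\!\left[-\left(\frac{\Delta\tau}{\hat{\tau}}\right)^k\right],\quad 0<k<1$

The overall triple reranking score is: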

$R(t) = S_{\mathrm{sem}} \times w(\Delta\tau)$

D. Chunk assembly: For the top-$K$ triples, retrieve the linked chunk text and return a subgraph composed of session summaries, high-scoring triples, and raw chunks, ordered by $R(t)$.

Retrieval complexity is $O\big(|S| + \sum_{s} |\mathrm{linked\_triples}_s|\big)$; in practice, top-$N$ pruning keeps it tractable.
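Stages B through D can be sketched end to end. This is a minimal sketch under stated assumptions: a toy Jaccard overlap stands in for $\mathrm{sim}_{\mathrm{sem}}$, each triple is attached to a single session for simplicity (the described graph allows several), and the defaults for `top_n`, `top_k`, $\hat{\tau}$, and $k$ (and the time unit) are hypothetical.

```python
import heapq
import math


def harmonic_mean(s_s, s_t):
    """S_sem = 2 S_s S_t / (S_s + S_t); defined as 0 when both scores are 0."""
    return 0.0 if s_s + s_t == 0 else 2 * s_s * s_t / (s_s + s_t)


def weibull_decay(dt, tau_hat, k):
    """w(dt) = exp(-(dt / tau_hat) ** k); negative gaps are clamped to 0."""
    return math.exp(-((max(dt, 0.0) / tau_hat) ** k))


def jaccard(a, b):
    """Toy stand-in for sim_sem (set overlap, not a learned similarity)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def retrieve(query_entities, sessions, triples, now,
             top_n=2, top_k=3, tau_hat=30.0, k=0.5, sim=jaccard):
    """Hierarchical retrieval with temporal reranking.

    sessions: {session_id: keys}; triples: [(session_id, (h, r, t), timestamp)].
    Returns up to top_k tuples (R, session_id, triple), highest R first.
    """
    # Stage B: rank sessions by key similarity and keep the top_n.
    s_scores = {sid: sim(query_entities, keys) for sid, keys in sessions.items()}
    kept = set(heapq.nlargest(top_n, s_scores, key=s_scores.get))
    # Stage C: harmonic-mean relevance times Weibull decay, for kept sessions only.
    scored = []
    for sid, (h, r, t), ts in triples:
        if sid in kept:
            s_sem = harmonic_mean(s_scores[sid], sim(query_entities, {h, r, t}))
            scored.append((s_sem * weibull_decay(now - ts, tau_hat, k), sid, (h, r, t)))
    # Stage D would attach the linked chunk text to these top_k triples.
    return heapq.nlargest(top_k, scored)


sessions = {"s1": {"socialx", "followers"}, "s2": {"pasta", "recipes"},
            "s3": {"socialx", "posts"}}
triples = [
    ("s1", ("user", "gained", "followers"), 90.0),
    ("s1", ("user", "joined", "socialx"), 10.0),
    ("s2", ("user", "cooked", "pasta"), 99.0),
    ("s3", ("user", "posted_on", "socialx"), 95.0),
]
for score, sid, triple in retrieve({"socialx", "followers"}, sessions, triples, now=100.0):
    print(f"{score:.3f}", sid, triple)
```

With these toy inputs, the s2 triple is pruned at the session stage, and the recent "gained followers" triple ranks first. With $0<k<1$ the weight drops quickly at first and then flattens, so older but relevant triples are damped rather than eliminated.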

Worked Example

For the query “How many followers did I gain on SocialX last month?”, entity extraction yields {SocialX, followers, last month}. Only sessions with overlapping keys survive session ranking; their triples are then weighted and reranked using the equations above. The final output is a subgraph passed to the LLM for natural-language answer synthesis, e.g., “You gained 1200 followers last month.”
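To see how the decay term can reorder candidates, take hypothetical values (not from the paper): a surviving session with $S_s = 0.8$, scale $\hat{\tau} = 30$ days, and shape $k = 0.5$. Triple $a$ (the recent follower-count statement) has $S_t = 0.6$ and $\Delta\tau = 7.5$ days; triple $b$ (an older but closer semantic match) has $S_t = 0.9$ and $\Delta\tau = 120$ days.

$S_{\mathrm{sem}}(a) = \frac{2(0.8)(0.6)}{0.8+0.6} \approx 0.686,\quad w(7.5) = \exp\!\left[-(7.5/30)^{0.5}\right] = e^{-0.5} \approx 0.607,\quad R(a) \approx 0.416$

$S_{\mathrm{sem}}(b) = \frac{2(0.8)(0.9)}{0.8+0.9} \approx 0.847,\quad w(120) = \exp\!\left[-(120/30)^{0.5}\right] = e^{-2} \approx 0.135,\quad R(b) \approx 0.115$

Although $b$ matches the query more closely, the decay term ranks the recent triple $a$ first.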

5. Comparative Impact and Empirical Findings

In clinical application, CogniGraph’s automated spatio-semantic features—when benchmarked using large multimodal datasets such as WRAP and DementiaBank—show significant group separation on path-based and repetition metrics, with larger F-values than manual methods in ANCOVA. Impaired speakers display longer traversal distances, more cycles, and less systematic scan paths. This suggests the approach is suitable for robust feature extraction in computational models of cognitive impairment.

In agentic memory, LiCoMemory, which uses CogniGraph as its index, outperforms baselines on standard benchmarks (LoCoMo, LongMemEval) in temporal reasoning, multi-session consistency, and retrieval efficiency. Its hierarchy decouples semantics from topology, which avoids redundancy and improves update latency and indexing accuracy.

6. Limitations and Future Directions

Key limitations of the clinical pipeline stem from its dependence on a hand-crafted CIU dictionary, which may miss synonyms and paraphrases and fail on nonliteral expressions; moreover, only group-level statistics, not end-to-end classifier models, are demonstrated. In agentic memory, minimal node content favors speed but offloads nuanced reasoning (e.g., paraphrase detection, entity coherence) to downstream LLMs or reranking layers.

Potential directions for extension include replacing dictionary-based CIU tagging with large-language-model (LLM) methods to improve coverage and flexibility, and integrating spatio-semantic features into supervised learning pipelines for earlier detection and longitudinal monitoring in clinical domains. In retrieval graphs, adaptive incorporation of LLM-driven triple extraction and deeper fusion with downstream generative models are anticipated trajectories.

7. Theoretical and Practical Significance

CogniGraph represents a convergence of graph-based reasoning and hierarchical semantic structuring. In neurocognitive assessment, it operationalizes the quantification of narrative traversal without reliance on manual annotation or eye tracking, providing high-throughput biomarkers for cognitive-linguistic function. In agentic memory, its hierarchical, minimally entangled indexing supports scalable, temporally coherent, and semantically relevant retrieval, optimizing LLM agent interaction with persistent memory stores.

Together, these results position the CogniGraph designs as versatile, extensible tools for clinical informatics and AI memory, with empirical support on discriminative (clinical) and reasoning-heavy (memory) tasks. Further evolution in both domains is anticipated through the incorporation of LLM-backed extraction and retriever modules, and broader application to multimodal settings, provided the constraints and assumptions underlying the current frameworks are carefully addressed.


References

Ng et al. (2025). Automated Extraction of Spatio-Semantic Graphs for Identifying Cognitive Impairment.

Huang et al. (2025). LiCoMemory: Lightweight and Cognitive Agentic Memory for Efficient Long-Term Reasoning.
