Time-Dependent Recursive Summary Graph

Updated 24 December 2025
  • TRSG is a dynamic framework that hierarchically summarizes evolving information streams using time-indexed graphs and LLM-driven clustering.
  • It employs multi-level graph abstractions with cosine similarity thresholds to detect changes and aggregate temporal trends.
  • TRSG facilitates actionable insights in news intelligence and strategic planning by providing interpretable, evidence-backed summaries.

A Time-Dependent Recursive Summary Graph (TRSG) is a data structure and computational framework designed for dynamic, temporally indexed summarization of evolving information streams, prominently exemplified in news intelligence and temporal retrieval-augmented generation. The TRSG paradigm enables systematized abstraction of raw, temporally distinguished document corpora through layered clustering and LLM-mediated summarization. It supports multi-resolution event and trend aggregation, fine-grained change detection, and interpretable, evidence-backed analytical workflows—integrating state-of-the-art semantic embedding, community detection, and prompt-driven LLM synthesis at scale (Kharlashkin et al., 17 Dec 2025, Han et al., 15 Oct 2025).

1. Mathematical Definition and Graph Structure

The formal construction of TRSGs varies with task setting but consistently leverages multi-level time-indexed graph abstractions.

Weekly Summarization (ORACLE): For each discrete time step $t$ (typically one week), TRSG instantiates a two-level undirected graph $G_t = (V_t, E_t)$:

  • Level-1 nodes $V_t^{(1)}$ correspond to LLM-generated summaries $s_{t,i}^{(1)}$ of semantically coherent sub-clusters of filtered news documents.
  • Level-2 nodes $V_t^{(2)}$ summarize meta-clusters (theme groups) of Level-1 clusters, yielding synthesized strategic reports.
  • Edges $E_t = E_t^{(1)} \cup E_t^{(2)}$ connect node pairs within each level whose embedding cosine similarity exceeds a threshold ($\tau_0 = 0.75$ for Level 1, $\tau_1 = 0.55$ for Level 2).
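The per-level edge construction reduces to thresholded pairwise cosine similarity over summary embeddings. A minimal sketch, assuming embeddings are NumPy vectors (function names and layout are illustrative, not from the papers):

```python
import numpy as np

# Thresholds reported for ORACLE's two levels.
TAU_L1, TAU_L2 = 0.75, 0.55

def cosine_sim(u, v):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def threshold_edges(embeddings, tau):
    """Connect node pairs whose embedding cosine similarity meets tau."""
    n = len(embeddings)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if cosine_sim(embeddings[i], embeddings[j]) >= tau]
```

Calling `threshold_edges(level1_embeddings, TAU_L1)` would yield $E_t^{(1)}$; the same routine with `TAU_L2` yields $E_t^{(2)}$.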

Hierarchical Temporal Graph (TG-RAG): The TRSG is defined as the tuple $(V, E, T, H, S)$, where:

  • $V$ is the set of entity nodes.
  • $E \subseteq V \times V \times R \times T$ is the set of edges as (head, tail, relation, timestamp) quadruples.
  • $T$ indexes time intervals (e.g., week, quarter, year).
  • $H \subseteq T \times T$ encodes the parent–child time hierarchy (e.g., year → quarter).
  • $S : T \to \text{Text}$ maps each interval to its LLM-generated summary.
  • The cross-layer incidence $C : T \to 2^E$ assigns to every time interval the facts/events with corresponding timestamps (Han et al., 15 Oct 2025).
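The tuple above maps naturally onto a small container type. The following is a hypothetical sketch (field and method names are illustrative) showing how $C$ and the hierarchy $H$ might be queried:

```python
from dataclasses import dataclass, field

@dataclass
class TemporalGraph:
    """Illustrative container for the (V, E, T, H, S) tuple."""
    V: set = field(default_factory=set)    # entity nodes
    E: set = field(default_factory=set)    # (head, tail, relation, timestamp)
    T: set = field(default_factory=set)    # time-interval identifiers
    H: set = field(default_factory=set)    # (parent, child) interval pairs
    S: dict = field(default_factory=dict)  # interval -> summary text

    def C(self, interval):
        """Cross-layer incidence: edges whose timestamp index is `interval`."""
        return {e for e in self.E if e[3] == interval}

    def children(self, interval):
        """Child intervals of `interval` under the hierarchy H."""
        return {c for (p, c) in self.H if p == interval}
```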

2. Clustering and Recursive Summarization Algorithms

Item and Cluster Embedding: Each document $d$ is embedded as $x_d = \mathrm{EmbeddingModel}(d)$, and semantic grouping is controlled by the cosine similarity $\mathrm{sim}(u, v) = \frac{u \cdot v}{\lVert u \rVert_2 \lVert v \rVert_2}$.

Multi-level Clustering:

  • Sub-clustering (L0→L1): Construct the document similarity graph $H_t = (D_t, E_D)$ with adjacency $A_{ij} = 1$ if $\mathrm{sim}(x_i, x_j) \ge \tau_0$ and $0$ otherwise. Leiden community detection partitions $H_t$ into coherent sub-clusters $C_{t,i}^{(0)}$ by maximizing the modularity $Q$:

$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$

  • Meta-clustering (L1→L2): Each L1 summary is embedded as $y_{t,i}^{(1)}$, and meta-clusters $M_{t,j}^{(1)}$ are obtained via a similarity threshold ($\tau_1$) and Leiden partitioning; these groups are recursively summarized into high-level thematic syntheses.
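Proper Leiden clustering requires a dedicated library (e.g., `leidenalg` over igraph). As a dependency-free stand-in under the same thresholded-graph setup, connected components of the similarity graph can be extracted with union-find; this omits Leiden's modularity refinement but illustrates the partitioning step:

```python
def connected_component_clusters(n, edges):
    """Partition nodes 0..n-1 into connected components of the threshold
    graph via union-find. A simplified stand-in for Leiden, which further
    refines the partition to maximize modularity Q."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), set()).add(i)
    return list(clusters.values())
```

Here `edges` would come from thresholding pairwise cosine similarities at $\tau_0$ (or $\tau_1$ for meta-clustering).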

Recursive Summarization: LLMs summarize at both cluster levels. If token constraints are exceeded, summaries are chunked, summarized recursively, and the results are aggregated.
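The chunk-then-recurse pattern can be sketched as follows, with any `str -> str` callable standing in for the LLM and character length as a crude token-count proxy (both are assumptions for illustration):

```python
def summarize(texts, llm, max_tokens=8000, token_len=len):
    """Recursively summarize: if the concatenated input exceeds the token
    budget, split into chunks, summarize each chunk, then summarize the
    partial summaries. `llm` is any callable str -> str."""
    joined = "\n\n".join(texts)
    if token_len(joined) <= max_tokens:
        return llm(joined)
    # Greedily pack texts into chunks under the budget.
    chunks, current, size = [], [], 0
    for t in texts:
        if current and size + token_len(t) > max_tokens:
            chunks.append(current)
            current, size = [], 0
        current.append(t)
        size += token_len(t)
    if current:
        chunks.append(current)
    # Summarize each chunk, then recurse on the partial summaries.
    partials = [llm("\n\n".join(c)) for c in chunks]
    return summarize(partials, llm, max_tokens, token_len)
```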

TG-RAG Recursive Summarization: For each time node $\tau$, summaries are computed recursively:

$$S(\tau) = \mathrm{LLM\_Summarize}\big(\{\mathrm{desc}(\epsilon) \mid \epsilon \in C(\tau)\} \cup \{S(\tau') \mid \tau' \in \mathrm{children}(\tau)\}\big)$$

This produces hierarchical, temporally continuous abstraction across granularities (Kharlashkin et al., 17 Dec 2025, Han et al., 15 Oct 2025).
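The recursion for $S(\tau)$ is a post-order traversal of the time hierarchy. A minimal self-contained sketch, with plain dicts for $C$ and the hierarchy and any list-to-string callable standing in for the LLM:

```python
def compute_summaries(tau, children, facts, llm, S=None):
    """Post-order recursion: S(tau) = LLM_Summarize(fact descriptions for
    C(tau) union child summaries). `children` maps interval -> child
    intervals; `facts` maps interval -> fact descriptions; `llm` is any
    callable list[str] -> str standing in for the LLM summarizer."""
    if S is None:
        S = {}
    child_sums = [compute_summaries(c, children, facts, llm, S)
                  for c in children.get(tau, [])]
    S[tau] = llm(facts.get(tau, []) + child_sums)
    return S[tau]
```

Summaries at fine granularities (e.g., quarters) are thus computed first and folded into coarser ones (e.g., years), matching the hierarchy $H$.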

3. Change Detection and Temporal Comparison

Lightweight, week-over-week change detection isolates novel, persistent, or modified themes:

  • For each summary $u \in V_t^{(\ell)}$, find the $v \in V_{t-1}^{(\ell)}$ maximizing similarity:
    • Stable: $\mathrm{sim}(u, v) \ge 0.90$
    • Changed: $0.70 \le \mathrm{sim}(u, v) < 0.90$
    • Added: $\mathrm{sim}(u, v) < 0.70$
  • Summaries in $V_{t-1}^{(\ell)}$ without matches are Removed.
  • Added/removed elements are labeled by micro-label LLM outputs and agglomerative clustering of TF–IDF vectors, producing canonicalized theme groupings.
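The threshold scheme above amounts to a best-match lookup followed by banding. A sketch assuming summary embeddings as NumPy row matrices (the matching convention for "Removed" is a simplification):

```python
import numpy as np

STABLE_T, CHANGED_T = 0.90, 0.70  # week-over-week thresholds

def classify_changes(curr, prev):
    """Label each current summary against its best previous match as
    Stable / Changed / Added; previous summaries never matched above the
    Changed threshold are reported as Removed."""
    curr = np.asarray(curr, dtype=float)
    prev = np.asarray(prev, dtype=float)
    curr = curr / np.linalg.norm(curr, axis=1, keepdims=True)
    prev = prev / np.linalg.norm(prev, axis=1, keepdims=True)
    sims = curr @ prev.T                    # pairwise cosine similarities
    labels, matched = [], set()
    for i in range(len(curr)):
        j = int(np.argmax(sims[i]))         # best previous match
        if sims[i, j] >= STABLE_T:
            labels.append("Stable"); matched.add(j)
        elif sims[i, j] >= CHANGED_T:
            labels.append("Changed"); matched.add(j)
        else:
            labels.append("Added")          # no sufficiently similar ancestor
    removed = [j for j in range(len(prev)) if j not in matched]
    return labels, removed
```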

In the TG-RAG framework, identical entity–relation pairs at different times are distinguished as separate edges, and queries on $\Delta G_t = G_t \setminus G_{t-1}$ yield explicit temporal difference graphs (Kharlashkin et al., 17 Dec 2025, Han et al., 15 Oct 2025).
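Because edges are timestamped quadruples, $\Delta G_t$ reduces to plain set difference; the same relation at a different timestamp is a distinct edge. A minimal sketch:

```python
def temporal_difference(curr_edges, prev_edges):
    """Delta G_t over (head, tail, relation, timestamp) edge sets. The same
    entity-relation pair at a new timestamp appears in `added`."""
    curr, prev = set(curr_edges), set(prev_edges)
    return {"added": curr - prev, "removed": prev - curr}
```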

4. End-to-End System Pipeline

TRSG application, as in ORACLE, integrates the following pipeline components:

  1. Data ingestion: Automated crawling (e.g., via RSS) with canonicalization and HTML snapshotting.
  2. Versioning: Content hashes per URL enable efficient change tracking and re-embedding.
  3. Relevance filtering: Multi-stage lexical/geographical/semantic filters using keyword and embedding similarity.
  4. Embedding and storage: Use of pretrained embedding models (e.g., OpenAI's text-embedding-3 family) with metadata storage in Milvus.
  5. Content classification: Supervised PESTEL (Political, Economic, Social, Technological, Environmental, Legal) labeling per item; cluster-level distributions are aggregated.
  6. TRSG construction: Weekly L0→L1→L2 procedure with versioned snapshots.
  7. Change detection and theme grouping, as previously described.
  8. PESTEL-aware LLM analysis: For each theme/perspective, LLM generates titles, analyses, and importance scores, cached in SQL storage.
  9. Front end: Visualization of $G_t$, $\Delta G_t$, theme cards, and actionable recommendations (Kharlashkin et al., 17 Dec 2025).
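The versioning step (2) hinges on per-URL content hashes so unchanged pages skip re-embedding. A minimal sketch, assuming `seen` is a URL-to-hash lookup (e.g., backed by the metadata store):

```python
import hashlib

def content_hash(html: bytes) -> str:
    """Stable fingerprint of a page snapshot."""
    return hashlib.sha256(html).hexdigest()

def needs_reembedding(url, html, seen):
    """Return True (and record the new hash) only when the snapshot for
    `url` differs from the last stored hash in `seen`."""
    h = content_hash(html)
    if seen.get(url) == h:
        return False          # unchanged since last crawl
    seen[url] = h
    return True
```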

5. Evaluation Methodologies

Evaluation of TRSG systems focuses on:

  • Summary Quality: Human ratings of faithfulness and completeness (Likert scales), plus ROUGE-L for lexical overlap via longest common subsequence.
  • Graph Stability: Proportion of stable nodes $|\mathrm{Stable}| / |V_t|$ across weeks, and average cluster drift as $1 - \mathrm{sim}(u_{\mathrm{best}}, v_{\mathrm{best}})$.
  • Decision-Readiness: Analyst surveys regarding utility (e.g., “Did the TRSG themes inform new curriculum actions?”) and time-to-insight metrics (latency from ingestion to actionable report) (Kharlashkin et al., 17 Dec 2025).
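The two stability metrics are straightforward aggregates over the change-detection output. A sketch, assuming node labels and best-match similarities are already available:

```python
def graph_stability(labels):
    """Proportion of Stable nodes, |Stable| / |V_t|."""
    return labels.count("Stable") / len(labels)

def average_drift(best_sims):
    """Mean cluster drift, 1 - sim(u_best, v_best), over matched pairs."""
    return sum(1.0 - s for s in best_sims) / len(best_sims)
```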

These metrics ensure TRSG output is both analytically useful and algorithmically robust.

6. Practical Applications and Use Cases

TRSGs are deployed in evidence-traceable foresight for institutional strategy. A prominent instantiation is curriculum intelligence within a Finnish university of applied sciences:

  • Analysts exploring Political+Technological PESTEL themes identify emergent meta-clusters (e.g., “EU Digital Skills Funding”, “Quantum Computing Policy Momentum”) as surfaced by $\Delta G_t$.
  • L1 inspection links to specific programs (e.g., Digital Europe calls), funding allocations, and policy documents.
  • PESTEL-guided recommendations include curriculum realignment, new course module introduction, and partnerships with R&D labs, each traceable to original news sources for auditability.

Monthly and annual feedback cycles embed the TRSG apparatus within decision-making, supporting interventions ranging from micro-credential design to full degree redesign (Kharlashkin et al., 17 Dec 2025).

7. Comparative Perspectives and Versatility

TRSG instantiations vary:

  • ORACLE-TRSG utilizes two-level recursive LLM summarization of clustered contemporary news, emphasizing week-scale dynamics and stable, actionable reporting.
  • TG-RAG TRSG models evolving knowledge via bi-level temporally-explicit graphs with cross-granularity summary forests, supporting arbitrary time interval abstraction and dynamic subgraph retrieval during inference (Han et al., 15 Oct 2025).

A plausible implication is that the TRSG formalism is adaptable to a wide array of temporally evolving textual domains, supporting incremental updates and robust, interpretable summarization under continual data streams.

