Time-Dependent Recursive Summary Graph

Updated 24 December 2025
  • TRSG is a dynamic framework that hierarchically summarizes evolving information streams using time-indexed graphs and LLM-driven clustering.
  • It employs multi-level graph abstractions with cosine similarity thresholds to detect changes and aggregate temporal trends.
  • TRSG facilitates actionable insights in news intelligence and strategic planning by providing interpretable, evidence-backed summaries.

A Time-Dependent Recursive Summary Graph (TRSG) is a data structure and computational framework designed for dynamic, temporally indexed summarization of evolving information streams, prominently exemplified in news intelligence and temporal retrieval-augmented generation. The TRSG paradigm enables systematized abstraction of raw, temporally distinguished document corpora through layered clustering and LLM-mediated summarization. It supports multi-resolution event and trend aggregation, fine-grained change detection, and interpretable, evidence-backed analytical workflows—integrating state-of-the-art semantic embedding, community detection, and prompt-driven LLM synthesis at scale (Kharlashkin et al., 17 Dec 2025, Han et al., 15 Oct 2025).

1. Mathematical Definition and Graph Structure

The formal construction of TRSGs varies with task setting but consistently leverages multi-level time-indexed graph abstractions.

Weekly Summarization (ORACLE): For each discrete time step $t$ (typically one week), TRSG instantiates a two-level undirected graph $G_t = (V_t, E_t)$:

  • Level-1 nodes $V_t^{(1)}$ correspond to LLM-generated summaries $s_{t,i}^{(1)}$ of semantically coherent sub-clusters of filtered news documents.
  • Level-2 nodes $V_t^{(2)}$ summarize meta-clusters (theme groups) of Level-1 clusters, yielding synthesized strategic reports.
  • Edges $E_t = E_t^{(1)} \cup E_t^{(2)}$ connect node pairs within each level whose embedding cosine similarity exceeds a threshold ($\tau_0 = 0.75$ for Level 1, $\tau_1 = 0.55$ for Level 2).
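The per-level edge construction reduces to thresholded pairwise cosine similarity over summary embeddings. A minimal sketch, assuming embeddings are NumPy vectors (function names and layout are illustrative, not from the papers):

```python
import numpy as np

# Thresholds reported for ORACLE's two levels.
TAU_L1, TAU_L2 = 0.75, 0.55

def cosine_sim(u, v):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def threshold_edges(embeddings, tau):
    """Connect node pairs whose embedding cosine similarity meets tau."""
    n = len(embeddings)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if cosine_sim(embeddings[i], embeddings[j]) >= tau]
```

Calling `threshold_edges(level1_embeddings, TAU_L1)` would yield $E_t^{(1)}$; the same routine with `TAU_L2` yields $E_t^{(2)}$.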

Hierarchical Temporal Graph (TG-RAG): The TRSG is defined as the tuple $(V, E, T, H, S)$, where:

  • $V$ is the set of entity nodes.
  • $E \subseteq V \times V \times R \times T$ is the set of edges as (head, tail, relation, timestamp) quadruples.
  • $T$ indexes time intervals (e.g., week, quarter, year).
  • $H \subseteq T \times T$ encodes the parent–child time hierarchy (e.g., year → quarter).
  • $S : T \to \text{Text}$ maps each interval to its LLM-generated summary.
  • The cross-layer incidence $C : T \to 2^E$ assigns to every time interval the facts/events with corresponding timestamps (Han et al., 15 Oct 2025).
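The tuple above maps naturally onto a small container type. The following is a hypothetical sketch (field and method names are illustrative) showing how $C$ and the hierarchy $H$ might be queried:

```python
from dataclasses import dataclass, field

@dataclass
class TemporalGraph:
    """Illustrative container for the (V, E, T, H, S) tuple."""
    V: set = field(default_factory=set)    # entity nodes
    E: set = field(default_factory=set)    # (head, tail, relation, timestamp)
    T: set = field(default_factory=set)    # time-interval identifiers
    H: set = field(default_factory=set)    # (parent, child) interval pairs
    S: dict = field(default_factory=dict)  # interval -> summary text

    def C(self, interval):
        """Cross-layer incidence: edges whose timestamp index is `interval`."""
        return {e for e in self.E if e[3] == interval}

    def children(self, interval):
        """Child intervals of `interval` under the hierarchy H."""
        return {c for (p, c) in self.H if p == interval}
```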

2. Clustering and Recursive Summarization Algorithms

Item and Cluster Embedding: Each document $d$ is embedded as $x_d = \mathrm{EmbeddingModel}(d)$, and semantic grouping is controlled by the cosine similarity $\mathrm{sim}(u, v) = \frac{u \cdot v}{\lVert u \rVert_2 \lVert v \rVert_2}$.

Multi-level Clustering:

  • Sub-clustering (L0→L1): Construct the document similarity graph $H_t = (D_t, E_D)$ with adjacency $A_{ij} = 1$ if $\mathrm{sim}(x_i, x_j) \ge \tau_0$ and $0$ otherwise. Leiden community detection partitions $H_t$ into coherent sub-clusters $C_{t,i}^{(0)}$ by maximizing the modularity $Q$:

$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$

  • Meta-clustering (L1→L2): Each L1 summary is embedded as $y_{t,i}^{(1)}$, and meta-clusters $M_{t,j}^{(1)}$ are obtained via a similarity threshold ($\tau_1$) and Leiden partitioning; these groups are recursively summarized into high-level thematic syntheses.
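Proper Leiden clustering requires a dedicated library (e.g., `leidenalg` over igraph). As a dependency-free stand-in under the same thresholded-graph setup, connected components of the similarity graph can be extracted with union-find; this omits Leiden's modularity refinement but illustrates the partitioning step:

```python
def connected_component_clusters(n, edges):
    """Partition nodes 0..n-1 into connected components of the threshold
    graph via union-find. A simplified stand-in for Leiden, which further
    refines the partition to maximize modularity Q."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), set()).add(i)
    return list(clusters.values())
```

Here `edges` would come from thresholding pairwise cosine similarities at $\tau_0$ (or $\tau_1$ for meta-clustering).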

Recursive Summarization: LLMs summarize at both cluster levels. If token constraints are exceeded, summaries are chunked, summarized recursively, and the results are aggregated.
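The chunk-then-recurse pattern can be sketched as follows, with any `str -> str` callable standing in for the LLM and character length as a crude token-count proxy (both are assumptions for illustration):

```python
def summarize(texts, llm, max_tokens=8000, token_len=len):
    """Recursively summarize: if the concatenated input exceeds the token
    budget, split into chunks, summarize each chunk, then summarize the
    partial summaries. `llm` is any callable str -> str."""
    joined = "\n\n".join(texts)
    if token_len(joined) <= max_tokens:
        return llm(joined)
    # Greedily pack texts into chunks under the budget.
    chunks, current, size = [], [], 0
    for t in texts:
        if current and size + token_len(t) > max_tokens:
            chunks.append(current)
            current, size = [], 0
        current.append(t)
        size += token_len(t)
    if current:
        chunks.append(current)
    # Summarize each chunk, then recurse on the partial summaries.
    partials = [llm("\n\n".join(c)) for c in chunks]
    return summarize(partials, llm, max_tokens, token_len)
```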

TG-RAG Recursive Summarization: For each time node $\tau$, summaries are computed recursively:

$$S(\tau) = \mathrm{LLM\_Summarize}\big(\{\mathrm{desc}(\epsilon) \mid \epsilon \in C(\tau)\} \cup \{S(\tau') \mid \tau' \in \mathrm{children}(\tau)\}\big)$$

This produces hierarchical, temporally continuous abstraction across granularities (Kharlashkin et al., 17 Dec 2025, Han et al., 15 Oct 2025).
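The recursion for $S(\tau)$ is a post-order traversal of the time hierarchy. A minimal self-contained sketch, with plain dicts for $C$ and the hierarchy and any list-to-string callable standing in for the LLM:

```python
def compute_summaries(tau, children, facts, llm, S=None):
    """Post-order recursion: S(tau) = LLM_Summarize(fact descriptions for
    C(tau) union child summaries). `children` maps interval -> child
    intervals; `facts` maps interval -> fact descriptions; `llm` is any
    callable list[str] -> str standing in for the LLM summarizer."""
    if S is None:
        S = {}
    child_sums = [compute_summaries(c, children, facts, llm, S)
                  for c in children.get(tau, [])]
    S[tau] = llm(facts.get(tau, []) + child_sums)
    return S[tau]
```

Summaries at fine granularities (e.g., quarters) are thus computed first and folded into coarser ones (e.g., years), matching the hierarchy $H$.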

3. Change Detection and Temporal Comparison

Lightweight, week-over-week change detection isolates novel, persistent, or modified themes:

  • For each summary $u \in V_t^{(\ell)}$, find the $v \in V_{t-1}^{(\ell)}$ maximizing similarity:
    • Stable: $\mathrm{sim}(u, v) \ge 0.90$
    • Changed: $0.70 \le \mathrm{sim}(u, v) < 0.90$
    • Added: $\mathrm{sim}(u, v) < 0.70$
  • Summaries in $V_{t-1}^{(\ell)}$ without matches are Removed.
  • Added/removed elements are labeled by micro-label LLM outputs and agglomerative clustering of TF–IDF vectors, producing canonicalized theme groupings.
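The threshold scheme above amounts to a best-match lookup followed by banding. A sketch assuming summary embeddings as NumPy row matrices (the matching convention for "Removed" is a simplification):

```python
import numpy as np

STABLE_T, CHANGED_T = 0.90, 0.70  # week-over-week thresholds

def classify_changes(curr, prev):
    """Label each current summary against its best previous match as
    Stable / Changed / Added; previous summaries never matched above the
    Changed threshold are reported as Removed."""
    curr = np.asarray(curr, dtype=float)
    prev = np.asarray(prev, dtype=float)
    curr = curr / np.linalg.norm(curr, axis=1, keepdims=True)
    prev = prev / np.linalg.norm(prev, axis=1, keepdims=True)
    sims = curr @ prev.T                    # pairwise cosine similarities
    labels, matched = [], set()
    for i in range(len(curr)):
        j = int(np.argmax(sims[i]))         # best previous match
        if sims[i, j] >= STABLE_T:
            labels.append("Stable"); matched.add(j)
        elif sims[i, j] >= CHANGED_T:
            labels.append("Changed"); matched.add(j)
        else:
            labels.append("Added")          # no sufficiently similar ancestor
    removed = [j for j in range(len(prev)) if j not in matched]
    return labels, removed
```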

In the TG-RAG framework, identical entity–relation pairs at different times are distinguished as separate edges, and queries on $\Delta G_t = G_t \setminus G_{t-1}$ yield explicit temporal difference graphs (Kharlashkin et al., 17 Dec 2025, Han et al., 15 Oct 2025).
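Because edges are timestamped quadruples, $\Delta G_t$ reduces to plain set difference; the same relation at a different timestamp is a distinct edge. A minimal sketch:

```python
def temporal_difference(curr_edges, prev_edges):
    """Delta G_t over (head, tail, relation, timestamp) edge sets. The same
    entity-relation pair at a new timestamp appears in `added`."""
    curr, prev = set(curr_edges), set(prev_edges)
    return {"added": curr - prev, "removed": prev - curr}
```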

4. End-to-End System Pipeline

TRSG application, as in ORACLE, integrates the following pipeline components:

  1. Data ingestion: Automated crawling (e.g., via RSS) with canonicalization and HTML snapshotting.
  2. Versioning: Content hashes per URL enable efficient change tracking and re-embedding.
  3. Relevance filtering: Multi-stage lexical/geographical/semantic filters using keyword and embedding similarity.
  4. Embedding and storage: Use of pretrained embedding models (e.g., OpenAI's text-embedding-3 family) with metadata storage in Milvus.
  5. Content classification: Supervised PESTEL (Political, Economic, Social, Technological, Environmental, Legal) labeling per item; cluster-level distributions are aggregated.
  6. TRSG construction: Weekly L0→L1→L2 procedure with versioned snapshots.
  7. Change detection and theme grouping, as previously described.
  8. PESTEL-aware LLM analysis: For each theme/perspective, LLM generates titles, analyses, and importance scores, cached in SQL storage.
  9. Front end: Visualization of $G_t$, $\Delta G_t$, theme cards, and actionable recommendations (Kharlashkin et al., 17 Dec 2025).
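The versioning step (2) hinges on per-URL content hashes so unchanged pages skip re-embedding. A minimal sketch, assuming `seen` is a URL-to-hash lookup (e.g., backed by the metadata store):

```python
import hashlib

def content_hash(html: bytes) -> str:
    """Stable fingerprint of a page snapshot."""
    return hashlib.sha256(html).hexdigest()

def needs_reembedding(url, html, seen):
    """Return True (and record the new hash) only when the snapshot for
    `url` differs from the last stored hash in `seen`."""
    h = content_hash(html)
    if seen.get(url) == h:
        return False          # unchanged since last crawl
    seen[url] = h
    return True
```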

5. Evaluation Methodologies

Evaluation of TRSG systems focuses on:

  • Summary Quality: Human ratings of faithfulness and completeness (Likert scales), plus ROUGE-L for lexical overlap via longest common subsequence.
  • Graph Stability: Proportion of stable nodes $|\mathrm{Stable}| / |V_t|$ across weeks, and average cluster drift as $1 - \mathrm{sim}(u_{\mathrm{best}}, v_{\mathrm{best}})$.
  • Decision-Readiness: Analyst surveys regarding utility (e.g., “Did the TRSG themes inform new curriculum actions?”) and time-to-insight metrics (latency from ingestion to actionable report) (Kharlashkin et al., 17 Dec 2025).
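The two stability metrics are straightforward aggregates over the change-detection output. A sketch, assuming node labels and best-match similarities are already available:

```python
def graph_stability(labels):
    """Proportion of Stable nodes, |Stable| / |V_t|."""
    return labels.count("Stable") / len(labels)

def average_drift(best_sims):
    """Mean cluster drift, 1 - sim(u_best, v_best), over matched pairs."""
    return sum(1.0 - s for s in best_sims) / len(best_sims)
```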

These metrics ensure TRSG output is both analytically useful and algorithmically robust.

6. Practical Applications and Use Cases

TRSGs are deployed in evidence-traceable foresight for institutional strategy. A prominent instantiation is curriculum intelligence within a Finnish university of applied sciences:

  • Analysts exploring Political+Technological PESTEL themes identify emergent meta-clusters (e.g., “EU Digital Skills Funding”, “Quantum Computing Policy Momentum”) as surfaced by $\Delta G_t$.
  • L1 inspection links to specific programs (e.g., Digital Europe calls), funding allocations, and policy documents.
  • PESTEL-guided recommendations include curriculum realignment, new course module introduction, and partnerships with R&D labs, each traceable to original news sources for auditability.

Monthly and annual feedback cycles embed the TRSG apparatus within decision-making, supporting interventions ranging from micro-credential design to full degree redesign (Kharlashkin et al., 17 Dec 2025).

7. Comparative Perspectives and Versatility

TRSG instantiations vary:

  • ORACLE-TRSG utilizes two-level recursive LLM summarization of clustered contemporary news, emphasizing week-scale dynamics and stable, actionable reporting.
  • TG-RAG TRSG models evolving knowledge via bi-level temporally-explicit graphs with cross-granularity summary forests, supporting arbitrary time interval abstraction and dynamic subgraph retrieval during inference (Han et al., 15 Oct 2025).

A plausible implication is that the TRSG formalism is adaptable to a wide array of temporally evolving textual domains, supporting incremental updates and robust, interpretable summarization under continual data streams.

