Deep Temporal Graph Clustering

Updated 13 April 2026

Deep Temporal Graph Clustering is an unsupervised method that identifies evolving communities in dynamic graphs by enforcing spatial and temporal coherence.
It integrates temporal encoders like recurrent, attention, and GNN modules to process time-ordered graph snapshots or event batches while maintaining causality.
It is trained with spectral, temporal-smoothness, and clustering losses, and has been applied to co-authorship networks, brain connectivity, and climate data.

Deep Temporal Graph Clustering (DTGC) formalizes the unsupervised node/community clustering problem for dynamic graphs, where interactions and structure evolve and must be discovered from temporal sequences of data. Unlike static graph clustering—which operates on a fixed adjacency matrix—DTGC processes time-ordered interactions or graph snapshots, maintains causality, and aims to simultaneously enforce spatial coherence and temporal smoothness. DTGC subsumes a broad methodological family, including temporal deep clustering, spectral and attention-based GNNs, contrastive objectives, batch-based and federated optimization, and is instantiated in a variety of frameworks and domains ranging from dynamic co-authorship graphs and brain connectivity to climate data and temporal knowledge graphs (Liu et al., 2023, Zhou et al., 2024, Nji et al., 16 Sep 2025, Liu et al., 19 Jan 2026, Chen et al., 2024).

1. Formal Problem Setting and Distinction from Static Graph Clustering

Let a temporal graph be a sequence of snapshots or an interaction stream:

$\mathcal{G} = \{G_t = (V_t, E_t, X_t) \mid t=1,\ldots,T\}, \quad X_t \in \mathbb{R}^{|V_t| \times d}$

For each $t$, the objective is to compute a soft (or hard) cluster assignment over $k$ clusters:

$F_t \in \mathbb{R}^{|V_t| \times k}, \quad \sum_{c=1}^k [F_t]_{ic} = 1$

Unlike static clustering, which assumes a fixed $A \in \{0,1\}^{n \times n}$, DTGC operates over batches extracted from a time-ordered event list $\mathcal{E} = \{(x, y, t)\}$, removing the need for a global adjacency and enabling causal, memory-efficient processing (Liu et al., 2023, Liu et al., 19 Jan 2026). The loss functional commonly enforces both spatial graph coherence (within-cluster connectivity at each $t$ via spectral terms) and temporal smoothness of assignment evolution:

$\min_{F_1,\dots,F_T} \; \sum_{t=1}^T \mathrm{Tr}(F_t^\top L_t F_t) + \beta \sum_{t=2}^T \|F_t - F_{t-1}\|_F^2$

where $L_t = D_t - A_t$ is the combinatorial Laplacian of snapshot $t$ and $\beta \ge 0$ weights the smoothness term. The smoothness sum starts at $t=2$ because no assignment precedes $F_1$, and it presumes a node ordering shared by consecutive snapshots. This structure makes clusterings more robust to temporally local perturbations and discourages abrupt shifts (Zhou et al., 2024).
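As a rough illustration of the objective in this section, the following sketch evaluates the spectral and smoothness terms on a toy sequence of two snapshots. It assumes unweighted, undirected snapshots given as edge lists, a fixed node ordering across snapshots, and applies the smoothness term only from the second snapshot onward, since no earlier assignment exists. The function names and the toy data are hypothetical and are not taken from any cited framework.

```python
def spectral_term(edges, F):
    """Tr(F^T L F) for an unweighted, undirected graph with L = D - A.

    Uses the identity Tr(F^T L F) = sum over edges (i, j) of ||F[i] - F[j]||^2,
    so no dense Laplacian has to be built.
    """
    return sum(
        sum((a - b) ** 2 for a, b in zip(F[i], F[j]))
        for i, j in edges
    )


def smoothness_term(F_prev, F_curr):
    """Squared Frobenius norm ||F_t - F_{t-1}||_F^2 (same node ordering assumed)."""
    return sum(
        (a - b) ** 2
        for row_prev, row_curr in zip(F_prev, F_curr)
        for a, b in zip(row_prev, row_curr)
    )


def dtgc_objective(snapshots, assignments, beta):
    """Spectral terms for every snapshot plus beta-weighted smoothness for t >= 2."""
    total = 0.0
    for t, (edges, F) in enumerate(zip(snapshots, assignments)):
        total += spectral_term(edges, F)
        if t > 0:
            total += beta * smoothness_term(assignments[t - 1], F)
    return total


# Toy data (hypothetical): 4 nodes, 2 clusters, 2 snapshots.
snapshots = [
    [(0, 1), (2, 3)],          # first snapshot: two disconnected pairs
    [(0, 1), (2, 3), (1, 2)],  # second snapshot: a bridge edge appears
]
assignments = [
    [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
]
loss = dtgc_objective(snapshots, assignments, beta=0.5)
print(loss)  # only the bridge edge (1, 2) crosses clusters
```

In this toy case the assignments do not change over time, so the whole loss comes from the single cross-cluster edge in the second snapshot. Reassigning a node to follow the bridge would lower the spectral term but incur a smoothness penalty scaled by `beta`.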

2. Architectural and Algorithmic Frameworks

2.1 Temporal Graph Encoder Design

DTGC frameworks leverage deep temporal encoders—including recurrent, attention, and GNN modules—which ingest either temporal graph snapshots or event batches. Representative encoder classes include:

Windowed Spatial–Temporal GNNs/Attention: Combine adjacency propagation across time lags. FTGC, for example, implements a multi-window aggregation,

$H_t = \sigma \left( A_t X_t W^{(0)} + \sum_{i=1}^k A_{t-i} X_{t-i} W^{(-i)} + \sum_{j=1}^k A_{t+j} X_{t+j} W^{(+j)} \right)$

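or via temporal attention with $\alpha$ weights normalized over a history/future window (Zhou et al., 2024). In this aggregation, $k$ denotes the number of lags on each side of $t$, not the number of clusters used in Section 1.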

Batchwise Temporal Module: In TGC, node embeddings evolve following count-based or Hawkes-process-inspired updates, strictly maintaining chronological order and causality. The module supports flexible batch sizing, trading off time and memory (Liu et al., 2023, Liu et al., 19 Jan 2026).
Transformer or Bi-LSTM Bottlenecks: For spatiotemporal grid data, models such as B-TGAT first construct spatial graph representations per time step and then apply temporal attention via Bi-LSTM or transformer blocks (Nji et al., 16 Sep 2025).
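The batchwise, chronology-preserving processing described for event streams can be sketched as follows. This is a minimal illustration, not the TGC update rule: `chronological_batches` and `decay_weights` are hypothetical helpers, events are assumed to be `(source, target, time)` tuples, and the exponential kernel is only one common Hawkes-style choice for weighting past interactions.

```python
import math


def chronological_batches(events, batch_size):
    """Yield batches of (source, target, time) events in non-decreasing time order."""
    ordered = sorted(events, key=lambda e: e[2])
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]


def decay_weights(history_times, t_now, decay=1.0):
    """Normalized exponential-decay weights over past interaction times.

    Interactions later than t_now are ignored, so the weighting stays causal.
    """
    past = [t for t in history_times if t <= t_now]
    raw = [math.exp(-decay * (t_now - t)) for t in past]
    total = sum(raw)
    return [w / total for w in raw] if total > 0 else []


# Hypothetical event list, deliberately out of order.
events = [(0, 1, 3.0), (2, 3, 1.0), (1, 2, 2.0), (0, 3, 5.0), (1, 3, 4.0)]
batches = list(chronological_batches(events, batch_size=2))
weights = decay_weights([1.0, 2.0, 3.0], t_now=3.0)
```

A smaller `batch_size` lowers peak memory at the cost of more iterations per epoch, which is the time and memory trade-off noted above.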

2.2 Clustering Heads and Objectives

Clustering is realized via direct softmax heads over GNN outputs (FTGC), distributional alignment with a Student-$t$ kernel and KL divergence (Liu et al., 2023, Liu et al., 19 Jan 2026, Nji et al., 16 Sep 2025), batchwise adjacency reconstruction, or fuzzy c-means with explicit temporal alignment (DECRL (Chen et al., 2024)). The generalized form includes:

Spectral Loss: $\mathrm{Tr}(F_t^\top L_t F_t)$
Temporal Smoothness: $\|F_t - F_{t-1}\|_F^2$ or cosine-similarity-based terms
Assignment Distribution Loss: Soft clustering assignments $Q$ with target-distribution sharpening $P$, optimized via KL divergence.
Adjacency or Feature Reconstruction: Batch-level reconstruction enforcing local proximity.

In several architectures, clustering and embedding are trained end-to-end, with joint gradients flowing into both the encoder parameters and cluster centroids (Liu et al., 2023, Zhou et al., 2024, Nji et al., 16 Sep 2025, Liu et al., 19 Jan 2026).
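The assignment-distribution loss can be illustrated with the widely used deep-embedded-clustering recipe: a Student-t kernel turns embedding-to-centroid distances into soft assignments, a sharpened target distribution is derived from them, and the two are compared with KL divergence. The sketch below assumes this standard recipe; the exact kernels and targets in the cited frameworks may differ, and the embeddings and centroids are made-up toy values.

```python
import math


def soft_assignments(embeddings, centers, nu=1.0):
    """Student-t kernel similarities between embeddings and centers, row-normalized."""
    Q = []
    for z in embeddings:
        sims = []
        for mu in centers:
            d2 = sum((a - b) ** 2 for a, b in zip(z, mu))
            sims.append((1.0 + d2 / nu) ** (-(nu + 1.0) / 2.0))
        total = sum(sims)
        Q.append([s / total for s in sims])
    return Q


def target_distribution(Q):
    """Sharpen Q by squaring and dividing by soft cluster frequencies, then renormalize."""
    k = len(Q[0])
    freq = [sum(row[c] for row in Q) for c in range(k)]
    P = []
    for row in Q:
        w = [row[c] ** 2 / freq[c] for c in range(k)]
        total = sum(w)
        P.append([x / total for x in w])
    return P


def kl_divergence(P, Q):
    """KL(P || Q) summed over all nodes and clusters."""
    return sum(
        p * math.log(p / q)
        for row_p, row_q in zip(P, Q)
        for p, q in zip(row_p, row_q)
        if p > 0
    )


# Toy embeddings (hypothetical): two points near each of two centroids.
embeddings = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [0.9, 1.1]]
centers = [[0.0, 0.0], [1.0, 1.0]]
Q = soft_assignments(embeddings, centers)
P = target_distribution(Q)
loss = kl_divergence(P, Q)
```

In an end-to-end setup, gradients of this loss would flow into both the encoder that produces `embeddings` and the `centers`; here both are fixed lists purely to show the arithmetic.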

3. Temporal Consistency, Alignment, and Evolution

To enforce smooth evolution and avoid cluster permutation or collapse, DTGC frameworks deploy:

Inter-timestep Alignment: Hungarian maximum bipartite matching aligns clusters at consecutive timestamps $t-1$ and $t$ (DECRL (Chen et al., 2024)), after which the matched cluster representations are fused.

Temporal Regularization: Cosine-based regularizers penalize large angular deviations of cluster or entity representations between steps (Chen et al., 2024).
Cross-Batch Center Calibration: Calibration terms on the cluster centers keep clusters consistent across batches (Liu et al., 19 Jan 2026).

This regularization captures both gradual community drift and abrupt regime shifts and supports regime-change detection (CGC (Park et al., 2022)).
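Inter-timestep alignment amounts to finding the relabeling of the current clusters that best agrees with the previous ones. The sketch below solves that assignment problem by brute force over permutations, which is only practical for a handful of clusters; it stands in for the Hungarian algorithm (available, for instance, as `scipy.optimize.linear_sum_assignment`) and uses node overlap as the matching score, which is an assumption rather than the criterion of any cited method.

```python
from itertools import permutations


def align_clusters(prev_labels, curr_labels, k):
    """Relabel curr_labels so they agree with prev_labels on as many nodes as possible.

    Returns the relabeled assignments and the mapping, where mapping[c] is the
    previous-step label given to current cluster c.
    """
    # overlap[c][p] counts nodes in current cluster c that were in previous cluster p.
    overlap = [[0] * k for _ in range(k)]
    for p, c in zip(prev_labels, curr_labels):
        overlap[c][p] += 1

    # Exhaustive search over one-to-one matchings (k! candidates).
    mapping = max(
        permutations(range(k)),
        key=lambda perm: sum(overlap[c][perm[c]] for c in range(k)),
    )
    return [mapping[c] for c in curr_labels], mapping


# Toy labels (hypothetical): cluster ids are permuted and one node has moved.
prev_labels = [0, 0, 1, 1, 2, 2]
curr_labels = [2, 2, 0, 0, 1, 0]
aligned, mapping = align_clusters(prev_labels, curr_labels, k=3)
print(aligned)  # [0, 0, 1, 1, 2, 1]
```

After alignment, the only disagreement with the previous step is the last node, which reflects an actual membership change rather than an arbitrary permutation of cluster ids.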

4. Optimization Paradigms: Federated, Contrastive, and Ensemble

DTGC supports multiple optimization and aggregation contexts:

Federated Training: FTGC realizes decentralized GNN optimization across distributed subgraphs, aggregating local parameter updates via federated averaging while maintaining data privacy (Zhou et al., 2024).
Contrastive Learning: CGC employs multi-level InfoNCE-style contrastive losses to align node features, neighbor structure, and cluster prototypes, with an additional temporal contrastive term. Together these terms jointly optimize semantic, structural, and temporal information (Park et al., 2022, Liu et al., 19 Jan 2026).

Hybrid and Ensemble Methods: Approaches like HEDGTC integrate multiple (homogeneous/heterogeneous) clustering outputs with dual consensus (co-occurrence and NMF) to build a robust affinity structure, subsequently refined via a deep graph autoencoder and recurrent network (Nji et al., 2024).
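The aggregation step of federated training can be sketched as plain federated averaging, in which each client's parameters are weighted by its local sample count. This is a generic illustration under the assumption that parameters are flat lists of floats; the function name and client values are hypothetical, and FTGC's actual aggregation may include details not shown here.

```python
def federated_average(client_params, client_sizes):
    """Sample-size-weighted average of per-client parameter vectors."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[d] * n for params, n in zip(client_params, client_sizes)) / total
        for d in range(dim)
    ]


# Three hypothetical clients holding subgraphs of different sizes.
client_params = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
client_sizes = [10, 30, 60]
global_params = federated_average(client_params, client_sizes)
print(global_params)  # [4.0, 5.0]
```

Only parameter vectors and sample counts leave each client in this scheme; the local subgraphs themselves are never shared, which is the privacy property the federated setting relies on.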

5. Evaluation Protocols, Datasets, and Metrics

Standard DTGC evaluations report clustering accuracy (ACC), normalized mutual information (NMI), adjusted Rand index (ARI), and F1 score, aligning cluster outputs with ground truth (where available) via optimal label permutations. Additional metrics include Silhouette, Davies–Bouldin, Calinski–Harabasz, and inter-cluster distances for unsupervised and spatiotemporal datasets (Liu et al., 2023, Liu et al., 19 Jan 2026, Nji et al., 16 Sep 2025, Nji et al., 2024).
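Of the metrics above, NMI is convenient because it is invariant to how cluster ids are numbered, so no label matching is needed before scoring. The sketch below computes it from scratch, assuming natural logarithms and arithmetic-mean normalization of the two entropies; other normalizations exist and give slightly different values.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two label sequences of equal length."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    return sum(
        (n_ij / n) * math.log(n * n_ij / (count_a[i] * count_b[j]))
        for (i, j), n_ij in joint.items()
    )


def nmi(a, b):
    """Normalized mutual information with arithmetic-mean normalization."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 and h_b == 0.0:
        return 1.0  # both partitions are a single cluster
    return mutual_information(a, b) / ((h_a + h_b) / 2.0)


truth = [0, 0, 1, 1]
same_up_to_renaming = nmi(truth, [1, 1, 0, 0])  # identical partition, swapped ids
independent = nmi(truth, [0, 1, 0, 1])          # partition unrelated to the truth
```

ACC and F1, by contrast, depend on label identity, which is why they are reported after an optimal permutation of predicted labels.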

Representative datasets include:

Node-level labeled temporal graphs: DBLP, Brain, Patent, School, the BenchTGC Data4TGC collection, and the arXiv subsplits from BenchTGC.
Spatiotemporal grids: ERA5, CARRA, NCEP/NCAR for climate cluster regimes (Nji et al., 16 Sep 2025, Nji et al., 2024).
Temporal KGs: Seven benchmarks used in DECRL (Chen et al., 2024).
Community streams: DBLP-T, Yahoo-Msg, Foursquare-NYC/TKY (Park et al., 2022).

Performance is benchmarked against static graph clustering (DeepWalk, node2vec, GAE, SDCN, DAEGC), temporal node embedding (HTNE, TGAT, TGN, JODIE, TREND), hybrid/ensemble baselines, and recent deep temporal clustering variants (Liu et al., 2023, Zhou et al., 2024, Liu et al., 19 Jan 2026, Nji et al., 2024, Nji et al., 16 Sep 2025).

6. Computational and Memory Complexities

DTGC achieves a favorable time-space balance. Static graph clustering (adjacency-matrix-based) incurs $O(N^2)$ time and space for $N$ nodes; DTGC operates on batches or stream segments, with per-epoch cost $O(|E|)$ in the number of interactions. Empirically, DTGC models operate at low memory budgets (e.g., TGC: 210 MB vs. 6.9 GB for SDCN on the same dataset; BenchTGC: 2–9 GB for large arXiv graphs) (Liu et al., 2023, Liu et al., 19 Jan 2026). Batch size provides a controllable knob, balancing runtime and memory footprint.
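A back-of-the-envelope calculation shows why avoiding a dense adjacency matrix matters. The sketch below compares the memory of a dense adjacency, which grows quadratically with the node count, against a single batch of events, which grows linearly with the batch size. The node count, batch size, and byte widths are hypothetical and unrelated to the figures reported above.

```python
def dense_adjacency_bytes(num_nodes, bytes_per_entry=4):
    """Memory for a dense num_nodes x num_nodes matrix (float32 entries by default)."""
    return num_nodes * num_nodes * bytes_per_entry


def event_batch_bytes(batch_size, fields_per_event=3, bytes_per_field=8):
    """Memory for one batch of (source, target, time) events stored as 8-byte fields."""
    return batch_size * fields_per_event * bytes_per_field


num_nodes = 100_000   # hypothetical graph size
batch_size = 10_000   # hypothetical batch size

dense = dense_adjacency_bytes(num_nodes)
batch = event_batch_bytes(batch_size)
print(dense / 1e9, "GB for the dense adjacency")  # 40.0 GB
print(batch / 1e6, "MB for one event batch")      # 0.24 MB
```

Embeddings and model parameters add to both sides of the comparison, so real footprints are larger, but the quadratic term is what batch-based processing removes.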

7. Limitations, Open Problems, and Application Domains

Despite strong empirical performance and scalability (often outperforming centralized methods and maintaining performance with increasing client counts or on large graphs (Zhou et al., 2024, Liu et al., 19 Jan 2026)), DTGC research faces open challenges:

Lack of Large Labeled Temporal-Graph Clustering Benchmarks: Public datasets more often emphasize link prediction rather than node labels (Liu et al., 2023, Liu et al., 19 Jan 2026). BenchTGC offers nine such datasets, partially filling this gap.
Handling Unknown or Varying Cluster Counts: Most methods assume known $k$; adaptive or overlapping clustering in temporal contexts remains an open direction (Liu et al., 19 Jan 2026).
Global Structure and Spectral Regularization: Global objectives are hard to apply when only batches of events, rather than a full adjacency matrix, are available; batchwise proxies or new scalable regularizers are needed (Liu et al., 2023, Liu et al., 19 Jan 2026).
Real-Time/Streaming and Open-World Inference: Supporting on-the-fly updates without offline retraining remains an open problem (Liu et al., 19 Jan 2026, Park et al., 2022).
Domain Applications: Dynamic social and biological networks, online recommendation, transaction/anomaly detection, climate regime discovery, temporal character grouping in video, and hierarchical forecasting all benefit from DTGC methods (Nji et al., 16 Sep 2025, Zhou et al., 2024, Shu et al., 2023, Cini et al., 2023).

In summary, Deep Temporal Graph Clustering establishes a principled, scalable, and extensible learning paradigm for unsupervised node clustering in dynamic graphs, uniting windowed and attention-based temporal encoders, joint or batchwise clustering objectives, temporal alignment/regularization, and optimization strategies ranging from federated SGD to deep ensemble fusion. DTGC consistently outperforms static approaches when measured on large, time-evolving datasets and demonstrates adaptability to privacy constraints, streaming data, and evolving application domains (Liu et al., 2023, Zhou et al., 2024, Nji et al., 16 Sep 2025, Chen et al., 2024, Liu et al., 19 Jan 2026).