BenchTGC: Temporal Graph Clustering Benchmark
- BenchTGC is a standardized framework that formalizes temporal graph clustering using an interaction-sequence paradigm, promoting scalability and reproducibility.
- It decouples pre-processing, training, and clustering into modular stages, integrating classical and temporal-specific losses for effective node clustering.
- Empirical evaluations report significant gains in ARI, NMI, and ACC, demonstrating consistent improvements over traditional static graph methods.
BenchTGC is a standardized framework and benchmark for node clustering in temporal graphs, addressing the methodological and experimental fragmentation that has impeded advances in deep temporal graph clustering (TGC). It formalizes clustering on temporal graphs via an interaction-sequence paradigm, supplying both an extensible framework—BenchTGC Framework—and a suite of large, richly labeled datasets—Data4TGC—that together enable scalable, reproducible, and fair evaluation across the TGC task spectrum (Liu et al., 19 Jan 2026).
1. Temporal Graph Clustering: Problem Formulation and Paradigm
A temporal graph is formally defined as $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathcal{T})$, where $\mathcal{V}$ is the set of nodes, $\mathcal{E}$ is a chronologically ordered sequence of interactions $(u, v, t)$, each with timestamp $t$, and $\mathcal{T}$ is the set of all timestamps. Unlike static graphs, which use a fixed adjacency matrix $A \in \mathbb{R}^{N \times N}$, temporal graphs maintain event-level granularity, storing each interaction as a distinct event.
BenchTGC processes these temporal interactions using an interaction-sequence-based batch-processing pattern. The edge set $\mathcal{E}$ is partitioned into chronologically ordered batches $\{\mathcal{E}_1, \dots, \mathcal{E}_T\}$ of size $B$, and in training timestep $b$ only $\mathcal{E}_b$ and any model memory from prior steps are available. This approach yields $O(|\mathcal{E}|)$ time and space complexity per epoch, in contrast to the $O(N^2)$ memory consumption characteristic of static clustering methods. This distinction is critical for scalability and for modeling the inherent temporal dynamics observed in real-world scenarios (Liu et al., 19 Jan 2026).
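The batch-partitioning step can be sketched as follows; this is a minimal illustration of the interaction-sequence pattern, not the BenchTGC implementation, and the function and type names are ours:

```python
from typing import Iterator, List, Tuple

Event = Tuple[int, int, float]  # (source node u, destination node v, timestamp t)

def chronological_batches(events: List[Event], batch_size: int) -> Iterator[List[Event]]:
    """Sort interactions by timestamp, then yield fixed-size chronological batches."""
    ordered = sorted(events, key=lambda e: e[2])
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]
```

Each yielded batch only ever contains events no later than those of subsequent batches, matching the constraint that timestep $b$ sees only its own interactions plus prior model memory.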
2. Framework Architecture and Algorithmic Structure
The BenchTGC Framework consists of three decoupled stages: pre-processing, training, and clustering.
2.1 Pre-processing
Node features $x_i \in \mathbb{R}^d$ are either constructed or pre-trained as follows:
- Random initialization: $x_i$ is drawn from a uniform or Gaussian distribution and treated as a learnable parameter.
- One-hot ID embedding: $x_i = e_i \in \{0,1\}^{|\mathcal{V}|}$, with a one at index $i$.
- Positional encoding (sinusoidal): $x_i[2j] = \sin\big(i / 10000^{2j/d}\big)$, $x_i[2j+1] = \cos\big(i / 10000^{2j/d}\big)$.
- Feature pre-training: if raw node signals are unavailable, node embeddings are generated by applying static methods (e.g., node2vec) to a graph snapshot, yielding $x_i \in \mathbb{R}^d$.
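The constructed-feature options above can be sketched in NumPy; this is an illustrative helper (the function name and `mode` strings are ours, not BenchTGC API), with the sinusoidal case following the standard Transformer positional-encoding formula:

```python
import numpy as np

def init_features(num_nodes: int, dim: int, mode: str = "random", seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    if mode == "random":      # Gaussian draw, treated downstream as a learnable parameter
        return rng.normal(size=(num_nodes, dim)).astype(np.float32)
    if mode == "one-hot":     # identity rows; effective dim becomes num_nodes
        return np.eye(num_nodes, dtype=np.float32)
    if mode == "sinusoidal":  # Transformer-style positional encoding over node IDs
        pos = np.arange(num_nodes)[:, None]
        freq = np.exp(-np.log(10000.0) * (2 * (np.arange(dim) // 2)) / dim)
        angles = pos * freq[None, :]
        feats = np.zeros((num_nodes, dim), dtype=np.float32)
        feats[:, 0::2] = np.sin(angles[:, 0::2])
        feats[:, 1::2] = np.cos(angles[:, 1::2])
        return feats
    raise ValueError(mode)
```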
2.2 Training
Let $F_\theta$ denote any temporal GNN or Hawkes process-based architecture with parameters $\theta$. For each batch $\mathcal{E}_b$:
- The model operates on $\mathcal{E}_b$, using the node features $X_b$ and any memory state.
- Embeddings $Z_b = F_\theta(\mathcal{E}_b, X_b)$ are computed.
- Baseline link-prediction loss (negative sampling): $\mathcal{L}_{\mathrm{link}} = -\sum_{(u,v,t) \in \mathcal{E}_b} \big[ \log \sigma(z_u^\top z_v) + \sum_{n} \log \sigma(-z_u^\top z_n) \big]$.
- Classical and temporal-specific clustering losses are summed per batch:
- Feature reconstruction: $\mathcal{L}_{\mathrm{rec}} = \| \hat{X}_b - X_b \|_F^2$.
- Distribution alignment (Student's t soft assignment): $q_{ik} = \frac{(1 + \|z_i - c_k\|^2)^{-1}}{\sum_{k'} (1 + \|z_i - c_{k'}\|^2)^{-1}}$, with $\mathcal{L}_{\mathrm{KL}} = \mathrm{KL}(P \,\|\, Q)$ against a sharpened target distribution $P$.
- Contrastive calibration: $\mathcal{L}_{\mathrm{con}}$, combining node- and cluster-level contrastive terms.
- Cross-batch calibration: $\mathcal{L}_{\mathrm{cross}} = \| C^{(b)} - C^{(b-1)} \|_F^2$ over successive cluster centers.
- Cluster scaling: $\mathcal{L}_{\mathrm{scale}}$, penalizing cluster dilation, shrinkage, and the L2 norm of cluster centers.
- Total loss per batch: $\mathcal{L}_b = \mathcal{L}_{\mathrm{link}} + \lambda_1 \mathcal{L}_{\mathrm{rec}} + \lambda_2 \mathcal{L}_{\mathrm{KL}} + \lambda_3 \mathcal{L}_{\mathrm{con}} + \lambda_4 \mathcal{L}_{\mathrm{cross}} + \lambda_5 \mathcal{L}_{\mathrm{scale}}$.
Parameters $\theta$ and cluster centers $C$ are updated via SGD or Adam. All coefficients $\lambda_i$ are hyperparameters.
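The Student's t soft assignment and its sharpened target, central to the distribution-alignment loss, can be sketched in NumPy. This is a didactic DEC-style sketch under standard conventions, not the BenchTGC implementation; all function names are ours:

```python
import numpy as np

def soft_assign(Z: np.ndarray, C: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Student's t soft assignment q_ik of embeddings Z (n, d) to centers C (K, d)."""
    d2 = ((Z[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q: np.ndarray) -> np.ndarray:
    """Sharpened targets p_ik proportional to q_ik^2 / soft cluster frequency."""
    w = q ** 2 / q.sum(axis=0, keepdims=True)
    return w / w.sum(axis=1, keepdims=True)

def kl_alignment_loss(q: np.ndarray) -> float:
    """KL(P || Q) between sharpened targets and soft assignments."""
    p = target_distribution(q)
    return float((p * np.log(p / q)).sum())
```

Squaring $q$ in the target emphasizes high-confidence assignments, so minimizing the KL term pulls embeddings toward their most likely cluster centers.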
2.3 Clustering Step
In the two-step clustering employed by BenchTGC, after all batches are processed, node embeddings are aggregated and standard K-means is run to assign clusters. End-to-end one-step TGC remains an open area for research.
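To make the second step concrete, here is a minimal Lloyd's-iteration K-means over aggregated embeddings. This is a didactic sketch with a deterministic farthest-point initialization of our choosing; in practice one would use a standard library implementation such as scikit-learn's `KMeans`:

```python
import numpy as np

def kmeans(Z: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """Minimal Lloyd's K-means with deterministic farthest-point initialization."""
    centers = [Z[0]]
    for _ in range(1, k):  # pick each new center as the point farthest from existing ones
        d2 = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[int(d2.argmax())])
    centers = np.stack(centers)
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(iters):
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        for j in range(k):  # recompute each non-empty cluster's center
            if (labels == j).any():
                centers[j] = Z[labels == j].mean(axis=0)
    return labels
```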
Algorithmic Outline (Pseudocode Extract)
```
for Eb in batches(E, B):
    Zb = F_theta(Eb, Xb)                 # temporal backbone on batch b
    compute clustering and temporal losses on Zb
    L_total = aggregate all losses with lambda weights
    update theta, C via optimizer
Z_final = collect_all_embeddings()
Clusters = KMeans(Z_final, K)
```
3. Loss Modules and Adaptation of Classical Clustering to Temporal Setting
BenchTGC introduces adaptations of classical clustering loss components for temporal graphs:
- Feature Reconstruction aligns learned embeddings with original (possibly pre-trained or constructed) features, preserving node-level semantics.
- Distribution Alignment (KL loss) employs Student’s t-distributed soft assignments to minimize the dissimilarity between predicted embedding-cluster distributions and sharpened “target” distributions, promoting coherent cluster formation.
- Contrastive Calibration comprises node- and cluster-level contrastive objectives, enforcing intra-cluster similarity and inter-cluster separation directly in the learned representation.
- Cross-Batch Calibration enforces smooth evolution of cluster centers between successive interaction batches, stabilizing cluster assignment trajectories.
- Cluster Scaling penalizes excessive cluster spread (dilation), excessive compactness (shrinkage), and the overall L2 norm of cluster centers, encouraging well-formed, non-degenerate clusters.
All losses are aggregated per batch, yielding a composable, extensible optimization objective.
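The composability of this objective can be sketched as a weighted sum over pluggable loss callables; this is an illustrative helper of our own design, not BenchTGC code:

```python
from typing import Callable, Dict, Tuple

LossModule = Tuple[float, Callable[[], float]]  # (lambda weight, loss function)

def aggregate_losses(modules: Dict[str, LossModule]) -> float:
    """Composable per-batch objective: L_total = sum_k lambda_k * L_k."""
    return sum(weight * loss_fn() for weight, loss_fn in modules.values())
```

A new loss term is added by registering one more entry, e.g. `{"rec": (0.5, rec_loss), "kl": (1.0, kl_loss)}`, without touching the training loop.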
4. Temporal Batching and Scalability Characteristics
The batch-wise interaction-sequence processing in BenchTGC enables linear time/space scaling with respect to the number of events: $O(|\mathcal{E}|)$ per epoch. In contrast, static adjacency-matrix-based graph clustering scales as $O(N^2)$, yielding out-of-memory errors on large graphs in practice. By adjusting the batch size (up to $10,000$), practitioners can systematically trade off GPU memory usage against per-epoch runtime. Empirical measurements indicate that, for arXiv-scale graphs, increasing the batch size reduces epoch runtimes from over 60 minutes to under 5 minutes, while memory use grows sublinearly from less than 0.5 GB to approximately 8 GB (Liu et al., 19 Jan 2026).
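A back-of-the-envelope comparison makes the $O(N^2)$ versus $O(|\mathcal{E}|)$ gap tangible; the per-entry byte counts below are illustrative assumptions (float32 adjacency entries, 16-byte event records), not figures from the paper:

```python
def dense_adjacency_bytes(num_nodes: int, bytes_per_entry: int = 4) -> int:
    """Memory for a dense float32 N x N adjacency matrix."""
    return num_nodes * num_nodes * bytes_per_entry

def event_list_bytes(num_events: int, bytes_per_event: int = 16) -> int:
    """Memory for an event list: two int32 node IDs plus a float64 timestamp per event."""
    return num_events * bytes_per_event
```

Under these assumptions, a million-node dense adjacency matrix needs about 4 TB, while ten million events fit in about 160 MB.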
5. BenchTGC Datasets (Data4TGC): Design and Benchmark Scope
BenchTGC categorizes available datasets as follows:
| Category | Example Datasets | Clustering Feasibility |
|---|---|---|
| Unlabeled (no clustering feasible) | CollegeMsg, LastFM, MOOC, Wikipedia, etc. | No ground-truth clusters |
| Clustering-unavailable | Meta, Bitcoin, ML1M, Amazon, Yelp, Tmall | Labels uninformative |
| Clustering-available (small, public) | Brain (K=10), Patent (K=6), DBLP (K=10) | Feasible (small-scale) |
To address the absence of large, richly labeled benchmarks, BenchTGC introduces six datasets created for scalable, realistic TGC evaluation:
- School: nodes labeled by interaction class.
- arXivAI: nodes labeled by field.
- arXivCS: nodes labeled by subfield.
- arXivMath: nodes labeled by area.
- arXivPhy: nodes labeled by topic.
- arXivLarge: nodes labeled by class.
These are aggregated as Data4TGC, alongside the three smaller public labeled datasets, forming nine benchmarks in total.
6. Experimental Evaluation and Key Findings
BenchTGC evaluations span classical (K-means, AE, DeepWalk, node2vec, GAE), static graph clustering (DAEGC, MVGRL, SDCN(+Q), DFCN, SCGC), temporal graph learning (HTNE, JODIE, MNCI, TREND, S2T), and BenchTGC-improved versions (IHTNE, IJODIE, IMNCI, ITREND, IS2T). Clustering quality is assessed by ACC, NMI, ARI, and F1-score.
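For reference, the ARI reported above follows the standard contingency-table definition; a self-contained version is sketched below (in practice one would use `sklearn.metrics.adjusted_rand_score`):

```python
from collections import Counter
from math import comb
from typing import Sequence

def adjusted_rand_index(labels_true: Sequence[int], labels_pred: Sequence[int]) -> float:
    """Adjusted Rand Index: pair-counting agreement, corrected for chance."""
    n = len(labels_true)
    pairs = Counter(zip(labels_true, labels_pred))      # contingency table cells
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    index = sum(comb(c, 2) for c in pairs.values())
    row_sum = sum(comb(c, 2) for c in rows.values())
    col_sum = sum(comb(c, 2) for c in cols.values())
    expected = row_sum * col_sum / comb(n, 2)           # chance-level agreement
    max_index = (row_sum + col_sum) / 2
    return (index - expected) / (max_index - expected)
```

ARI is invariant to label permutation, which is why the metric is appropriate for comparing K-means cluster IDs against ground-truth classes.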
Major findings include:
- Static graph clustering methods frequently exhaust GPU memory on the larger benchmarks.
- Incorporating BenchTGC clustering modules into temporal models yields a 99% improvement rate, i.e., the improved variants outperform their unmodified baselines in 99% of evaluated settings.
- On arXivCS, ARI increases from 24.65% (DeepWalk) to 35.04% (IHTNE), a relative gain of +42%.
- On the School benchmark, IJODIE and ITREND methods both achieve perfect (100%) ACC, NMI, ARI, and F1.
- Visualizations (t-SNE, similarity heatmaps) affirm markedly enhanced cluster separation for BenchTGC-enhanced models relative to baselines (Liu et al., 19 Jan 2026).
7. Limitations and Future Directions
BenchTGC achieves state-of-the-art clustering across a range of realistic temporal graphs, often obtaining two- to five-fold gains in ARI/NMI over unmodified temporal graph models. Open challenges remain, notably memory limits in open-world settings, automatic determination of the number of clusters, and the development of end-to-end "one-step" temporal clustering architectures. BenchTGC's protocol design, composable loss modules, and the Data4TGC benchmark suite constitute substantive advances in scalable temporal graph clustering methodology.
All code, dataset pre-processing scripts, and experimental logs are available at https://github.com/MGitHubL/BenchTGC and https://github.com/MGitHubL/Data4TGC (Liu et al., 19 Jan 2026).