
BenchTGC: Temporal Graph Clustering Benchmark

Updated 27 January 2026
  • BenchTGC is a standardized framework that formalizes temporal graph clustering using an interaction-sequence paradigm, promoting scalability and reproducibility.
  • It decouples pre-processing, training, and clustering into modular stages, integrating classical and temporal-specific losses for effective node clustering.
  • Empirical evaluations report significant gains in ARI, NMI, and ACC, demonstrating consistent improvements over traditional static graph methods.

BenchTGC is a standardized framework and benchmark for node clustering in temporal graphs, addressing the methodological and experimental fragmentation that has impeded advances in deep temporal graph clustering (TGC). It formalizes clustering on temporal graphs via an interaction-sequence paradigm, supplying both an extensible framework—BenchTGC Framework—and a suite of large, richly labeled datasets—Data4TGC—that together enable scalable, reproducible, and fair evaluation across the TGC task spectrum (Liu et al., 19 Jan 2026).

1. Temporal Graph Clustering: Problem Formulation and Paradigm

A temporal graph is formally defined as $G=(V,E,T)$, where $V$ is the set of nodes, $E=\{(u,v,t)\}$ is a chronologically ordered sequence of interactions, each with timestamp $t\in T$, and $T$ is the set of all timestamps. Unlike static graphs that use a fixed adjacency matrix $A\in\{0,1\}^{N\times N}$, temporal graphs maintain event-level granularity, storing each interaction as a distinct event.

BenchTGC processes these temporal interactions using an interaction-sequence-based batch-processing pattern. The edge set $E$ is partitioned into $B$ chronologically ordered batches ($E=E^1\cup E^2\cup\ldots\cup E^B$), and in each training timestep $b$ only $E^b$ and any model memory from prior steps are available. This approach yields time and space complexity $O(|E|)$ per epoch, in contrast to the $O(N^2)$ memory consumption characteristic of static clustering methods. This distinction is critical for scalability and for modeling the inherent temporal dynamics observed in real-world scenarios (Liu et al., 19 Jan 2026).
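This chronological batch partitioning can be sketched in a few lines of Python; `partition_batches` is an illustrative helper, not part of the released code:

```python
from typing import List, Tuple

# (u, v, t): an interaction between nodes u and v at timestamp t
Event = Tuple[int, int, float]

def partition_batches(events: List[Event], batch_size: int) -> List[List[Event]]:
    """Sort interactions by timestamp and split them into
    chronologically ordered batches E^1, ..., E^B, each consumed
    by one training step."""
    ordered = sorted(events, key=lambda e: e[2])
    return [ordered[i:i + batch_size]
            for i in range(0, len(ordered), batch_size)]
```

Because each event is stored and visited once per epoch, the cost stays linear in $|E|$ regardless of how many nodes the graph contains.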

2. Framework Architecture and Algorithmic Structure

The BenchTGC Framework consists of three decoupled stages: pre-processing, training, and clustering.

2.1 Pre-processing

Node features $X$ are either constructed or pre-trained as follows:

  • Random initialization: $x_i$ is drawn from a uniform or Gaussian distribution and treated as a learnable parameter.
  • One-hot ID embedding: $x_i = e_i \in \{0,1\}^{N}$ with a one at index $i$.
  • Positional encoding (sinusoidal):

$$x_i[j] = \begin{cases} \sin(i/10000^{2j/d}) & \text{if } j \text{ is even} \\ \cos(i/10000^{2j/d}) & \text{if } j \text{ is odd} \end{cases}$$

  • Feature pre-training: If raw node signals are unavailable, node embeddings $Z^0$ are generated using static methods (e.g., node2vec) applied to a graph snapshot, yielding $Z^0=F^P(G)\in\mathbb{R}^{N\times d}$.
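The sinusoidal option can be implemented directly from the formula above; a minimal NumPy sketch (`positional_encoding` is an illustrative name):

```python
import numpy as np

def positional_encoding(num_nodes: int, d: int) -> np.ndarray:
    """Sinusoidal node features: sin(i / 10000^(2j/d)) on even
    dimensions j, cos(i / 10000^(2j/d)) on odd ones."""
    i = np.arange(num_nodes)[:, None]            # node index i
    j = np.arange(d)[None, :]                    # feature dimension j
    angle = i / np.power(10000.0, 2.0 * j / d)   # broadcasts to (N, d)
    return np.where(j % 2 == 0, np.sin(angle), np.cos(angle))
```

Unlike random initialization, these features are fixed rather than learned, and they give each node a distinct, deterministic signature.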

2.2 Training

Let $F_\theta$ denote any temporal GNN or Hawkes-process-based architecture with parameters $\theta$. For each batch $b$:

  • The model operates on $E^b$, using $X^b\subseteq Z^0$ and any memory state.
  • Embeddings $Z^b = F_\theta(E^b, X^b)$ are computed.
  • Baseline link-prediction loss:

$$L_\text{model}^b = -\sum_{(i,j)\in\text{Pos}} \log \sigma(z_i\cdot z_j) - \sum_{(i,n)\in\text{Neg}} \log \sigma(-z_i\cdot z_n)$$

  • Classical and temporal-specific clustering losses are summed per batch:

    • Feature reconstruction: $L_X = \sum_{i\in V_b}\|z_i - x_i\|_2^2$
    • Distribution alignment (Student’s t soft assignment):

    $Q_{ik} = \dfrac{(1+\|z_i-c_k\|^2)^{-1/2}}{\sum_{k'}(1+\|z_i-c_{k'}\|^2)^{-1/2}}$

    $P_{ik} = \dfrac{Q_{ik}^2/\sum_i Q_{ik}}{\sum_{k'} \left(Q_{ik'}^2/\sum_i Q_{ik'}\right)}$

    $L_D = \mathrm{KL}(P\|Q) = \sum_{i,k} P_{ik}\log(P_{ik}/Q_{ik})$

    • Contrastive calibration:

    $L_C = L_\text{node} + L_\text{cluster}$

    $L_\text{node} = -\log \dfrac{\exp(\mathrm{sim}(z_i,z_i')/\tau)}{\exp(\mathrm{sim}(z_i,z_i')/\tau) + \sum_{n\ne i}\exp(\mathrm{sim}(z_i,z_n)/\tau)}$

    $L_\text{cluster} = -\dfrac{1}{K} \sum_k \log \dfrac{\exp(\cos(c_k, c_k')/\tau)}{\sum_{k'} \exp(\cos(c_k, c_{k'})/\tau)}$

    • Cross-batch calibration:

    $L_B = \|\mathbf{C}^b - \mathbf{C}^{b-1}\|_F^2$

    • Cluster scaling:

    $L_S = L_\text{dilation} + L_\text{shrink} + \|\mathbf{C}\|_F^2$

    $L_\text{dilation} = -\dfrac{1}{K(K-1)}\sum_{k\ne k'} \|c_k - c_{k'}\|_2^2$

    $L_\text{shrink} = \dfrac{1}{K}\sum_{i,k} \|z_i - c_k\|_2^2$

  • Total loss per batch:

$$L^b = L_\text{model}^b + \lambda_X L_X + \lambda_D L_D + \lambda_C L_C + \lambda_B L_B + \lambda_S L_S$$

Parameters $\theta$ and cluster centers $\{c_k\}$ are updated via SGD or Adam. All $\lambda$ coefficients are hyperparameters.
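As a minimal sketch of how the per-batch objective is assembled (NumPy, with illustrative function names; the actual framework computes these losses on learned embeddings with autograd):

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def link_pred_loss(Z, pos_pairs, neg_pairs):
    """Baseline L_model^b: binary cross-entropy on dot-product scores
    for positive (observed) and negative (sampled) node pairs."""
    loss = -sum(np.log(sigmoid(Z[i] @ Z[j])) for i, j in pos_pairs)
    loss -= sum(np.log(sigmoid(-(Z[i] @ Z[n]))) for i, n in neg_pairs)
    return float(loss)

def total_batch_loss(l_model, losses, lambdas):
    """L^b = L_model^b + sum_k lambda_k * L_k over the clustering terms,
    where `losses` and `lambdas` are dicts keyed by module name."""
    return l_model + sum(lambdas[k] * losses[k] for k in losses)
```

Because the total is a plain weighted sum, individual clustering modules can be switched on or off per experiment by setting the corresponding $\lambda$ to zero.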

2.3 Clustering Step

In the two-step clustering employed by BenchTGC, after all batches are processed, node embeddings $Z_\text{final}$ are aggregated and standard K-means is run to assign $K$ clusters. End-to-end one-step TGC remains an open area for research.

Algorithmic Outline (Pseudocode Extract)

for batch in batches(E, B):
    Zb = F_theta(Eb, Xb)                 # temporal model forward pass
    compute all clustering and temporal losses on Zb
    L_total = aggregate all losses with lambda weights
    update theta and cluster centers C via optimizer
Z_final = collect_all_embeddings()
clusters = KMeans(Z_final, K)
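The final K-means step can be sketched as a minimal Lloyd's iteration over the aggregated embeddings (a stand-in for whichever standard K-means routine is actually used; the farthest-point initialization is an assumption made here for determinism):

```python
import numpy as np

def kmeans(Z: np.ndarray, K: int, iters: int = 50, seed: int = 0) -> np.ndarray:
    """Minimal Lloyd's K-means on final embeddings Z (N x d),
    returning a cluster label per node."""
    rng = np.random.default_rng(seed)
    # farthest-point initialisation: first center random, each next
    # center is the point farthest from all chosen centers
    centers = [Z[rng.integers(len(Z))]]
    for _ in range(K - 1):
        d = np.min([np.linalg.norm(Z - c, axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        # assignment step: nearest center for every node
        dists = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: recompute centers as cluster means
        for k in range(K):
            members = Z[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return labels
```

In practice any library implementation (e.g. scikit-learn's `KMeans`) can be substituted; the point is that clustering happens once, after training, on the collected embeddings.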

3. Loss Modules and Adaptation of Classical Clustering to Temporal Setting

BenchTGC introduces adaptations of classical clustering loss components for temporal graphs:

  • Feature Reconstruction aligns learned embeddings with original (possibly pre-trained or constructed) features, preserving node-level semantics.
  • Distribution Alignment (KL loss) employs Student’s t-distributed soft assignments to minimize the dissimilarity between predicted embedding-cluster distributions and sharpened “target” distributions, promoting coherent cluster formation.
  • Contrastive Calibration comprises node- and cluster-level contrastive objectives, enforcing intra-cluster similarity and inter-cluster separation directly in the learned representation.
  • Cross-Batch Calibration enforces smooth evolution of cluster centers between successive interaction batches, stabilizing cluster assignment trajectories.
  • Cluster Scaling combines a dilation term that pushes cluster centers apart, a shrink term that pulls embeddings toward their centers, and an L2 penalty on the center matrix, encouraging well-formed, non-degenerate clusters.

All losses are aggregated per batch, yielding a composable, extensible optimization objective.
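The distribution-alignment module can be sketched directly from the $Q$, $P$, and KL formulas in Section 2.2, keeping the $-1/2$ exponent used there (function names are illustrative):

```python
import numpy as np

def soft_assign(Z: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Student's-t soft assignment Q_ik between embeddings Z (N x d)
    and cluster centers C (K x d); rows sum to one."""
    d2 = ((Z[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2) ** -0.5
    return q / q.sum(axis=1, keepdims=True)

def target_dist(Q: np.ndarray) -> np.ndarray:
    """Sharpened target P_ik = (Q_ik^2 / f_k) / sum_k'(Q_ik'^2 / f_k'),
    with cluster frequency f_k = sum_i Q_ik."""
    w = Q ** 2 / Q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_alignment(P: np.ndarray, Q: np.ndarray) -> float:
    """L_D = KL(P || Q), summed over nodes and clusters."""
    return float((P * np.log(P / Q)).sum())
```

Squaring $Q$ in the target emphasizes confident assignments, so minimizing $\mathrm{KL}(P\|Q)$ pulls each embedding toward the cluster it is already closest to.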

4. Temporal Batching and Scalability Characteristics

The batch-wise interaction-sequence processing in BenchTGC enables linear time/space scaling with respect to the number of events: $O(|E|)$ per epoch. In contrast, static adjacency-matrix-based graph clustering scales as $O(N^2)$, yielding out-of-memory errors beyond $N\approx 75{,}000$ nodes in practice. By adjusting the batch size $B$ (ranging from $B=128$ to $B=10{,}000$), practitioners can systematically trade off GPU memory usage against per-epoch runtime. Empirical measurements indicate that, for arXiv-scale graphs, epoch runtimes drop from over 60 minutes at $B=1$ to under 5 minutes at $B=10{,}000$, while memory use grows sublinearly from less than 0.5 GB to approximately 8 GB (Liu et al., 19 Jan 2026).

5. BenchTGC Datasets (Data4TGC): Design and Benchmark Scope

BenchTGC categorizes available datasets as follows:

| Category | Example Datasets | Clustering Feasibility |
|---|---|---|
| Unlabeled (no clustering feasible) | CollegeMsg, LastFM, MOOC, Wikipedia, etc. | No ground-truth clusters |
| Clustering-unavailable | Meta, Bitcoin, ML1M, Amazon, Yelp, Tmall | Labels uninformative |
| Clustering-available (small, public) | Brain ($K=10$), Patent ($K=6$), DBLP ($K=10$) | Feasible (small-scale) |

To address the absence of large, richly labeled benchmarks, BenchTGC introduces six datasets created for scalable, realistic TGC evaluation:

  • School: $N=327$, $E=188{,}000$, $K=9$ interaction classes.
  • arXivAI: $N=69{,}854$, $E=699{,}206$, $K=5$ fields.
  • arXivCS: $N=169{,}343$, $E=1{,}170{,}000$, $K=40$ subfields.
  • arXivMath: $N=270{,}013$, $E=800{,}000$, $K=31$ areas.
  • arXivPhy: $N=837{,}212$, $E=11{,}700{,}000$, $K=53$ topics.
  • arXivLarge: $N=1{,}320{,}000$, $E=13{,}700{,}000$, $K=172$ classes.

These are aggregated as Data4TGC, alongside the three smaller public labeled datasets, forming nine benchmarks in total.

6. Experimental Evaluation and Key Findings

BenchTGC evaluations span classical (K-means, AE, DeepWalk, node2vec, GAE), static graph clustering (DAEGC, MVGRL, SDCN(+Q), DFCN, SCGC), temporal graph learning (HTNE, JODIE, MNCI, TREND, S2T), and BenchTGC-improved versions (IHTNE, IJODIE, IMNCI, ITREND, IS2T). Clustering quality is assessed by ACC, NMI, ARI, and F1-score.

Major findings include:

  • Static graph clustering methods frequently exhaust GPU memory when $N>75{,}000$.
  • Incorporating BenchTGC clustering modules into temporal models yields a 99% improvement rate: the augmented variants outperform their unmodified baselines in 99% of evaluated cases.
  • On arXivCS, ARI increases from 24.65% (DeepWalk) to 35.04% (IHTNE), a relative gain of +42%.
  • On the School benchmark, IJODIE and ITREND methods both achieve perfect (100%) ACC, NMI, ARI, and F1.
  • Visualizations (t-SNE, similarity heatmaps) affirm markedly enhanced cluster separation for BenchTGC-enhanced models relative to baselines (Liu et al., 19 Jan 2026).

7. Limitations and Future Directions

BenchTGC achieves state-of-the-art clustering across a range of realistic temporal graphs, often obtaining two- to five-fold gains in ARI/NMI over unmodified temporal graph models. Open challenges remain: avoiding out-of-memory failures in open-world settings, determining the number of clusters $K$ automatically, and developing end-to-end "one-step" temporal clustering architectures. BenchTGC's protocol design, composable loss modules, and the Data4TGC benchmark suite constitute substantive advances in scalable temporal graph clustering methodology.

All code, dataset pre-processing scripts, and experimental logs are available at https://github.com/MGitHubL/BenchTGC and https://github.com/MGitHubL/Data4TGC (Liu et al., 19 Jan 2026).
