BenchTGC: Temporal Graph Clustering Benchmark
- BenchTGC is a standardized framework that formalizes temporal graph clustering using an interaction-sequence paradigm, promoting scalability and reproducibility.
- It decouples pre-processing, training, and clustering into modular stages, integrating classical and temporal-specific losses for effective node clustering.
- Empirical evaluations report significant gains in ARI, NMI, and ACC, demonstrating consistent improvements over traditional static graph methods.
BenchTGC is a standardized framework and benchmark for node clustering in temporal graphs, addressing the methodological and experimental fragmentation that has impeded advances in deep temporal graph clustering (TGC). It formalizes clustering on temporal graphs via an interaction-sequence paradigm, supplying both an extensible framework—BenchTGC Framework—and a suite of large, richly labeled datasets—Data4TGC—that together enable scalable, reproducible, and fair evaluation across the TGC task spectrum (Liu et al., 19 Jan 2026).
1. Temporal Graph Clustering: Problem Formulation and Paradigm
A temporal graph is formally defined as $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathcal{T})$, where $\mathcal{V}$ is the set of nodes, $\mathcal{E}$ is a chronologically ordered sequence of interactions $(u, v, t)$, each with timestamp $t$, and $\mathcal{T}$ is the set of all timestamps. Unlike static graphs, which use a fixed adjacency matrix $A \in \mathbb{R}^{N \times N}$, temporal graphs maintain event-level granularity, storing each interaction as a distinct event.
BenchTGC processes these temporal interactions using an interaction-sequence-based batch-processing pattern. The edge set $\mathcal{E}$ is partitioned into chronologically ordered batches $\{\mathcal{E}_1, \dots, \mathcal{E}_T\}$ of size $B$, and in training timestep $b$ only $\mathcal{E}_b$ and any model memory from prior steps are available. This approach yields $O(|\mathcal{E}|)$ time and space complexity per epoch, in contrast to the $O(N^2)$ memory consumption characteristic of static clustering methods. This distinction is critical for scalability and for modeling the inherent temporal dynamics observed in real-world scenarios (Liu et al., 19 Jan 2026).
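The batch-partitioning step can be sketched as follows; this is a minimal illustration of the interaction-sequence pattern, not the BenchTGC implementation, and the function and type names are ours:

```python
from typing import Iterator, List, Tuple

Event = Tuple[int, int, float]  # (source node u, destination node v, timestamp t)

def chronological_batches(events: List[Event], batch_size: int) -> Iterator[List[Event]]:
    """Sort interactions by timestamp, then yield fixed-size chronological batches."""
    ordered = sorted(events, key=lambda e: e[2])
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]
```

Each yielded batch only ever contains events no later than those of subsequent batches, matching the constraint that timestep $b$ sees only its own interactions plus prior model memory.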
2. Framework Architecture and Algorithmic Structure
The BenchTGC Framework consists of three decoupled stages: pre-processing, training, and clustering.
2.1 Pre-processing
Node features $x_i \in \mathbb{R}^d$ are either constructed or pre-trained as follows:
- Random initialization: $x_i$ is drawn from a uniform or Gaussian distribution and treated as a learnable parameter.
- One-hot ID embedding: $x_i = e_i \in \{0,1\}^{|\mathcal{V}|}$, with a one at index $i$.
- Positional encoding (sinusoidal): $x_i[2j] = \sin\big(i / 10000^{2j/d}\big)$, $x_i[2j+1] = \cos\big(i / 10000^{2j/d}\big)$.
- Feature pre-training: if raw node signals are unavailable, node embeddings are generated by applying static methods (e.g., node2vec) to a graph snapshot, yielding $x_i \in \mathbb{R}^d$.
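The constructed-feature options above can be sketched in NumPy; this is an illustrative helper (the function name and `mode` strings are ours, not BenchTGC API), with the sinusoidal case following the standard Transformer positional-encoding formula:

```python
import numpy as np

def init_features(num_nodes: int, dim: int, mode: str = "random", seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    if mode == "random":      # Gaussian draw, treated downstream as a learnable parameter
        return rng.normal(size=(num_nodes, dim)).astype(np.float32)
    if mode == "one-hot":     # identity rows; effective dim becomes num_nodes
        return np.eye(num_nodes, dtype=np.float32)
    if mode == "sinusoidal":  # Transformer-style positional encoding over node IDs
        pos = np.arange(num_nodes)[:, None]
        freq = np.exp(-np.log(10000.0) * (2 * (np.arange(dim) // 2)) / dim)
        angles = pos * freq[None, :]
        feats = np.zeros((num_nodes, dim), dtype=np.float32)
        feats[:, 0::2] = np.sin(angles[:, 0::2])
        feats[:, 1::2] = np.cos(angles[:, 1::2])
        return feats
    raise ValueError(mode)
```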
2.2 Training
Let $F_\theta$ denote any temporal GNN or Hawkes process-based architecture with parameters $\theta$. For each batch $\mathcal{E}_b$:
- The model operates on $\mathcal{E}_b$, using the node features $X_b$ and any memory state.
- Embeddings $Z_b = F_\theta(\mathcal{E}_b, X_b)$ are computed.
- Baseline link-prediction loss (negative sampling): $\mathcal{L}_{\mathrm{link}} = -\sum_{(u,v,t) \in \mathcal{E}_b} \big[ \log \sigma(z_u^\top z_v) + \sum_{n} \log \sigma(-z_u^\top z_n) \big]$.
- Classical and temporal-specific clustering losses are summed per batch:
- Feature reconstruction: $\mathcal{L}_{\mathrm{rec}} = \| \hat{X}_b - X_b \|_F^2$.
- Distribution alignment (Student's t soft assignment): $q_{ik} = \frac{(1 + \|z_i - c_k\|^2)^{-1}}{\sum_{k'} (1 + \|z_i - c_{k'}\|^2)^{-1}}$, with $\mathcal{L}_{\mathrm{KL}} = \mathrm{KL}(P \,\|\, Q)$ against a sharpened target distribution $P$.
- Contrastive calibration: $\mathcal{L}_{\mathrm{con}}$, combining node- and cluster-level contrastive terms.
- Cross-batch calibration: $\mathcal{L}_{\mathrm{cross}} = \| C^{(b)} - C^{(b-1)} \|_F^2$ over successive cluster centers.
- Cluster scaling: $\mathcal{L}_{\mathrm{scale}}$, penalizing cluster dilation, shrinkage, and the L2 norm of cluster centers.
- Total loss per batch: $\mathcal{L}_b = \mathcal{L}_{\mathrm{link}} + \lambda_1 \mathcal{L}_{\mathrm{rec}} + \lambda_2 \mathcal{L}_{\mathrm{KL}} + \lambda_3 \mathcal{L}_{\mathrm{con}} + \lambda_4 \mathcal{L}_{\mathrm{cross}} + \lambda_5 \mathcal{L}_{\mathrm{scale}}$.
Parameters $\theta$ and cluster centers $C$ are updated via SGD or Adam. All coefficients $\lambda_i$ are hyperparameters.
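The Student's t soft assignment and its sharpened target, central to the distribution-alignment loss, can be sketched in NumPy. This is a didactic DEC-style sketch under standard conventions, not the BenchTGC implementation; all function names are ours:

```python
import numpy as np

def soft_assign(Z: np.ndarray, C: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Student's t soft assignment q_ik of embeddings Z (n, d) to centers C (K, d)."""
    d2 = ((Z[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q: np.ndarray) -> np.ndarray:
    """Sharpened targets p_ik proportional to q_ik^2 / soft cluster frequency."""
    w = q ** 2 / q.sum(axis=0, keepdims=True)
    return w / w.sum(axis=1, keepdims=True)

def kl_alignment_loss(q: np.ndarray) -> float:
    """KL(P || Q) between sharpened targets and soft assignments."""
    p = target_distribution(q)
    return float((p * np.log(p / q)).sum())
```

Squaring $q$ in the target emphasizes high-confidence assignments, so minimizing the KL term pulls embeddings toward their most likely cluster centers.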
2.3 Clustering Step
In the two-step clustering employed by BenchTGC, after all batches are processed, node embeddings are aggregated and standard K-means is run to assign clusters. End-to-end one-step TGC remains an open area for research.
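To make the second step concrete, here is a minimal Lloyd's-iteration K-means over aggregated embeddings. This is a didactic sketch with a deterministic farthest-point initialization of our choosing; in practice one would use a standard library implementation such as scikit-learn's `KMeans`:

```python
import numpy as np

def kmeans(Z: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """Minimal Lloyd's K-means with deterministic farthest-point initialization."""
    centers = [Z[0]]
    for _ in range(1, k):  # pick each new center as the point farthest from existing ones
        d2 = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[int(d2.argmax())])
    centers = np.stack(centers)
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(iters):
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        for j in range(k):  # recompute each non-empty cluster's center
            if (labels == j).any():
                centers[j] = Z[labels == j].mean(axis=0)
    return labels
```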
Algorithmic Outline (Pseudocode Extract)
```
for Eb in batches(E, B):
    Zb = F_theta(Eb, Xb)                 # temporal backbone on batch b
    compute clustering and temporal losses on Zb
    L_total = aggregate all losses with lambda weights
    update theta, C via optimizer
Z_final = collect_all_embeddings()
Clusters = KMeans(Z_final, K)
```
3. Loss Modules and Adaptation of Classical Clustering to Temporal Setting
BenchTGC introduces adaptations of classical clustering loss components for temporal graphs:
- Feature Reconstruction aligns learned embeddings with original (possibly pre-trained or constructed) features, preserving node-level semantics.
- Distribution Alignment (KL loss) employs Student’s t-distributed soft assignments to minimize the dissimilarity between predicted embedding-cluster distributions and sharpened “target” distributions, promoting coherent cluster formation.
- Contrastive Calibration comprises node- and cluster-level contrastive objectives, enforcing intra-cluster similarity and inter-cluster separation directly in the learned representation.
- Cross-Batch Calibration enforces smooth evolution of cluster centers between successive interaction batches, stabilizing cluster assignment trajectories.
- Cluster Scaling penalizes excessive cluster spread (dilation), excessive compactness (shrinkage), and the overall L2 norm of cluster centers, encouraging well-formed, non-degenerate clusters.
All losses are aggregated per batch, yielding a composable, extensible optimization objective.
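The composability of this objective can be sketched as a weighted sum over pluggable loss callables; this is an illustrative helper of our own design, not BenchTGC code:

```python
from typing import Callable, Dict, Tuple

LossModule = Tuple[float, Callable[[], float]]  # (lambda weight, loss function)

def aggregate_losses(modules: Dict[str, LossModule]) -> float:
    """Composable per-batch objective: L_total = sum_k lambda_k * L_k."""
    return sum(weight * loss_fn() for weight, loss_fn in modules.values())
```

A new loss term is added by registering one more entry, e.g. `{"rec": (0.5, rec_loss), "kl": (1.0, kl_loss)}`, without touching the training loop.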
4. Temporal Batching and Scalability Characteristics
The batch-wise interaction-sequence processing in BenchTGC enables linear time/space scaling with respect to the number of events: $O(|\mathcal{E}|)$ per epoch. In contrast, static adjacency-matrix-based graph clustering scales as $O(N^2)$, yielding out-of-memory errors on large graphs in practice. By adjusting the batch size (up to $10,000$), practitioners can systematically trade off GPU memory usage against per-epoch runtime. Empirical measurements indicate that, for arXiv-scale graphs, increasing the batch size reduces epoch runtimes from over 60 minutes to under 5 minutes, while memory use grows sublinearly from less than 0.5 GB to approximately 8 GB (Liu et al., 19 Jan 2026).
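A back-of-the-envelope comparison makes the $O(N^2)$ versus $O(|\mathcal{E}|)$ gap tangible; the per-entry byte counts below are illustrative assumptions (float32 adjacency entries, 16-byte event records), not figures from the paper:

```python
def dense_adjacency_bytes(num_nodes: int, bytes_per_entry: int = 4) -> int:
    """Memory for a dense float32 N x N adjacency matrix."""
    return num_nodes * num_nodes * bytes_per_entry

def event_list_bytes(num_events: int, bytes_per_event: int = 16) -> int:
    """Memory for an event list: two int32 node IDs plus a float64 timestamp per event."""
    return num_events * bytes_per_event
```

Under these assumptions, a million-node dense adjacency matrix needs about 4 TB, while ten million events fit in about 160 MB.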
5. BenchTGC Datasets (Data4TGC): Design and Benchmark Scope
BenchTGC categorizes available datasets as follows:
| Category | Example Datasets | Clustering Feasibility |
|---|---|---|
| Unlabeled (no clustering feasible) | CollegeMsg, LastFM, MOOC, Wikipedia, etc. | No ground-truth clusters |
| Clustering-unavailable | Meta, Bitcoin, ML1M, Amazon, Yelp, Tmall | Labels uninformative |
| Clustering-available (small, public) | Brain (K=10), Patent (K=6), DBLP (K=10) | Feasible (small-scale) |
To address the absence of large, richly labeled benchmarks, BenchTGC introduces six datasets created for scalable, realistic TGC evaluation:
- School: nodes labeled by interaction class.
- arXivAI: nodes labeled by field.
- arXivCS: nodes labeled by subfield.
- arXivMath: nodes labeled by area.
- arXivPhy: nodes labeled by topic.
- arXivLarge: nodes labeled by class.
These are aggregated as Data4TGC, alongside the three smaller public labeled datasets, forming nine benchmarks in total.
6. Experimental Evaluation and Key Findings
BenchTGC evaluations span classical (K-means, AE, DeepWalk, node2vec, GAE), static graph clustering (DAEGC, MVGRL, SDCN(+Q), DFCN, SCGC), temporal graph learning (HTNE, JODIE, MNCI, TREND, S2T), and BenchTGC-improved versions (IHTNE, IJODIE, IMNCI, ITREND, IS2T). Clustering quality is assessed by ACC, NMI, ARI, and F1-score.
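For reference, the ARI reported above follows the standard contingency-table definition; a self-contained version is sketched below (in practice one would use `sklearn.metrics.adjusted_rand_score`):

```python
from collections import Counter
from math import comb
from typing import Sequence

def adjusted_rand_index(labels_true: Sequence[int], labels_pred: Sequence[int]) -> float:
    """Adjusted Rand Index: pair-counting agreement, corrected for chance."""
    n = len(labels_true)
    pairs = Counter(zip(labels_true, labels_pred))      # contingency table cells
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    index = sum(comb(c, 2) for c in pairs.values())
    row_sum = sum(comb(c, 2) for c in rows.values())
    col_sum = sum(comb(c, 2) for c in cols.values())
    expected = row_sum * col_sum / comb(n, 2)           # chance-level agreement
    max_index = (row_sum + col_sum) / 2
    return (index - expected) / (max_index - expected)
```

ARI is invariant to label permutation, which is why the metric is appropriate for comparing K-means cluster IDs against ground-truth classes.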
Major findings include:
- Static graph clustering methods frequently exhaust GPU memory on the larger benchmarks.
- Incorporating BenchTGC clustering modules into temporal models yields a 99% improvement rate, i.e., the improved variants outperform their unmodified baselines in 99% of evaluated settings.
- On arXivCS, ARI increases from 24.65% (DeepWalk) to 35.04% (IHTNE), a relative gain of +42%.
- On the School benchmark, IJODIE and ITREND methods both achieve perfect (100%) ACC, NMI, ARI, and F1.
- Visualizations (t-SNE, similarity heatmaps) affirm markedly enhanced cluster separation for BenchTGC-enhanced models relative to baselines (Liu et al., 19 Jan 2026).
7. Limitations and Future Directions
BenchTGC achieves state-of-the-art clustering across a range of realistic temporal graphs, often obtaining two- to five-fold gains in ARI/NMI over unmodified temporal graph models. Open challenges remain, notably memory limits in open-world settings, automatic determination of the number of clusters, and the development of end-to-end "one-step" temporal clustering architectures. BenchTGC's protocol design, composable loss modules, and the Data4TGC benchmark suite constitute substantive advances in scalable temporal graph clustering methodology.
All code, dataset pre-processing scripts, and experimental logs are available at https://github.com/MGitHubL/BenchTGC and https://github.com/MGitHubL/Data4TGC (Liu et al., 19 Jan 2026).