BenchTGC Datasets: Temporal Graph Benchmark
- BenchTGC Datasets are a comprehensive benchmark suite for temporal graph clustering, offering organized, continuous-time interaction data from diverse domains.
- It provides standardized datasets with timestamped edges, ground-truth cluster labels, and open-source data loaders for reproducible research.
- The suite covers domains including academic citations, neuroscience, patents, and social contacts, enabling both static and online clustering evaluations.
BenchTGC Datasets constitute a large-scale, multi-domain benchmark suite specifically curated and formatted for foundational research in temporal graph clustering (TGC), i.e., the evaluation of clustering algorithms on graphs whose edge sets evolve over time. The BenchTGC suite (also called Data4TGC in the associated literature) consists of nine datasets derived from citation, neuroscience, patent, social-contact, and arXiv subject subdomains (including one large-scale variant), each formalized as a continuous-time interaction graph without static adjacency pre-processing. BenchTGC systematically covers real-world dynamic phenomena, provides ground-truth cluster labelings for rigorous external evaluation, and supports both static snapshot and online temporal clustering regimes (Liu et al., 19 Jan 2026).
1. Dataset Inventory and Domains
BenchTGC comprises nine datasets, each represented as an ordered stream of timestamped interactions:
- DBLP: Academic citation graph (1975–2001, annual timesteps, |V| = 28,085, |E| = 236,894).
- Brain: Human brain functional connectivity (sliding-window fMRI, 12 epochs, |V| = 5,000, |E| = 1,955,488).
- Patent: NBER patent citations (891 days, |V| = 12,214, |E| = 41,916).
- School: SocioPatterns high-school contact network (5 days, 20s windows, |V| = 327, |E| = 188,508).
- arXivAI, arXivCS, arXivMath, arXivPhy, arXivLarge: Paper citation graphs from ogbn-papers100M, filtered on arXiv subject classes (spans up to 41 years, highest |V| = 1,324,064, |E| = 13,701,428, up to K = 172 ground-truth clusters).
Summary statistics include node and edge counts, average degree (e.g., avg deg = 2·|static edges|/|V|), min/max node interaction counts, and explicit time resolution per dataset. All graphs are released in standardized formats (edges.csv: src_id, dst_id, timestamp; nodes.csv: node_id, label; features.npz for pre-computed node embeddings), under an MIT license (Liu et al., 19 Jan 2026).
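As an illustration, the summary statistics above can be recomputed directly from a released edges.csv file. The following is a minimal sketch using pandas (not part of the official loaders); "path/to/edges.csv" is a placeholder path and the column names follow the schema just described.

```python
# Minimal sketch: recomputing the summary statistics from edges.csv (src_id, dst_id, timestamp).
import pandas as pd

edges = pd.read_csv("path/to/edges.csv")          # placeholder path

num_interactions = len(edges)                     # |E| counted as temporal interactions
node_ids = pd.concat([edges["src_id"], edges["dst_id"]])
num_nodes = node_ids.nunique()                    # |V|

# Static edge count: unique node pairs, ignoring timestamps and direction.
pairs = pd.DataFrame({
    "a": edges[["src_id", "dst_id"]].min(axis=1),
    "b": edges[["src_id", "dst_id"]].max(axis=1),
})
num_static_edges = len(pairs.drop_duplicates())
avg_degree = 2 * num_static_edges / num_nodes     # avg deg = 2·|static edges| / |V|

interaction_counts = node_ids.value_counts()      # per-node interaction counts
print(num_nodes, num_interactions, avg_degree,
      interaction_counts.min(), interaction_counts.max())
```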
2. Temporal Graph Formalism and Structure
Each dataset is modeled as a continuous-time temporal graph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathcal{T})$, where:
- $\mathcal{V}$ is the node set (fixed per dataset).
- $\mathcal{E} = \{(u_i, v_i, t_i)\}$ is a multiset of timestamped interactions $(u, v, t)$, sorted by $t$.
- $\mathcal{T} = \{t_i : (u_i, v_i, t_i) \in \mathcal{E}\}$ is the collection of unique timestamps.
- $\tau(e_i) = t_i$ gives the timestamp of a particular interaction $e_i \in \mathcal{E}$.

There are no static adjacency matrices or pre-computed snapshots; paradigm-appropriate temporal graph models must operate directly on batches of interactions, defined either by a fixed batch size or by a temporal window size (Liu et al., 19 Jan 2026).
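For concreteness, a minimal sketch of the two batching conventions over a timestamp-sorted interaction stream is given below. The functions and array names are illustrative (NumPy arrays `src`, `dst`, `ts` sorted by `ts` are assumed), not the benchmark's own batching code.

```python
def fixed_size_batches(src, dst, ts, batch_size=1024):
    """Yield consecutive batches containing `batch_size` interactions each.

    src, dst, ts: NumPy arrays of equal length, sorted by ts.
    """
    for i in range(0, len(ts), batch_size):
        yield src[i:i + batch_size], dst[i:i + batch_size], ts[i:i + batch_size]

def time_window_batches(src, dst, ts, window):
    """Yield batches covering consecutive temporal windows of length `window`."""
    t = ts[0]
    while t <= ts[-1]:
        mask = (ts >= t) & (ts < t + window)
        if mask.any():
            yield src[mask], dst[mask], ts[mask]
        t += window
```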
3. Feature Construction and Node Representations
Original raw datasets include no node attributes. BenchTGC supports several feature generation protocols:
- Random initialization: Learnable $d$-dimensional parameter vectors per node, used as placeholders for algorithm compatibility.
- One-hot node identifiers: $x_v \in \{0,1\}^{|\mathcal{V}|}$, used for small graphs.
- Sinusoidal positional encoding: For node index $i$, $x_i[2k] = \sin\!\big(i / 10000^{2k/d}\big)$ and $x_i[2k+1] = \cos\!\big(i / 10000^{2k/d}\big)$, yielding $x_i \in \mathbb{R}^d$ (a sketch is given after this list).
- Feature pre-training: Run node2vec or DeepWalk on the static counterpart of $\mathcal{G}$ to compute $d$-dimensional embeddings (with $d$ standardized across all releases).

Edges carry only timestamp metadata; no additional edge attributes are provided (Liu et al., 19 Jan 2026).
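The sketch below implements the sinusoidal encoding, assuming the standard Transformer-style formulation applied to node indices; the dimensionality `d = 128` is an illustrative default, not necessarily the standardized value used in the release.

```python
import numpy as np

def sinusoidal_node_features(num_nodes, d=128):
    """Return an (num_nodes, d) matrix with x_i[2k] = sin(i / 10000^(2k/d)), x_i[2k+1] = cos(...)."""
    assert d % 2 == 0, "d must be even"
    pos = np.arange(num_nodes)[:, None]            # node index i
    k = np.arange(d // 2)[None, :]                 # dimension-pair index k
    angles = pos / np.power(10000.0, 2 * k / d)
    x = np.zeros((num_nodes, d))
    x[:, 0::2] = np.sin(angles)                    # even dimensions
    x[:, 1::2] = np.cos(angles)                    # odd dimensions
    return x
```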
4. Ground-Truth Cluster Structure
Each node is assigned a canonical ground-truth label—enabling external validation of clustering algorithms—determined as follows:
- DBLP: 10 research areas (venue classification).
- Brain: 10 anatomical/functional regions.
- Patent: 6 NBER patent classes.
- School: 9 school classes.
- arXiv*: arXiv subject category/subfield code; the label set depends on the dataset. The actual number of clusters is dataset-specific and matches the count of unique labels, and cluster-size imbalance varies (see main text, Section 3 and Figure 1 of (Liu et al., 19 Jan 2026)); a sketch for recovering the cluster count and size distribution from nodes.csv follows this list.
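The following illustrative snippet uses the nodes.csv schema described in Section 5 (node_id, label); "path/to/nodes.csv" is a placeholder path.

```python
import pandas as pd

nodes = pd.read_csv("path/to/nodes.csv")        # columns: node_id, label
cluster_sizes = nodes["label"].value_counts()   # size of each ground-truth cluster
K = cluster_sizes.size                          # number of clusters = count of unique labels
imbalance = cluster_sizes.max() / cluster_sizes.min()
print(f"K = {K}, largest/smallest cluster ratio = {imbalance:.1f}")
```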
5. BenchTGC Data Format, Loading, and Usage
All datasets follow a uniform schema:
- edges.csv: Rows (src_id, dst_id, timestamp), sorted by time.
- nodes.csv: node_id, ground-truth label (integer encoding).
- features.npz: Optional precomputed node2vec or positional encoding matrix.
Example loading snippet in PyTorch/DGL:
```python
import torch
from torch.utils.data import DataLoader
from benchtgc.dataset import TemporalEdgeDataset

ds = TemporalEdgeDataset("path/to/edges.csv", num_nodes=N)  # N: number of nodes in the dataset
loader = DataLoader(ds, batch_size=1024, shuffle=False)     # shuffle=False preserves temporal order
for batch in loader:
    src, dst, ts = batch  # torch.LongTensor
    # feed into temporal GNN…
```
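The internals of `TemporalEdgeDataset` are not reproduced here. As a purely hypothetical sketch (class name and behavior are assumptions, not the actual benchtgc implementation), a map-style dataset compatible with the snippet above could look roughly like this:

```python
import pandas as pd
import torch
from torch.utils.data import Dataset

class SimpleTemporalEdgeDataset(Dataset):
    """Hypothetical stand-in: one item per timestamped interaction, in time order."""

    def __init__(self, edges_csv, num_nodes):
        df = pd.read_csv(edges_csv).sort_values("timestamp")
        self.src = torch.as_tensor(df["src_id"].to_numpy(), dtype=torch.long)
        self.dst = torch.as_tensor(df["dst_id"].to_numpy(), dtype=torch.long)
        self.ts = torch.as_tensor(df["timestamp"].to_numpy(), dtype=torch.long)
        self.num_nodes = num_nodes

    def __len__(self):
        return self.ts.numel()

    def __getitem__(self, idx):
        # The default collate stacks these into (src, dst, ts) LongTensors per batch.
        return self.src[idx], self.dst[idx], self.ts[idx]
```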
6. Supported Clustering Paradigms and Evaluation Metrics
BenchTGC is compatible with two key clustering paradigms:
- Static snapshot clustering: Aggregate all interactions up to time $t$ into a cumulative adjacency matrix $A_t$, then apply static clustering algorithms at each $t$.
- Continuous temporal clustering: Consume interaction batches sequentially, incrementally update the node representations, and periodically perform $k$-means or equivalent clustering in latent space (a sketch of this regime follows the list).
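A rough sketch of the continuous regime under stated assumptions: `model.update` and `model.embeddings` stand in for an arbitrary temporal encoder interface (not an actual BenchTGC API), and scikit-learn's KMeans performs the latent-space clustering step.

```python
import numpy as np
from sklearn.cluster import KMeans

def online_clustering(batches, model, num_nodes, k, cluster_every=10):
    """Consume (src, dst, ts) batches in time order; cluster node embeddings periodically."""
    assignments = []
    for step, (src, dst, ts) in enumerate(batches, start=1):
        model.update(src, dst, ts)                        # incremental representation update
        if step % cluster_every == 0:
            z = model.embeddings(np.arange(num_nodes))    # current latent node vectors
            labels = KMeans(n_clusters=k, n_init=10).fit_predict(z)
            assignments.append(labels)
    return assignments
```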
Standard external validation metrics are:
- Accuracy (ACC): Fraction of correctly assigned labels after optimal cluster-to-label matching.
- Normalized Mutual Information (NMI): $\mathrm{NMI}(C, \hat{C}) = \frac{2\, I(C; \hat{C})}{H(C) + H(\hat{C})}$, where $C$ is the ground-truth partition, $\hat{C}$ the predicted partition, $I$ mutual information, and $H$ entropy.
- Adjusted Rand Index (ARI): Chance-corrected Rand index, as applied to cluster validation by Yeung & Ruzzo (2001).
- Macro F1: Mean class-wise F1-score.

Reported results are averages over five randomized experiment runs, enabling robust evaluation (Liu et al., 19 Jan 2026); a sketch of these metric computations follows.
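A minimal sketch of these metrics with SciPy and scikit-learn, assuming integer labels in {0, …, K−1}; this is an illustration, not the benchmark's official evaluation script.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import (adjusted_rand_score, f1_score,
                             normalized_mutual_info_score)

def match_clusters(y_true, y_pred):
    """Remap predicted cluster ids to label ids via the Hungarian algorithm."""
    k = int(max(y_true.max(), y_pred.max())) + 1
    cost = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1                                   # co-occurrence counts
    rows, cols = linear_sum_assignment(cost.max() - cost) # maximize agreement
    mapping = dict(zip(rows, cols))
    return np.array([mapping[p] for p in y_pred])

def evaluate(y_true, y_pred):
    y_mapped = match_clusters(y_true, y_pred)
    return {
        "ACC": float((y_mapped == y_true).mean()),        # accuracy after optimal matching
        "NMI": normalized_mutual_info_score(y_true, y_pred),
        "ARI": adjusted_rand_score(y_true, y_pred),
        "Macro-F1": f1_score(y_true, y_mapped, average="macro"),
    }
```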
7. Implications and Benchmarking Scope
BenchTGC addresses previous deficits in temporal clustering research by providing:
- Large, realistic dynamic graphs—spanning citation, neural, patent, and high-resolution social domains—suitable for multi-cluster and class-imbalanced scenarios.
- An evaluation foundation for both offline (static) and online (incremental) TGC methods.
- Consistent task definitions, metadata, and utility code, supporting rapid prototyping and reproducible benchmarking.
- Strict reliance on external-label metrics with clearly documented splits and parameterizations, ensuring comparability across methods.
All aspects of data construction, preprocessing, and evaluation are designed to foreground scalability, semantic transparency, and methodological neutrality, directly enabling domain-agnostic advances in temporal graph clustering (Liu et al., 19 Jan 2026).