GraphInst Benchmark: Graph Inference Evaluation
- GraphInst Benchmark is a comprehensive suite designed to evaluate graph inference and graph-theoretic algorithms using both real-world and synthetic datasets.
- It standardizes assessments with curated datasets, pre-extracted features, ground-truth labels, and precise scoring metrics for clustering, classification, and signal denoising tasks.
- Its modular, hardware-aware design enables detailed comparisons of algorithmic kernels and performance metrics, guiding robust and reproducible analysis.
GraphInst Benchmark is a publicly available suite designed to rigorously evaluate graph inference and graph-theoretic algorithms on a range of real-world tasks and large-scale synthetic scenarios. It incorporates both application-driven graph topology inference benchmarking for machine learning and algorithmic benchmarking for hardware and runtime characterization. The suite provides datasets, kernels, evaluation metrics, and reference implementations that facilitate comprehensive quantitative comparison across graph learning, signal processing, clustering, classification, and hardware-centric graph workloads (Lassance et al., 2020, Yoo et al., 2010).
1. Design Objectives and Motivation
GraphInst was developed to address significant gaps in existing benchmarking practices for graph algorithms and graph inference methods. Prior efforts either focused on a narrow set of kernels (such as BFS), omitted ground-truth labels and realistic graph statistics, or introduced non-graph-specific computation that masked critical traversal and memory-access behaviors. GraphInst’s core objectives are:
- Comprehensiveness: Coverage of canonical graph workloads, spanning unsupervised clustering, semi-supervised classification, graph signal denoising for inference, and five distinct kernel classes for algorithmic evaluation (search, spectral, adjacency-centric, metric, global optimization).
- Task and Hardware Realism: Datasets are drawn from image, audio, document, and traffic signals embedding real-world variability; synthetic graph generation uses the Barabási–Albert preferential attachment model to induce scale-free topologies and realistic degree distributions.
- Evaluation Transparency: Easy-to-use datasets with pre-extracted features and ground-truth labels, standardized scoring formulas (AMI, classification accuracy, SNR/MSE), and open-source code underpin apples-to-apples algorithm comparisons and architectural benchmarking.
2. Dataset Composition and Downstream Tasks
GraphInst aggregates diverse datasets, each aligned with specific downstream graph tasks. Feature extraction is standardized, eliminating variability from pipeline differences.
| Dataset | Size / Structure | Features / Ratio | Downstream Task(s) |
|---|---|---|---|
| flowers102 | N=1,020; 102 classes | 2,048-dim (F/N≈2) | Image clustering, classification |
| ESC-50 | N=2,000; 50 classes | 1,024-dim (F/N=0.51) | Audio clustering, classification |
| cora | N=2,708; 7 classes | 1,433-dim BoW (F/N=0.53) | Document clustering, classification |
| toronto traffic | N=2,202 nodes | Signal only (1 graph signal) | Graph signal denoising |
Each dataset supports tasks framed as graph inference problems:
- UCV (Unsupervised Clustering of Vertices): Graph construction for spectral clustering, scored by Adjusted Mutual Information against ground-truth classes.
- SSCV (Semi-Supervised Classification of Vertices): Graph-based label propagation and simplified graph convolution (SGC), scored by prediction accuracy under a 5%-labeled / 95%-test node regime (averaged over 100 splits).
- DGS (Signal Denoising on Graphs): Graph learning from signal statistics, followed by Simoncelli low-pass graph filtering, evaluated by SNR improvement and MSE.
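The SSCV scoring protocol above (5% labeled nodes, accuracy averaged over repeated random splits) can be sketched as follows; `sscv_accuracy` and the `predict` callable are illustrative names, not the suite's API:

```python
import numpy as np

def sscv_accuracy(labels, predict, n_splits=100, labeled_frac=0.05, seed=0):
    """Average accuracy over random labeled/test splits (5% labeled, 95% test)."""
    rng = np.random.default_rng(seed)
    n = len(labels)
    n_labeled = max(1, int(labeled_frac * n))
    accs = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        labeled, test = perm[:n_labeled], perm[n_labeled:]
        # predict() sees only the labeled subset, returns labels for all nodes
        pred = predict(labeled, labels[labeled])
        accs.append(np.mean(pred[test] == labels[test]))
    return float(np.mean(accs)), float(np.std(accs))
```

Any graph-based classifier (label propagation, SGC) can be plugged in behind `predict`, keeping the split protocol identical across methods.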
3. Graph Inference and Kernel Methodologies
GraphInst accommodates both graph inference mechanisms and graph-theoretic kernel routines for detailed analysis.
Graph Topology Inference (Tasks 1–3)
- Naive k-NN + Similarity: Cosine, covariance, or RBF kernel similarity; k-NN sparsification (k grid: 5–1,000), with or without weight normalization.
- Non-negative Kernel Regression (NNK) [Shekkizhar & Ortega 2019]: Non-negative least-squares fit over the k nearest neighbors, enforcing orthogonal residuals; hyperparameters: k and the kernel bandwidth.
- Smoothness Prior (Kalofolias 2016, 2018): Convex optimization subject to weight and degree constraints; promotes signal smoothness over learned graph.
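The naive baseline in the list above can be sketched in a few lines of numpy: cosine similarity, top-k sparsification, and symmetrization. `cosine_knn_graph` is an illustrative name under assumed conventions (non-negative weights, union symmetrization), not the suite's reference implementation:

```python
import numpy as np

def cosine_knn_graph(X, k=5):
    """Naive k-NN graph: cosine similarity, keep top-k neighbors per node, symmetrize."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)  # row-normalize features
    S = Xn @ Xn.T                                      # cosine similarity matrix
    np.fill_diagonal(S, -np.inf)                       # exclude self-loops
    W = np.zeros_like(S)
    for i in range(len(S)):
        nbrs = np.argpartition(S[i], -k)[-k:]          # indices of the k largest sims
        W[i, nbrs] = np.maximum(S[i, nbrs], 0)         # keep non-negative weights
    return np.maximum(W, W.T)                          # symmetrize (union of edges)
```

The resulting weighted adjacency matrix can be fed directly to spectral clustering or label propagation.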
Graph-Theoretic Algorithmic Kernels
- Single-Source Shortest Path (Kernel 1): Dijkstra/BFS-style search with priority-queue relaxation, O((|V| + |E|) log |V|) time with a binary heap.
- Spectral Power Method (Kernel 2): Iteration to the dominant eigenvector of the adjacency/Laplacian matrix, O(|E|) per iteration.
- Hierarchicalization (Kernel 3): Random vertex coalescence into supernodes, adjacency rewiring.
- Metric Sampling (Kernel 4): Local clustering coefficient estimation via random neighborhood sampling.
- Community Splitting (Kernel 5): Entropy-based greedy partition minimization.
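Kernel 2's core loop can be sketched in numpy; the dominant cost per iteration is one matrix-vector product over the edges, which is what makes this kernel memory-bandwidth bound. `power_method` is an illustrative name, not the suite's C++ kernel:

```python
import numpy as np

def power_method(A, iters=200, tol=1e-9, seed=0):
    """Dominant eigenvector of a symmetric adjacency matrix by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = A @ v                      # the O(|E|) matvec: the kernel's memory-bound core
        w /= np.linalg.norm(w)
        if np.linalg.norm(w - v) < tol:
            break
        v = w
    return v, float(v @ A @ v)         # eigenvector estimate, Rayleigh quotient
```

For a connected non-negative adjacency matrix the dominant eigenvalue is positive (Perron–Frobenius), so the iteration converges without sign flipping.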
The graph generator is Barabási–Albert preferential attachment, with time and memory linear in the number of edges; it yields scale-free graphs with a tunable average degree.
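A minimal sketch of the Barabási–Albert generator, using the standard trick of sampling attachment targets uniformly from a flat list of edge endpoints (which is equivalent to degree-proportional sampling); `barabasi_albert` and its parameter names are illustrative:

```python
import random

def barabasi_albert(n, d, seed=0):
    """Barabási–Albert graph: each new node attaches to d existing nodes,
    chosen proportionally to degree via the edge-endpoint list."""
    rng = random.Random(seed)
    edges = set()
    targets = list(range(d))   # seed clique targets: the first d nodes
    repeated = []              # flat endpoint list ~ degree-proportional sampling
    for v in range(d, n):
        for u in targets:
            edges.add((min(u, v), max(u, v)))
        repeated.extend(targets)
        repeated.extend([v] * d)
        # pick d distinct attachment targets for the next node
        chosen = set()
        while len(chosen) < d:
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return edges
```

Each of the n − d arriving nodes contributes exactly d edges, so the average degree is approximately 2d.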
4. Evaluation Protocols and Metrics
Evaluation employs standardized, task-specific quantitative metrics:
| Task | Metric | Formula / Procedure |
|---|---|---|
| UCV | Adjusted Mutual Information | AMI(U, V) = (MI(U, V) − E[MI(U, V)]) / (mean(H(U), H(V)) − E[MI(U, V)]) |
| SSCV | Classification Accuracy | Fraction of correctly predicted test-node labels, averaged over 100 random 5%/95% splits |
| DGS | Signal Reconstruction / SNR | SNR = 10·log₁₀(‖x‖² / ‖x − x̂‖²) dB; MSE = ‖x − x̂‖² / N |
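The DGS scoring metrics (SNR in decibels and per-sample MSE between the clean signal x and the reconstruction x̂) can be computed with a few lines of numpy; these are standard definitions, with illustrative function names:

```python
import numpy as np

def snr_db(x, x_hat):
    """SNR = 10 * log10(||x||^2 / ||x - x_hat||^2), in decibels."""
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum((x - x_hat) ** 2))

def mse(x, x_hat):
    """Mean squared error over the N signal samples."""
    return float(np.mean((x - x_hat) ** 2))
```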
Signal denoising by Simoncelli low-pass filtering employs a piecewise spectral multiplier:
- h(λ) = 1 for λ ≤ τ/2; h(λ) = cos((π/2)·log₂(2λ/τ)) for τ/2 < λ ≤ τ; h(λ) = 0 for λ > τ, where τ is the spectral cutoff.
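A sketch of this piecewise spectral multiplier, applied pointwise to graph Laplacian eigenvalues; note the raised-cosine roll-off is continuous at both breakpoints (it equals 1 at τ/2 and 0 at τ). `simoncelli_lowpass` is an illustrative name:

```python
import numpy as np

def simoncelli_lowpass(lam, tau):
    """Piecewise multiplier: 1 below tau/2, cos((pi/2) * log2(2*lam/tau))
    between tau/2 and tau, 0 above tau."""
    lam = np.asarray(lam, dtype=float)
    h = np.zeros_like(lam)
    h[lam <= tau / 2] = 1.0
    band = (lam > tau / 2) & (lam <= tau)
    h[band] = np.cos((np.pi / 2) * np.log2(2 * lam[band] / tau))
    return h
```

Denoising then amounts to projecting the noisy signal onto the graph Fourier basis, multiplying by h(λ), and transforming back.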
Algorithmic kernel evaluation includes:
- Throughput (edges/nodes/sec)
- Latency (convergence time)
- Scalability (core-count dependency)
- Hardware counters (memory bandwidth, cache-miss rates)
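The throughput metric above reduces to edges processed per wall-clock second; a generic timing harness might look like the following sketch (hardware counters require platform tools such as perf and are outside its scope). `kernel_throughput` is an illustrative name, not the suite's C++ harness:

```python
import time

def kernel_throughput(kernel, n_edges, repeats=5):
    """Edges processed per second for a kernel callable, best of `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        kernel()                              # run the kernel once, end to end
        best = min(best, time.perf_counter() - t0)
    return n_edges / best                     # peak observed edge throughput
```

Taking the best of several runs filters out warm-up and scheduling noise, which matters when comparing locality regimes.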
5. Empirical Findings and Benchmark Results
GraphInst provides comparative quantitative results for both graph inference and algorithmic kernels:
Clustering (AMI, Task 1):
- Spectral + naive approach (k-NN/cosine) matches or slightly outperforms NNK and Kalofolias on several datasets.
- Kalofolias degrades on binary BoW features (cora); naive cosine k-NN is robust.
| Method | flowers102 | ESC-50 | cora |
|---|---|---|---|
| C-means (no graph) | 0.36 | 0.59 | 0.10 |
| Spectral + Naive | 0.45 | 0.66 | 0.34 |
| Spectral + NNK | 0.44 | 0.66 | 0.34 |
| Spectral + Kalof. | 0.44 | 0.65 | 0.27 |
Vertex Classification (Accuracy %):
- Any graph improves prediction over pure logistic regression.
- Kalofolias yields best label propagation on ESC-50 and flowers102; NNK competitive.
| Model/Inference | flowers102 | ESC-50 | cora |
|---|---|---|---|
| Logistic regression | 33.5 ± 1.7 | 52.9 ± 1.9 | 46.8 ± 1.6 |
| Label Propagation, Naive | 36.7 ± 1.6 | 59.1 ± 1.8 | 58.9 ± 2.9 |
| SGC (2-hop), Naive | 37.7 ± 1.5 | 60.5 ± 2.0 | 67.2 ± 1.5 |
Signal Denoising (Best SNR):
- Smoothness-prior graph (Kalofolias) slightly outperforms the true road map; naive and NNK methods approach 10 dB SNR.
| Graph Source | Best SNR (dB) |
|---|---|
| True map | 10.32 |
| Kalofolias-learned | 10.41 |
| Naive RBF+kNN | 9.80 |
| RBF+NNK | 9.99 |
Algorithmic Kernel Findings:
- Pointer-based adjacency incurs a roughly 3× higher L3 cache-miss rate than CSR.
- Kernel suite reliably predicts hardware impacts within 10–15% against full-scale BFS, PageRank, community detection implementations (Yoo et al., 2010).
6. Practical Guidance and Implementation
GraphInst is available with C++ reference code, OpenMP and MPI parallelizations, and modular graph representation (adjacency-list/CSR switch via header file). Users can test kernels under worst-case (pointer) or best-case (streaming) locality, examine memory subsystem effects, and quantitatively compare inference algorithms on standardized tasks and datasets. Active node sampling or combination with graph inference is advised for sparse semi-supervised labeling challenges. Data characteristics (binary features, feature ratio F/N) substantially affect algorithm robustness and performance.
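The adjacency-list/CSR contrast above can be made concrete with a small sketch: CSR packs each node's neighbors contiguously into a column array indexed by row offsets, giving the streaming-friendly access pattern, whereas pointer-based lists scatter neighbors across the heap. `to_csr` and `neighbors` are illustrative names, not the suite's representation switch:

```python
import numpy as np

def to_csr(n, edges):
    """Pack an undirected edge list into CSR arrays: row offsets + column indices."""
    deg = np.zeros(n, dtype=np.int64)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    offsets = np.concatenate(([0], np.cumsum(deg)))   # offsets[u]..offsets[u+1] = u's slice
    cols = np.empty(offsets[-1], dtype=np.int64)
    cursor = offsets[:-1].copy()                      # next free slot per row
    for u, v in edges:
        cols[cursor[u]] = v; cursor[u] += 1
        cols[cursor[v]] = u; cursor[v] += 1
    return offsets, cols

def neighbors(offsets, cols, u):
    """Contiguous neighbor slice of node u -- a single streaming read."""
    return cols[offsets[u]:offsets[u + 1]]
```

Iterating `neighbors` touches one contiguous cache-resident slice per node, which is the locality property the benchmark's best-case (streaming) mode exercises.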
A plausible implication is that baseline naive methods remain highly competitive for clustering and classification on heterogeneous real datasets, while state-of-the-art smoothness or sparsity priors confer advantage for denoising and label propagation when properly hyperparameter-tuned.
7. Significance and Extensions
GraphInst establishes the first unified framework for graph inference benchmarking across tasks and datasets, and for hardware-centric evaluation of graph algorithms. Its extensible design admits new inference methods and kernels, directly supporting comparative studies, scalability analysis, and robust algorithm selection tailored to application or system constraints. For both graph ML practitioners and architecture researchers, GraphInst enables reproducible, high-fidelity performance and generalization comparison (Lassance et al., 2020, Yoo et al., 2010).