GraphInst Benchmark: Graph Inference Evaluation
- GraphInst Benchmark is a comprehensive suite designed to evaluate graph inference and graph-theoretic algorithms using both real-world and synthetic datasets.
- It standardizes assessments with curated datasets, pre-extracted features, ground-truth labels, and precise scoring metrics for clustering, classification, and signal denoising tasks.
- Its modular, hardware-aware design enables detailed comparisons of algorithmic kernels and performance metrics, guiding robust and reproducible analysis.
GraphInst Benchmark is a publicly available suite designed to rigorously evaluate graph inference and graph-theoretic algorithms on a range of real-world tasks and large-scale synthetic scenarios. It incorporates both application-driven graph topology inference benchmarking for machine learning and algorithmic benchmarking for hardware and runtime characterization. The suite provides datasets, kernels, evaluation metrics, and reference implementations that facilitate comprehensive quantitative comparison across graph learning, signal processing, clustering, classification, and hardware-centric graph workloads (Lassance et al., 2020, Yoo et al., 2010).
1. Design Objectives and Motivation
GraphInst was developed to address significant gaps in existing benchmarking practices for graph algorithms and graph inference methods. Prior efforts either focused on a narrow set of kernels (such as BFS), omitted ground-truth labels and realistic graph statistics, or introduced non-graph-specific computation that masked critical traversal and memory-access behaviors. GraphInst’s core objectives are:
- Comprehensiveness: Coverage of canonical graph workloads, spanning unsupervised clustering, semi-supervised classification, graph signal denoising for inference, and five distinct kernel classes for algorithmic evaluation (search, spectral, adjacency-centric, metric, global optimization).
- Task and Hardware Realism: Datasets are drawn from image, audio, document, and traffic signals embedding real-world variability; synthetic graph generation uses the Barabási–Albert preferential attachment model to induce scale-free topologies and realistic degree distributions.
- Evaluation Transparency: Easy-to-use datasets with pre-extracted features and ground-truth labels, standardized scoring formulas (AMI, classification accuracy, SNR/MSE), and open-source code underpin apples-to-apples algorithm comparisons and architectural benchmarking.
2. Dataset Composition and Downstream Tasks
GraphInst aggregates diverse datasets, each aligned with specific downstream graph tasks. Feature extraction is standardized, eliminating variability from pipeline differences.
| Dataset | Size / Structure | Features / Ratio | Downstream Task(s) |
|---|---|---|---|
| flowers102 | N=1,020; 102 classes | 2,048-dim (F/N≈2) | Image clustering, classification |
| ESC-50 | N=2,000; 50 classes | 1,024-dim (F/N=0.51) | Audio clustering, classification |
| cora | N=2,708; 7 classes | 1,433-dim BoW (F/N=0.53) | Document clustering, classification |
| toronto traffic | N=2,202 nodes | Signal only (1 graph signal) | Graph signal denoising |
Each dataset supports tasks framed as graph inference problems:
- UCV (Unsupervised Clustering of Vertices): Graph construction for spectral clustering, scored by Adjusted Mutual Information against ground-truth classes.
- SSCV (Semi-Supervised Classification of Vertices): Graph-based label propagation and simplified graph convolution (SGC), scored by prediction accuracy under a 5%-labeled / 95%-test node regime (averaged over 100 splits).
- DGS (Signal Denoising on Graphs): Graph learning from signal statistics, followed by Simoncelli low-pass graph filtering, evaluated by SNR improvement and MSE.
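The SSCV scoring protocol above (5% labeled nodes, accuracy averaged over repeated random splits) can be sketched as follows; `sscv_accuracy` and the `predict` callable are illustrative names, not the suite's API:

```python
import numpy as np

def sscv_accuracy(labels, predict, n_splits=100, labeled_frac=0.05, seed=0):
    """Average accuracy over random labeled/test splits (5% labeled, 95% test)."""
    rng = np.random.default_rng(seed)
    n = len(labels)
    n_labeled = max(1, int(labeled_frac * n))
    accs = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        labeled, test = perm[:n_labeled], perm[n_labeled:]
        # predict() sees only the labeled subset, returns labels for all nodes
        pred = predict(labeled, labels[labeled])
        accs.append(np.mean(pred[test] == labels[test]))
    return float(np.mean(accs)), float(np.std(accs))
```

Any graph-based classifier (label propagation, SGC) can be plugged in behind `predict`, keeping the split protocol identical across methods.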
3. Graph Inference and Kernel Methodologies
GraphInst accommodates both graph inference mechanisms and graph-theoretic kernel routines for detailed analysis.
Graph Topology Inference (Tasks 1–3)
- Naive k-NN + Similarity: Cosine, covariance, or RBF kernel similarity; k-NN sparsification (k grid: 5–1,000), with or without weight normalization.
- Non-negative Kernel Regression (NNK) [Shekkizhar & Ortega 2019]: Non-negative least-squares fit over the k nearest neighbors, enforcing orthogonal residuals; hyperparameters: k and the kernel bandwidth.
- Smoothness Prior (Kalofolias 2016, 2018): Convex optimization subject to weight and degree constraints; promotes signal smoothness over learned graph.
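The naive baseline in the list above can be sketched in a few lines of numpy: cosine similarity, top-k sparsification, and symmetrization. `cosine_knn_graph` is an illustrative name under assumed conventions (non-negative weights, union symmetrization), not the suite's reference implementation:

```python
import numpy as np

def cosine_knn_graph(X, k=5):
    """Naive k-NN graph: cosine similarity, keep top-k neighbors per node, symmetrize."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)  # row-normalize features
    S = Xn @ Xn.T                                      # cosine similarity matrix
    np.fill_diagonal(S, -np.inf)                       # exclude self-loops
    W = np.zeros_like(S)
    for i in range(len(S)):
        nbrs = np.argpartition(S[i], -k)[-k:]          # indices of the k largest sims
        W[i, nbrs] = np.maximum(S[i, nbrs], 0)         # keep non-negative weights
    return np.maximum(W, W.T)                          # symmetrize (union of edges)
```

The resulting weighted adjacency matrix can be fed directly to spectral clustering or label propagation.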
Graph-Theoretic Algorithmic Kernels
- Single-Source Shortest Path (Kernel 1): Dijkstra/BFS-style search with priority-queue relaxation, O((|V| + |E|) log |V|) time with a binary heap.
- Spectral Power Method (Kernel 2): Iteration to the dominant eigenvector of the adjacency/Laplacian matrix, O(|E|) per iteration.
- Hierarchicalization (Kernel 3): Random vertex coalescence into supernodes, adjacency rewiring.
- Metric Sampling (Kernel 4): Local clustering coefficient estimation via random neighborhood sampling.
- Community Splitting (Kernel 5): Entropy-based greedy partition minimization.
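Kernel 2's core loop can be sketched in numpy; the dominant cost per iteration is one matrix-vector product over the edges, which is what makes this kernel memory-bandwidth bound. `power_method` is an illustrative name, not the suite's C++ kernel:

```python
import numpy as np

def power_method(A, iters=200, tol=1e-9, seed=0):
    """Dominant eigenvector of a symmetric adjacency matrix by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = A @ v                      # the O(|E|) matvec: the kernel's memory-bound core
        w /= np.linalg.norm(w)
        if np.linalg.norm(w - v) < tol:
            break
        v = w
    return v, float(v @ A @ v)         # eigenvector estimate, Rayleigh quotient
```

For a connected non-negative adjacency matrix the dominant eigenvalue is positive (Perron–Frobenius), so the iteration converges without sign flipping.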
The graph generator is Barabási–Albert preferential attachment, with time and memory linear in the number of edges; it yields scale-free graphs with a tunable average degree.
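A minimal sketch of the Barabási–Albert generator, using the standard trick of sampling attachment targets uniformly from a flat list of edge endpoints (which is equivalent to degree-proportional sampling); `barabasi_albert` and its parameter names are illustrative:

```python
import random

def barabasi_albert(n, d, seed=0):
    """Barabási–Albert graph: each new node attaches to d existing nodes,
    chosen proportionally to degree via the edge-endpoint list."""
    rng = random.Random(seed)
    edges = set()
    targets = list(range(d))   # seed clique targets: the first d nodes
    repeated = []              # flat endpoint list ~ degree-proportional sampling
    for v in range(d, n):
        for u in targets:
            edges.add((min(u, v), max(u, v)))
        repeated.extend(targets)
        repeated.extend([v] * d)
        # pick d distinct attachment targets for the next node
        chosen = set()
        while len(chosen) < d:
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return edges
```

Each of the n − d arriving nodes contributes exactly d edges, so the average degree is approximately 2d.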
4. Evaluation Protocols and Metrics
Evaluation employs standardized, task-specific quantitative metrics:
| Task | Metric | Formula / Procedure |
|---|---|---|
| UCV | Adjusted Mutual Information | AMI(U, V) = (MI(U, V) − E[MI(U, V)]) / (mean(H(U), H(V)) − E[MI(U, V)]) |
| SSCV | Classification Accuracy | Fraction of correctly predicted test-node labels, averaged over 100 random 5%/95% splits |
| DGS | Signal Reconstruction / SNR | SNR = 10·log₁₀(‖x‖² / ‖x − x̂‖²) dB; MSE = ‖x − x̂‖² / N |
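The DGS scoring metrics (SNR in decibels and per-sample MSE between the clean signal x and the reconstruction x̂) can be computed with a few lines of numpy; these are standard definitions, with illustrative function names:

```python
import numpy as np

def snr_db(x, x_hat):
    """SNR = 10 * log10(||x||^2 / ||x - x_hat||^2), in decibels."""
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum((x - x_hat) ** 2))

def mse(x, x_hat):
    """Mean squared error over the N signal samples."""
    return float(np.mean((x - x_hat) ** 2))
```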
Signal denoising by Simoncelli low-pass filtering employs a piecewise spectral multiplier:
- h(λ) = 1 for λ ≤ τ/2; h(λ) = cos((π/2)·log₂(2λ/τ)) for τ/2 < λ ≤ τ; h(λ) = 0 for λ > τ, where τ is the spectral cutoff.
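A sketch of this piecewise spectral multiplier, applied pointwise to graph Laplacian eigenvalues; note the raised-cosine roll-off is continuous at both breakpoints (it equals 1 at τ/2 and 0 at τ). `simoncelli_lowpass` is an illustrative name:

```python
import numpy as np

def simoncelli_lowpass(lam, tau):
    """Piecewise multiplier: 1 below tau/2, cos((pi/2) * log2(2*lam/tau))
    between tau/2 and tau, 0 above tau."""
    lam = np.asarray(lam, dtype=float)
    h = np.zeros_like(lam)
    h[lam <= tau / 2] = 1.0
    band = (lam > tau / 2) & (lam <= tau)
    h[band] = np.cos((np.pi / 2) * np.log2(2 * lam[band] / tau))
    return h
```

Denoising then amounts to projecting the noisy signal onto the graph Fourier basis, multiplying by h(λ), and transforming back.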
Algorithmic kernel evaluation includes:
- Throughput (edges/nodes/sec)
- Latency (convergence time)
- Scalability (core-count dependency)
- Hardware counters (memory bandwidth, cache-miss rates)
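The throughput metric above reduces to edges processed per wall-clock second; a generic timing harness might look like the following sketch (hardware counters require platform tools such as perf and are outside its scope). `kernel_throughput` is an illustrative name, not the suite's C++ harness:

```python
import time

def kernel_throughput(kernel, n_edges, repeats=5):
    """Edges processed per second for a kernel callable, best of `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        kernel()                              # run the kernel once, end to end
        best = min(best, time.perf_counter() - t0)
    return n_edges / best                     # peak observed edge throughput
```

Taking the best of several runs filters out warm-up and scheduling noise, which matters when comparing locality regimes.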
5. Empirical Findings and Benchmark Results
GraphInst provides comparative quantitative results for both graph inference and algorithmic kernels:
Clustering (AMI, Task 1):
- Spectral + naive approach (k-NN/cosine) matches or slightly outperforms NNK and Kalofolias on several datasets.
- Kalofolias degrades on binary BoW features (cora); naive cosine k-NN is robust.
| Method | flowers102 | ESC-50 | cora |
|---|---|---|---|
| C-means (no graph) | 0.36 | 0.59 | 0.10 |
| Spectral + Naive | 0.45 | 0.66 | 0.34 |
| Spectral + NNK | 0.44 | 0.66 | 0.34 |
| Spectral + Kalof. | 0.44 | 0.65 | 0.27 |
Vertex Classification (Accuracy %):
- Any graph improves prediction over pure logistic regression.
- Kalofolias yields best label propagation on ESC-50 and flowers102; NNK competitive.
| Model/Inference | flowers102 | ESC-50 | cora |
|---|---|---|---|
| Logistic regression | 33.5 ± 1.7 | 52.9 ± 1.9 | 46.8 ± 1.6 |
| Label Propagation, Naive | 36.7 ± 1.6 | 59.1 ± 1.8 | 58.9 ± 2.9 |
| SGC (2-hop), Naive | 37.7 ± 1.5 | 60.5 ± 2.0 | 67.2 ± 1.5 |
Signal Denoising (Best SNR):
- Smoothness-prior graph (Kalofolias) slightly outperforms the true road map; naive and NNK methods approach 10 dB SNR.
| Graph Source | Best SNR (dB) |
|---|---|
| True map | 10.32 |
| Kalofolias-learned | 10.41 |
| Naive RBF+kNN | 9.80 |
| RBF+NNK | 9.99 |
Algorithmic Kernel Findings:
- Pointer-based adjacency incurs a roughly 3× higher L3 cache-miss rate than CSR.
- Kernel suite reliably predicts hardware impacts within 10–15% against full-scale BFS, PageRank, community detection implementations (Yoo et al., 2010).
6. Practical Guidance and Implementation
GraphInst is available with C++ reference code, OpenMP and MPI parallelizations, and modular graph representation (adjacency-list/CSR switch via header file). Users can test kernels under worst-case (pointer) or best-case (streaming) locality, examine memory subsystem effects, and quantitatively compare inference algorithms on standardized tasks and datasets. Active node sampling or combination with graph inference is advised for sparse semi-supervised labeling challenges. Data characteristics (binary features, feature ratio F/N) substantially affect algorithm robustness and performance.
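The adjacency-list/CSR contrast above can be made concrete with a small sketch: CSR packs each node's neighbors contiguously into a column array indexed by row offsets, giving the streaming-friendly access pattern, whereas pointer-based lists scatter neighbors across the heap. `to_csr` and `neighbors` are illustrative names, not the suite's representation switch:

```python
import numpy as np

def to_csr(n, edges):
    """Pack an undirected edge list into CSR arrays: row offsets + column indices."""
    deg = np.zeros(n, dtype=np.int64)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    offsets = np.concatenate(([0], np.cumsum(deg)))   # offsets[u]..offsets[u+1] = u's slice
    cols = np.empty(offsets[-1], dtype=np.int64)
    cursor = offsets[:-1].copy()                      # next free slot per row
    for u, v in edges:
        cols[cursor[u]] = v; cursor[u] += 1
        cols[cursor[v]] = u; cursor[v] += 1
    return offsets, cols

def neighbors(offsets, cols, u):
    """Contiguous neighbor slice of node u -- a single streaming read."""
    return cols[offsets[u]:offsets[u + 1]]
```

Iterating `neighbors` touches one contiguous cache-resident slice per node, which is the locality property the benchmark's best-case (streaming) mode exercises.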
A plausible implication is that baseline naive methods remain highly competitive for clustering and classification on heterogeneous real datasets, while state-of-the-art smoothness or sparsity priors confer advantage for denoising and label propagation when properly hyperparameter-tuned.
7. Significance and Extensions
GraphInst establishes the first unified framework for graph inference benchmarking across tasks and datasets, and for hardware-centric evaluation of graph algorithms. Its extensible design admits new inference methods and kernels, directly supporting comparative studies, scalability analysis, and robust algorithm selection tailored to application or system constraints. For both graph ML practitioners and architecture researchers, GraphInst enables reproducible, high-fidelity performance and generalization comparison (Lassance et al., 2020, Yoo et al., 2010).