IntelliGraph Benchmark Dataset
- IntelliGraph Benchmark Dataset is a suite of datasets and evaluation protocols designed for inferring graph topology from raw observations, targeting tasks like clustering, classification, and denoising.
- It leverages both naive k-NN approaches and advanced methods such as NNK and smoothness-based techniques to construct and optimize graph structures for downstream performance.
- The dataset encompasses diverse modalities—including images, audio, text, and traffic—to ensure comprehensive, real-world validation of graph inference algorithms.
The IntelliGraph Benchmark Dataset is a suite of publicly released datasets and evaluation protocols developed to provide rigorous and comparable benchmarks for graph topology inference methods. Unlike conventional graph learning datasets that focus on node or edge property prediction, IntelliGraph centers on scenarios where graphs are not readily available and must be inferred from raw observations. It directly targets the validation of graph inference algorithms within downstream tasks, revealing strengths and limitations that may be masked in isolated graph recovery evaluations.
1. Benchmark Design Principles
IntelliGraph is structured around three distinct downstream tasks, each requiring the inference of an appropriate graph for optimal performance:
- Unsupervised Clustering of Vertices (UCV): Observations are embedded as nodes, a graph is inferred from features, and spectral clustering is applied to assign class labels.
- Semi-Supervised Classification of Vertices (SSCV): A subset of nodes is labeled; labels are propagated to the remaining nodes over the inferred graph using mechanisms such as label propagation and Simplified Graph Convolution (SGC).
- Denoising of Graph Signals (DGS): A graph is constructed over the dimensions/features of a single observation (e.g., traffic data), supporting graph-based filtering techniques to suppress noise.
Each benchmark encapsulates a dataset suited to the respective task (images, audio, text, traffic), a graph inference procedure, and standardized evaluation metrics. Prepackaged datasets and unified scripts ensure reproducibility and quantitative side-by-side comparisons across inference methods.
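As an illustration of the UCV pipeline described above, the following sketch infers a k-NN graph from raw features, applies spectral clustering to its nodes, and scores the result with AMI. The synthetic three-class data is a stand-in assumption, not the benchmark's actual datasets or scripts:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import SpectralClustering
from sklearn.metrics import adjusted_mutual_info_score

# Toy stand-in for raw observations: three well-separated classes in 8-D
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(30, 8)) for c in (0.0, 3.0, 6.0)])
y_true = np.repeat([0, 1, 2], 30)

# Step 1: infer a graph from features (naive k-NN baseline)
A = kneighbors_graph(X, n_neighbors=10, mode="connectivity")
A = (0.5 * (A + A.T)).toarray()   # symmetrize the k-NN adjacency

# Step 2: spectral clustering on the inferred graph
labels = SpectralClustering(n_clusters=3, affinity="precomputed",
                            random_state=0).fit_predict(A)

# Step 3: evaluate against ground truth, adjusted for chance
ami = adjusted_mutual_info_score(y_true, labels)
```

On data this cleanly separated, any reasonable inferred graph yields near-perfect AMI; the benchmark's interest lies in how methods diverge on harder, real-world features.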
2. Graph Inference Methodologies
A central feature of IntelliGraph is its explicit contrast between baseline and advanced graph inference methods:
- Naive Baselines: Typically involve computation of pairwise similarities or distances (e.g., cosine similarity, covariance, RBF kernel) and construction of k-nearest neighbor (k-NN) graphs, followed by symmetrization and normalization steps such as the symmetrically degree-normalized adjacency $D^{-1/2} A D^{-1/2}$.
- NNK (Non-Negative Kernel Regression) Graphs: These graphs are derived by enforcing non-negative kernel regression constraints given two user-specified parameters: the maximum degree per vertex and the minimum edge weight. NNK graphs are sparser and have demonstrated competitive performance, particularly in spectral clustering and classification settings.
- Smoothness-Based Method (Kalofolias): Assumes signals are smooth over the underlying graph. The algorithm optimizes for a topology that balances smoothness of signals with connectivity sparsity, employing tunable parameters for mean degree and thresholds.
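A minimal sketch of the naive baseline above: RBF similarities, k-NN sparsification, symmetrization, and symmetric degree normalization. The function name and the default parameters are illustrative assumptions:

```python
import numpy as np

def knn_rbf_graph(X, k=5, sigma=1.0):
    """Naive baseline: RBF similarities -> k-NN sparsification ->
    symmetrization -> degree normalization D^-1/2 A D^-1/2."""
    # Pairwise squared Euclidean distances, then RBF kernel weights
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)          # no self-loops
    # Keep only each node's k strongest similarities
    keep = np.argsort(-W, axis=1)[:, :k]
    A = np.zeros_like(W)
    rows = np.arange(len(X))[:, None]
    A[rows, keep] = W[rows, keep]
    A = np.maximum(A, A.T)            # symmetrize
    # Symmetric degree normalization (guard against isolated nodes)
    d = A.sum(1)
    d_inv_sqrt = np.zeros_like(d)
    nz = d > 0
    d_inv_sqrt[nz] = d[nz] ** -0.5
    return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

rng = np.random.default_rng(0)
A_hat = knn_rbf_graph(rng.normal(size=(20, 4)), k=3)
```

The normalized adjacency returned here has eigenvalues in $[-1, 1]$, which is what makes repeated propagation over it numerically stable in the downstream tasks.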
For the denoising task, the benchmark incorporates a Simoncelli low-pass filter with spectral response

$$h(\lambda) = \begin{cases} 1, & \lambda \le \tau/2 \\ \cos\!\left(\dfrac{\pi}{2}\log_2\dfrac{2\lambda}{\tau}\right), & \tau/2 < \lambda \le \tau \\ 0, & \lambda > \tau \end{cases}$$

where $\tau$ is the cutoff frequency and $\lambda_i$ denotes the $i$-th eigenvalue of the graph Laplacian ($0 = \lambda_1 \le \dots \le \lambda_N$).
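The following sketch applies a Simoncelli-type low-pass filter in the graph spectral domain via a Laplacian eigendecomposition. The path graph, signal, noise level, and cutoff $\tau$ are all illustrative assumptions, not the benchmark's traffic data:

```python
import numpy as np

def simoncelli_lowpass(lam, tau):
    """Simoncelli low-pass response: 1 below tau/2, raised-cosine
    roll-off cos(pi/2 * log2(2*lam/tau)) up to tau, 0 above."""
    h = np.zeros_like(lam)
    h[lam <= tau / 2] = 1.0
    band = (lam > tau / 2) & (lam <= tau)
    h[band] = np.cos(0.5 * np.pi * np.log2(2 * lam[band] / tau))
    return h

# Path graph Laplacian as a stand-in for a road network (assumption)
n = 50
A = np.eye(n, k=1) + np.eye(n, k=-1)
L = np.diag(A.sum(1)) - A
lam, U = np.linalg.eigh(L)            # graph Fourier basis

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 2 * np.pi, n))   # smooth "clean" signal
noisy = x + 0.3 * rng.normal(size=n)

# Filter in the spectral domain: GFT -> attenuate -> inverse GFT
denoised = U @ (simoncelli_lowpass(lam, tau=1.0) * (U.T @ noisy))

def snr(ref, est):
    return 10 * np.log10((ref ** 2).sum() / ((ref - est) ** 2).sum())
```

Because the clean signal lives in the low graph frequencies while the noise spreads over the whole spectrum, the low-pass step raises the SNR; this is the effect the DGS metric measures.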
3. Evaluation Metrics and Protocols
Evaluation metrics are selected to directly quantify the suitability of the inferred graph for its downstream task:
| Task | Metric | Purpose |
|---|---|---|
| UCV (clustering) | Adjusted Mutual Information (AMI) | Compares clustering to ground truth, adjusting for chance |
| SSCV (classification) | Classification Accuracy | Measures correct label assignment for unlabeled nodes, with statistics over multiple splits |
| DGS (denoising) | Signal-to-Noise Ratio (SNR) | Assesses restoration of the original signal via graph-based denoising |
Appropriate metrics ensure that graph inference is evaluated in terms of downstream utility, rather than solely on structural fidelity.
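For the SSCV protocol in particular, an SGC-style pipeline can be sketched as: precompute $S^K X$ with $S$ the normalized adjacency (including self-loops), fit a linear classifier on the labeled subset, and report accuracy on the unlabeled nodes. The toy data, graph construction, and $K=2$ are illustrative assumptions:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n, d = 90, 5
y = np.repeat([0, 1, 2], 30)
X = rng.normal(size=(n, d)) + y[:, None]   # class-shifted toy features

# Infer a k-NN graph from features, symmetrize, add self-loops
A = np.maximum(*(lambda a: (a, a.T))(
    kneighbors_graph(X, n_neighbors=8).toarray())) + np.eye(n)
d_inv_sqrt = A.sum(1) ** -0.5
S = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]   # normalized adjacency

# SGC: K=2 propagation steps baked into the features, then a linear model
Z = np.linalg.matrix_power(S, 2) @ X
labeled = rng.choice(n, size=15, replace=False)     # small labeled subset
clf = LogisticRegression(max_iter=1000).fit(Z[labeled], y[labeled])

mask = np.ones(n, bool)
mask[labeled] = False
acc = accuracy_score(y[mask], clf.predict(Z[mask]))
```

Repeating this over many random labeled subsets and reporting mean and standard deviation of `acc` mirrors the multi-split statistics the benchmark prescribes.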
4. Dataset Diversity and Structure
The scope of datasets included in IntelliGraph is designed to stress test graph inference methods under heterogeneous data conditions:
- Image Dataset: the “102 Category Flower Dataset”, with 1020 samples across 102 classes and high-dimensional feature vectors extracted from Inception-v3 embeddings.
- Audio Dataset: ESC-50, comprising 2000 clips from 50 classes, using features from AudioSet network embeddings.
- Text Dataset: Cora, with 2708 documents in 7 categories and sparse binary bag-of-words features.
- Traffic (DGS) Dataset: the Toronto traffic network, where nodes carry a single observation; the graph is constructed over the feature dimensions rather than over samples.
This diversity facilitates evaluation of graph inference robustness across modalities, sample sizes, and feature distributions.
5. Practical Applications and Observed Outcomes
IntelliGraph is engineered to reflect practical application scenarios where graphs are not inherently available:
- Clustering and Classification: Vital in social network analysis, citation graphs, and domains where latent relational structure drives grouping and label propagation efficiency.
- Signal Denoising: Critical for traffic forecasting and sensor networks, where graph-based filtering enhances signal integrity.
Empirical benchmark results consistently show that better graph inference yields improved downstream performance (higher AMI, higher classification accuracy, better SNR), suggesting that optimal graph construction is sensitive to both the underlying data and the target application.
6. Limitations and Prospects
Several documented limitations frame the current and future research direction:
- Task-Specificity: IntelliGraph addresses clustering, classification, and denoising. Other domains such as brain connectivity are excluded due to negligible performance improvements observed in preliminary tests.
- Parameter Sensitivity: Outcomes depend on the chosen similarity measure, the number of neighbors k, and other hyperparameters. Sparse graphs may exhibit disconnected components, degrading label propagation performance in semi-supervised settings.
- Semi-Supervised Variability: Elevated standard deviations in accuracy across splits indicate potential instability, motivating investigation into robust graph sampling and labeled-set selection strategies.
Stated future work includes expansion to new signal types and domains, integration of advanced graph sampling and adaptive parameter selection, and deeper analysis of the failure modes and strengths of graph inference approaches.
7. Domain Impact and Summary
The IntelliGraph Benchmark Dataset constitutes a practical, reproducible resource for the systematic evaluation of graph topology inference methodologies. By exposing algorithms to a matrix of real-world downstream tasks, diverse datasets, and robust metrics, it supports fine-grained assessment of methods from conventional k-NN schemes to advanced sparsity- and smoothness-constrained algorithms. IntelliGraph advances the state-of-the-art by establishing a consistent experimental framework that enables fair, application-driven comparison—illuminating genuine empirical gains and weaknesses in graph inference for signal processing and machine learning applications (Lassance et al., 2020).