Graph Classification: Methods & Trends

Updated 7 April 2026
  • Graph classification is the task of mapping graphs with node and edge attributes to discrete class labels using supervised or semi-supervised learning.
  • Techniques range from handcrafted features and graph kernels to deep learning methods like GNNs and spectral processing, offering trade-offs between efficiency and interpretability.
  • Applications span chemoinformatics, social network analysis, bioinformatics, and cybersecurity, stressing reproducibility, scalability, and robust benchmarking.

Graph classification is the supervised or semi-supervised learning problem in which each graph—potentially with attributes on nodes and edges—must be mapped to one of a finite set of class labels. This task arises across diverse domains including chemoinformatics (e.g., molecular property prediction), social network analysis (e.g., community type classification), bioinformatics (e.g., protein function prediction), and cybersecurity. Approaches span handcrafted feature engineering, graph kernels, statistical models, spectral and signal-processing methods, subgraph pattern mining, and deep learning via graph neural networks (GNNs). Rigorous comparison of such methods is ongoing; while deep learning methods dominate for large attributed graphs, interpretable or resource-efficient baselines often match or outperform them in certain regimes. The field has crystallized around a core set of protocols and datasets, with increasing emphasis on fair reproducibility and scalable, attribute-aware paradigms.

1. Foundational Approaches: Feature, Kernel, and Statistical Models

Classical graph classification relied on descriptors derived from global or local structural properties or subgraph statistics. Feature-engineering approaches extract structural invariants (e.g., number of nodes, edges, diameter, mean/variance of centrality or clustering) or distributions of local graph features (histograms/empirical distributions of degree, clustering coefficient, or node-level properties) (Islam et al., 2024, Adamczyk, 2022). For molecules and attributed graphs, domain-specific fingerprints, such as ECFP (Morgan), RDKit, and MACCS keys, compactly encode substructure presence up to a certain radius or bond pattern (Adamczyk, 2022).
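
A minimal sketch of such a handcrafted descriptor, computed from an edge list with NumPy (the particular choice of statistics and the function name are illustrative, not taken from the cited papers):

```python
import numpy as np

def structural_features(n_nodes, edges):
    """Fixed-length structural descriptor of a graph given as an edge list.
    The statistics chosen here are an illustrative subset of those used in
    feature-engineering baselines."""
    deg = np.zeros(n_nodes)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    density = 2 * len(edges) / (n_nodes * (n_nodes - 1)) if n_nodes > 1 else 0.0
    return np.array([
        n_nodes,              # global size invariants
        len(edges),
        deg.mean(),           # degree distribution summaries
        deg.var(),
        deg.max(),
        density,
    ])

# A triangle: three nodes, all of degree 2, density 1.
print(structural_features(3, [(0, 1), (1, 2), (0, 2)]))
```

Vectors like this can be fed to any off-the-shelf classifier (e.g., a random forest), which is what makes such baselines cheap to train and easy to interpret.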

Graph kernels, such as the Weisfeiler-Lehman subtree, graphlets, shortest-path, and random walk kernels, embed graphs into a reproducing kernel Hilbert space based on counts of combinatorial substructures or label propagation orbits. These methods support classical SVM or other kernel classifiers (Adamczyk, 2022, Aktas et al., 2020). Recent work has surveyed and experimentally compared upwards of 38 subgraph quality measures for pattern-based graph classification, emphasizing that measures such as AbsSupDif and Sup yield the best empirical performance and that redundant subgraph patterns can be vastly reduced by clustering their occurrence “footprints” (Potin et al., 19 Jun 2025).
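
The Weisfeiler-Lehman subtree kernel can be sketched in a few lines of pure Python: iteratively compress each node's label together with its sorted neighbor labels, accumulate label-count histograms across iterations, and take the inner product of two graphs' histograms (function names here are illustrative):

```python
from collections import Counter

def wl_histogram(adj, labels, iters=2):
    """Weisfeiler-Lehman label-count histogram for one graph.
    adj: dict node -> set of neighbors; labels: dict node -> initial label."""
    hist = Counter(labels.values())
    for _ in range(iters):
        # Relabel every node by its own label plus the multiset of neighbor labels.
        labels = {v: (labels[v], tuple(sorted(labels[u] for u in adj[v])))
                  for v in adj}
        hist.update(labels.values())
    return hist

def wl_kernel(h1, h2):
    """Subtree kernel value: inner product of two label histograms."""
    return sum(c * h2.get(l, 0) for l, c in h1.items())

tri = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}          # triangle
path = {0: {1}, 1: {0, 2}, 2: {1}}               # 3-node path
h_tri = wl_histogram(tri, {v: "a" for v in tri}, 2)
h_path = wl_histogram(path, {v: "a" for v in path}, 2)
print(wl_kernel(h_tri, h_tri), wl_kernel(h_tri, h_path))
```

The resulting Gram matrix over a dataset of graphs plugs directly into an SVM or any other kernel classifier.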

Statistical models address graph classification where all graphs share the same labeled vertex set. The signal-subgraph paradigm seeks the subset of edges whose class-conditional Bernoulli probabilities differ, then parametrizes a Bayes-optimal or plug-in classifier on these edges only, with estimator variants for “coherent” (edges share incident vertices) and “incoherent” signals. Under mild conditions these plug-in classifiers achieve asymptotic optimality and have been validated for statistical connectomics and discriminating human connectomes by sex (Vogelstein et al., 2011).
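
A toy NumPy sketch of the incoherent variant under these assumptions (pick the s edges with the largest empirical class-conditional probability gap, then classify by plug-in Bernoulli likelihood; names and smoothing choice are illustrative):

```python
import numpy as np

def fit_signal_subgraph(A, y, s):
    """Plug-in classifier on a selected signal subgraph (incoherent variant).
    A: stack (n_graphs, V, V) of binary adjacencies over a shared vertex set;
    y: binary labels; s: number of edges to keep."""
    p0 = A[y == 0].mean(axis=0)                  # class-conditional edge probs
    p1 = A[y == 1].mean(axis=0)
    gap = np.triu(np.abs(p1 - p0), k=1)          # count each undirected edge once
    idx = np.unravel_index(np.argsort(gap, axis=None)[-s:], gap.shape)
    eps = 1e-3                                   # smoothing avoids log(0)
    q0 = np.clip(p0[idx], eps, 1 - eps)
    q1 = np.clip(p1[idx], eps, 1 - eps)

    def predict(B):
        e = B[idx]                               # edge indicators on the subgraph
        ll0 = (e * np.log(q0) + (1 - e) * np.log(1 - q0)).sum()
        ll1 = (e * np.log(q1) + (1 - e) * np.log(1 - q1)).sum()
        return int(ll1 > ll0)

    return predict
```

Because inference is restricted to the selected edges, the classifier stays interpretable: the signal subgraph itself is the explanation.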

2. Spectral and Signal Processing Methods

Spectral approaches extract features from the Laplacian or adjacency matrix eigenvalues and (sometimes) eigenvectors, motivated by their invariance to node relabelings and sensitivity to global graph topology. A strong baseline is to take the normalized Laplacian, order its eigenvalues, and use the lowest k (zero-padded to a fixed length) as a permutation-invariant summary; coupled with tree-based classifiers (e.g., random forests), this offers solid performance with minimal tuning (Lara et al., 2018). The extracted “spectral features” have yielded best-in-class accuracy on several chemical graph benchmarks and require substantially less overhead than kernel methods.
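
This baseline fits in a few lines of NumPy (the function name is illustrative):

```python
import numpy as np

def spectral_embedding(A, k):
    """Sorted eigenvalues of the symmetric normalized Laplacian, truncated or
    zero-padded to length k: a fixed-size, permutation-invariant summary."""
    d = A.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, d, 1.0) ** -0.5       # guard isolated nodes
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    ev = np.sort(np.linalg.eigvalsh(L))[:k]
    return np.pad(ev, (0, k - len(ev)))
```

The resulting vectors can be stacked into a feature matrix and handed to a random forest, as in the cited baseline.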

More advanced spectral graph signal-processing pipelines construct fixed-length feature maps by partitioning the Laplacian spectrum and recording, for each node feature dimension, the spectral energy in each band (“FT-GP”); or apply learned spectral wavelet filters followed by summarization (“WT-GP”). Both are used as inputs to Gaussian process models, which yield both high accuracy and well-calibrated uncertainty. The wavelet (WT-GP) variant excels on tasks involving localized, multi-scale structures (e.g., distinguishing ring vs. clique, block models), outperforming strong GNN and kernel comparators, and FT-GP is competitive with zero learned filters (Opolka et al., 2023).
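
The band-energy idea behind FT-GP can be sketched as follows, assuming equal-width bands over the Laplacian spectrum (the paper's exact band construction may differ; the function name is mine):

```python
import numpy as np

def ft_gp_features(A, x, n_bands):
    """FT-GP-style sketch: graph-Fourier-transform a node signal x, then
    record the spectral energy falling in each of n_bands Laplacian bands."""
    L = np.diag(A.sum(axis=1)) - A                 # combinatorial Laplacian
    w, U = np.linalg.eigh(L)
    coeffs = U.T @ x                               # graph Fourier coefficients
    edges = np.linspace(w.min(), w.max() + 1e-9, n_bands + 1)
    bands = np.digitize(w, edges) - 1              # band index per eigenvalue
    return np.array([(coeffs[bands == b] ** 2).sum() for b in range(n_bands)])
```

By Parseval's identity the band energies sum to the squared norm of the signal, so the map loses no total energy, only its exact spectral location within each band.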

Heat-diffusion on simplicial complexes generalizes spectral techniques beyond pairwise edges to higher-order interactions. By compressing a vertex-labeled graph into a super-graph, constructing its clique complex, and applying heat-diffusion to all p-simplices, one can compute the Diffusion Fréchet function, invert its values, and vectorize to obtain feature vectors that capture both local and global higher-order topology. Ablations across p = 0, 1, 2 and diffusion scales confirm that this approach outperforms or matches state-of-the-art graph kernels (WL, GK, SP, RW) on several bioinformatics datasets (Aktas et al., 2020).

3. Deep and Reference-based Graph Representation Learning

Modern graph classification frequently relies on the message-passing neural network (MPNN) paradigm, where node (and, by extension, graph) representations are iteratively updated via neighborhood aggregation and then pooled to yield a fixed-size embedding (Adamczyk, 2022). The dominant GNN variants include GCN, GraphSAGE, GIN, GAT, and their “Jumping Knowledge” improvements, with expressivity up to the 1-WL test (Adamczyk, 2022, Meltzer et al., 2019).

Permutation invariance in GNNs is typically handled by either symmetric pooling or explicit construction, as in PiNet, where a double-stack of message-passing GCNs produces both node embeddings and channel-wise attention scores, then pools via a differentiable attention mechanism that is theoretically invariant to node permutation, yielding superior expressivity in isomorphic and molecular benchmarks (Meltzer et al., 2019).

Pooling schemes are a key differentiator: while simple mean/sum/max aggregation yields fixed-size embeddings, alternative strategies include attention pooling, hierarchical pooling (DIFFPOOL, MinCutPool), and model-free approaches that preserve node set distributions (Li et al., 2019). Graph Reference Distribution Learning (GRDL) replaces pooling by treating each graph as a bag of latent node embeddings sampled from a distribution, using maximum mean discrepancy (MMD) to compare them directly to learned class-specific references. This achieves state-of-the-art accuracy and generalization bounds with drastically reduced training and inference time (Wang et al., 2024).
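
The GRDL decision rule can be sketched with NumPy: compare a graph's bag of node embeddings to each class reference via squared MMD with an RBF kernel, and predict the nearest class (the kernel choice, bandwidth, and function names are illustrative; in GRDL the references are learned by gradient descent):

```python
import numpy as np

def mmd2(X, Y, gamma=1.0):
    """Squared maximum mean discrepancy (RBF kernel) between two bags
    of node embeddings, compared without any pooling step."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

def classify(X, references):
    """Assign the class whose reference distribution is closest in MMD."""
    return int(np.argmin([mmd2(X, R) for R in references]))
```

Skipping the pooling step avoids the information loss of collapsing a node set to a single vector, which is the paper's central motivation.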

The “virtual node” technique, exemplified in Virtual Column Networks (VCN), augments the input graph with a hub node connected bidirectionally to all real nodes, enabling the global context vector (the virtual node state after T rounds) to serve as a learned graph embedding. This yields improved or competitive performance compared to neural and fingerprint baselines, substantially shortening the distance over which information is propagated (Pham et al., 2017).
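
The graph augmentation itself is a small preprocessing step (a sketch; zero-initializing the hub's features is my assumption):

```python
import numpy as np

def add_virtual_node(A, H):
    """Append a hub ('virtual') node linked bidirectionally to every real
    node. Its feature row starts at zero; after T message-passing rounds
    its state can serve as a whole-graph context vector."""
    n = len(A)
    A_aug = np.zeros((n + 1, n + 1))
    A_aug[:n, :n] = A
    A_aug[n, :n] = A_aug[:n, n] = 1.0        # bidirectional hub edges
    H_aug = np.vstack([H, np.zeros((1, H.shape[1]))])
    return A_aug, H_aug
```

Any message-passing layer can then run unchanged on the augmented graph; the hub guarantees every pair of nodes is at most two hops apart.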

4. Subgraph Pattern and Structure Mining

Pattern-based classification uses frequent subgraph enumeration combined with discriminative quality measures to select or rank patterns as Boolean features (Potin et al., 19 Jun 2025). After subgraph mining (e.g., with gSpan) and embedding each graph by a binary vector of pattern occurrences, a quality function (WRAcc, Jaccard, AbsSupDif, etc.) is used to rank or select patterns. The field now emphasizes both the critical importance of proper pattern preprocessing—reducing redundancy by clustering patterns with similar occurrence sets (“footprints”)—and careful selection among quality measures, as not all classical options yield effective screening. Empirical results and theoretical analysis identify robust measures (AbsSupDif, Sup, Spec, etc.) for balanced, two-class cases, with clustering-based preprocessing often improving both performance and efficiency (Potin et al., 19 Jun 2025).
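
Two of the quality measures discussed above can be sketched in pure Python over a Boolean pattern-occurrence column and binary labels (these are standard textbook definitions; the cited paper's exact formulations may differ in normalization):

```python
def wracc(col, y):
    """Weighted relative accuracy of a Boolean pattern feature:
    P(pattern) * (P(class=1 | pattern) - P(class=1))."""
    n = len(y)
    covered = [yi for ci, yi in zip(col, y) if ci]
    if not covered:
        return 0.0
    return (len(covered) / n) * (sum(covered) / len(covered) - sum(y) / n)

def abs_sup_dif(col, y):
    """AbsSupDif: absolute difference of the pattern's support in each class."""
    pos = [c for c, yi in zip(col, y) if yi == 1]
    neg = [c for c, yi in zip(col, y) if yi == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))
```

Ranking mined patterns by such scores, after footprint-based deduplication, gives the Boolean feature set fed to a downstream classifier.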

Approaches such as GraphCSC embed both “skeleton” (anonymous random walk structures; i.e., distributions of walk shape patterns) and “component” (frequent subgraph) features, learning a unified representation via distributed-bag-of-words objectives. When compared against deep/supervised graph kernels and walk-based embeddings, combining skeleton and component yields superior or best-in-class results on several public benchmarks (Liu et al., 2021).
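
The "skeleton" side rests on anonymous walks, where node identities are discarded and only the revisit pattern of a walk is kept, as in this one-liner sketch:

```python
def anonymize(walk):
    """Map a node walk to its anonymous pattern: each node is replaced by
    the index of its first occurrence, so only the walk's shape survives."""
    first = {}
    return tuple(first.setdefault(v, len(first)) for v in walk)

# Walks over different node IDs but the same revisit structure coincide:
print(anonymize(["a", "b", "a", "c"]))  # (0, 1, 0, 2)
```

Distributions over these patterns are what the skeleton embedding summarizes, independently of node labels.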

Homotopy equivalence-based classification reduces each graph to its “basic graph” (canonical compressed representative under contractible deletion/contraction transformations). All graphs in the same homotopy class (as defined via allowed moves on simple vertices, edges, and subgraphs) share key topological invariants (Euler characteristic, homology), yielding a robust, size-reduced summary of the underlying topological structure (Evako, 2015).

5. Efficient and Resource-Constrained Approaches

Resource-sensitive techniques are of growing importance for massive or device-constrained graph scenarios. Hyperdimensional Computing (HDC) models such as GraphHD and VS-Graph exploit high-dimensional, bit-wise algebra and binding/bundling operations to build robust, permutation-invariant hypervector embeddings of graphs. In GraphHD, node hypervectors are derived from PageRank-based node rankings, and graph embeddings are formed by binding and bundling edge representations, yielding drastic speedups (up to 14.6× for training) with slightly reduced accuracy compared to deep and kernel baselines (Nunes et al., 2022). VS-Graph advances this with “spike diffusion” for node ID assignment, purely associative message-passing in high-dimensional space, and prototype-based class averages. It outpaces both GNNs and previous HDC baselines in accuracy and training speed (up to 450× speedup), and remains robust even under drastic dimensionality reduction (D = 128) (Poursiami et al., 3 Dec 2025).
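
The bind/bundle algebra underlying these models is simple to sketch with bipolar hypervectors in NumPy (random node hypervectors here stand in for GraphHD's PageRank-derived ones; function names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 1024  # hypervector dimensionality

def random_hv():
    """Random bipolar hypervector; random pairs are nearly orthogonal."""
    return rng.choice([-1, 1], size=D)

def bind(a, b):
    return a * b                         # elementwise product encodes an edge

def bundle(vs):
    return np.sign(np.sum(vs, axis=0))   # majority vote superposes elements

def graph_hv(edges, node_hvs):
    """Graph embedding: bundle the bound hypervectors of all edges."""
    return bundle([bind(node_hvs[u], node_hvs[v]) for u, v in edges])

def similarity(a, b):
    return (a @ b) / D                   # normalized dot product in [-1, 1]
```

Classification then reduces to comparing a graph hypervector against bundled per-class prototypes, which is why training is essentially a single pass over the data.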

Simple structural feature-based approaches leverage 9-dimensional vectors of graph statistics (nodes, edges, degrees, centralities, spectrum) and, in multiple datasets, outperform or match GNN or kernel models in accuracy and efficiency (Islam et al., 2024).

6. Generative, Diffusion, and Semi-supervised Models

Generative graph classification treats the problem as learning a class-conditional graph model and applying Bayes’ formula to predict the argmax posterior for the class label, e.g., via variational autoencoding with explicitly discriminative conditional ELBO objectives (Schulte, 2023). Recent developments extend score-based diffusion generative models (EDM-style SDEs with SwinGNN backbones) to graph classification. Conditioning the denoising model on the class label and combining a discriminative cross-entropy loss (CLF) with a variational denoising loss (DEN) achieves state-of-the-art classification on graph benchmarks—especially when predictions are ensembled over permutations to reduce node-order dependence—outperforming classical GNN baselines (Xian et al., 2024).

Time-variant graph classification formalizes dynamic graph streams as sequences of evolving snapshots and introduces “graph-shapelet patterns,” defined as contiguous edit-operation subsequences that best discriminate classes. By converting graph sequences to scalar time-series, mining shapelet candidates, and lifting them to symbolic edit patterns, it achieves efficient and accurate early classification, and demonstrates the failure of static-only approaches for evolutionary data (Wang, 2016).

Hierarchical graph perspectives, where nodes of a “super-graph” are graph instances, motivate stacking embedding learning at the graph-instance level (via supervised self-attentive pooling, as in SAGE) with classification on the hierarchical graph structure via a GCN—optimizing both supervised and agreement losses over labeled and unlabeled sets. Semi-supervised variants such as SEAL-C and SEAL-AI alternate updating both classifiers, achieving substantial gains in macro-F1 and accuracy over classical and deep baselines (Li et al., 2019).

Class imbalance in graph classification, e.g., in molecular property prediction, requires explicit handling. Mixture-of-experts architectures, such as GraphDIVE, learn a soft partitioning of graphs via gating networks over graph-level embeddings, then train per-expert classifiers; this yields absolute gains in ROC-AUC and minority-class recall compared to reweighting, resampling, or standard GNNs (Hu et al., 2021).
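
The mixture-of-experts forward pass over a graph embedding can be sketched as follows (linear gate and experts for brevity; GraphDIVE's actual networks are learned jointly with the GNN encoder):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def moe_predict(z, gate_W, expert_Ws):
    """Mixture-of-experts sketch: a gating network softly assigns a graph
    embedding z to experts, whose class distributions are then mixed."""
    gates = softmax(gate_W @ z)                       # soft partition weights
    return sum(g * softmax(W @ z) for g, W in zip(gates, expert_Ws))
```

Letting different experts specialize on different regions of embedding space is what gives minority-class graphs a dedicated decision boundary instead of one dominated by the majority class.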

7. Theoretical Analysis, Empirical Protocols, and Open Problems

Classical and modern methods are accompanied by theoretical guarantees (e.g., generalization error bounds for GRDL based on Rademacher complexity, Bayes-optimality and efficiency for signal-subgraph estimators, homotopy invariance in topological compression) (Vogelstein et al., 2011, Evako, 2015, Wang et al., 2024). The validity and comparative standing of methods are determined on established datasets (MUTAG, PROTEINS, NCI1, DD, IMDB-BINARY/MULTI, COLLAB, etc.) with stratified cross-validation, accuracy, AUC, and F1 scores. Best practices now demand extensive hyperparameter sweeps, ablation over pooling/aggregation schemes, and analysis of computational efficiency.

Despite the dominance of GNNs, multiple studies demonstrate that descriptor- and fingerprint-based models, as well as carefully constructed statistical or spectral baselines, retain surprising power, efficiency, and robustness (Adamczyk, 2022, Lara et al., 2018, Islam et al., 2024). Hybrid and attribute-aware approaches—combining learned embeddings, handcrafted features, and pattern-based substructure selection—are promising avenues. Open problems include development of efficient multi-class and multi-label measures for pattern-based selection, further theoretical analysis of reference-based methods, scalable handling of attributed and heterogeneous graphs, and robust graph classification under structural perturbations or adversarial scenarios.
