Weisfeiler-Lehman Subtree Kernel Overview
- The Weisfeiler-Lehman subtree kernel is a graph similarity measure that counts matching subtree patterns generated by iterative neighborhood relabeling.
- It constructs a feature map by aggregating counts of rooted subtree patterns across multiple iterations and computes similarity via an inner product.
- Extensions, including weighted and soft variants, enhance its expressiveness and interpretability, achieving state-of-the-art performance across domains.
The Weisfeiler-Lehman (WL) subtree kernel is a graph similarity measure grounded in the iterative neighborhood aggregation procedure of the WL isomorphism test. This kernel assesses the similarity of graphs by counting matching subtree patterns, generated via repeated relabeling (color refinement) of vertices based on their neighbors' labels. Computationally efficient and applicable to diverse domains—including molecules, social networks, and point-cloud structures—the WL subtree kernel achieves state-of-the-art accuracy in graph classification while maintaining tractability. Numerous extensions, such as parametric weighting of subtree patterns and relaxations to soft subtree similarity, have been introduced to augment its expressiveness and interpretability.
1. Weisfeiler-Lehman Relabeling and Subtree Pattern Extraction
Given a graph $G = (V, E)$ with initial node labels $\ell_0(v)$ for $v \in V$, the WL labeling scheme iteratively refines node labels over $h$ rounds. At each iteration $i$, for each node $v$, the multiset of labels $\{\ell_{i-1}(u) : u \in N(v)\}$ (with $N(v)$ the neighbors of $v$) is collected, sorted, and concatenated with $\ell_{i-1}(v)$ to form a string $s_i(v)$. A perfect hash function $f$ compresses $s_i(v)$ into a new integer label $\ell_i(v) = f(s_i(v))$.
After $h$ rounds, each node $v$ possesses a sequence of labels $\ell_0(v), \ell_1(v), \dots, \ell_h(v)$, each encoding the subtree structure rooted at $v$ up to the corresponding depth through signature-style label propagation. Each WL label at iteration $i$ bijectively encodes an isomorphism class of rooted subtrees of height $i$, mirroring the distinguishing power of the 1-dimensional WL test and its connection to rooted tree unfolding (Kriege, 2022; Nguyen et al., 2021).
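The refinement step can be sketched in Python. The graph representation (adjacency dicts, integer labels) and the dictionary-based label compression standing in for the perfect hash are illustrative choices, not a reference implementation:

```python
def wl_relabel(adj, labels, h):
    """Run h rounds of WL color refinement.

    adj    : dict mapping node -> list of neighbor nodes
    labels : dict mapping node -> initial integer label
    Returns a list of label dicts, one per round (round 0 = initial labels).
    """
    history = [dict(labels)]
    compress = {}  # plays the role of the injective hash f
    for _ in range(h):
        new_labels = {}
        for v in adj:
            # multiset of neighbor labels, sorted, paired with the node's own label
            sig = (labels[v], tuple(sorted(labels[u] for u in adj[v])))
            if sig not in compress:
                compress[sig] = len(compress)  # fresh integer label
            new_labels[v] = compress[sig]
        labels = new_labels
        history.append(dict(labels))
    return history
```

On a path graph with uniform initial labels, one round already separates the endpoints (degree 1) from the interior node (degree 2), since their neighbor-label multisets differ.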
2. Feature Map Construction and Kernel Definition
For each round $i$, define $\Sigma_i$ as the set of distinct node labels present at iteration $i$. The rooted subtree feature map for round $i$ is
$$\phi_i(G) = \big(c_i(G, \sigma)\big)_{\sigma \in \Sigma_i},$$
where $c_i(G, \sigma)$ counts the nodes of $G$ carrying label $\sigma$ at iteration $i$. The full graph feature vector is the concatenation $\phi(G) = \big(\phi_0(G), \phi_1(G), \dots, \phi_h(G)\big)$, such that each coordinate counts the occurrences of a specific subtree pattern.
The WL subtree kernel of height $h$ is the sum of inner products across all rounds:
$$k_{\mathrm{WL}}(G, G') = \sum_{i=0}^{h} \langle \phi_i(G), \phi_i(G') \rangle,$$
where each dot product quantifies the shared occurrence of rooted subtree patterns in $G$ and $G'$. This formulation ensures the kernel is positive semidefinite and histograms the multi-scale topology of the graph neighborhood structure (Nguyen et al., 2021; Ting et al., 2023; Kriege, 2022).
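A minimal sketch of this computation, assuming both graphs' per-round labels come from one shared relabeling pass so that integer labels are comparable across graphs. Sparse `Counter` histograms keep each dot product proportional to the number of distinct shared labels:

```python
from collections import Counter

def wl_subtree_kernel(history_g, history_h):
    """WL subtree kernel value from per-round label assignments.

    history_g, history_h : lists (one entry per round 0..h) of dicts
        mapping node -> integer label, produced by a relabeling pass
        shared by both graphs so labels are comparable.
    """
    k = 0
    for labels_g, labels_h in zip(history_g, history_h):
        # sparse per-round histograms: label -> count
        cg = Counter(labels_g.values())
        ch = Counter(labels_h.values())
        # inner product over labels occurring in both graphs
        k += sum(cg[lab] * ch[lab] for lab in cg.keys() & ch.keys())
    return k
```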
3. Computational Complexity and Implementation
Each WL refinement iteration requires $O(m \log m)$ time for label sorting (with $m$ the number of edges), or $O(m)$ if radix sort is used. Label counting per iteration is $O(n)$ for $n$ nodes. For $N$ graphs, computing all kernel matrix entries takes $O(N^2 h \bar{s})$ time, where $\bar{s}$ is the typical number of distinct subtree patterns per feature map. Space complexity is $O(nh)$ per graph. Notable implementation optimizations include sparse storage of histograms and subsampling, where relabeling only a fraction of nodes per iteration can substantially accelerate computation with minimal accuracy loss (Ting et al., 2023).
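The subsampling optimization can be sketched as follows; the sampling scheme and the `frac` parameter are illustrative choices, not the exact procedure of Ting et al. (2023). Nodes outside the sample simply keep their previous label:

```python
import random

def wl_relabel_subsampled(adj, labels, h, frac=0.5, seed=0):
    """WL refinement that relabels only a random fraction of nodes
    per round (illustrative sketch of subsampled refinement)."""
    rng = random.Random(seed)
    compress = {}  # stands in for the injective label hash
    history = [dict(labels)]
    nodes = list(adj)
    for _ in range(h):
        # relabel only a sampled subset of nodes this round
        sample = set(rng.sample(nodes, max(1, int(frac * len(nodes)))))
        new_labels = dict(labels)  # unsampled nodes keep old labels
        for v in sample:
            sig = (labels[v], tuple(sorted(labels[u] for u in adj[v])))
            compress.setdefault(sig, len(compress))
            new_labels[v] = compress[sig]
        labels = new_labels
        history.append(dict(labels))
    return history
```

With `frac=1.0` this reduces to full WL refinement; smaller fractions trade refinement resolution for speed.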
4. Extensions: Weighted and Soft Subtree Kernels
Weighted WWL Kernel
Standard WL subtree kernels treat every subtree pattern as equally important. This is suboptimal in domains such as molecular classification, where certain substructures (e.g., nitro groups, aromatic rings) are highly predictive. The WWL kernel (Wasserstein Weisfeiler-Lehman) introduces a nonnegative weight $w_\sigma \geq 0$ for each subtree pattern $\sigma$ and defines a weighted distance between graphs that aggregates weighted pattern discrepancies across all heights $i = 0, \dots, h$ and patterns $\sigma \in \Sigma_i$. The weights are optimized, subject to convex constraints, through a projected stochastic gradient algorithm on a pairwise hinge-style loss. This approach enables supervised learning of structural relevance while preserving kernel validity and scalability via sparse computation. The learned weights focus the kernel on informative graph substructures, improving both prediction and interpretability, as validated on synthetic and large-scale molecular datasets (Nguyen et al., 2021).
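One projected-gradient update might look as follows. The histogram feature vectors, hinge formulation, margin, and step size here are illustrative stand-ins for the actual objective of Nguyen et al. (2021), with projection onto the nonnegative orthant as one simple instance of a convex constraint:

```python
import numpy as np

def weighted_distance(phi_g, phi_h, w):
    # weighted L1 discrepancy over subtree-pattern counts (illustrative form)
    return float(np.dot(w, np.abs(phi_g - phi_h)))

def projected_sgd_step(w, phi_same, phi_diff, margin=1.0, lr=0.1):
    """One projected-gradient step on a pairwise hinge-style loss:
    push the weighted distance of a same-class pair below that of a
    different-class pair by at least `margin`."""
    g_same = np.abs(phi_same[0] - phi_same[1])
    g_diff = np.abs(phi_diff[0] - phi_diff[1])
    # hinge: max(0, margin + d_w(same) - d_w(diff))
    if margin + np.dot(w, g_same) - np.dot(w, g_diff) > 0:
        w = w - lr * (g_same - g_diff)  # subgradient step on active hinge
    return np.maximum(w, 0.0)           # project onto w >= 0
```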
Soft/Relaxed WL Kernels
The original WL kernel uses a hard combinatorial test—subtree patterns only contribute if they are isomorphic. For dense or highly diverse graphs, this restrictiveness leads to sparse feature overlaps and reduced discriminative power. The relaxed WL kernel (Schulz et al., 2021) generalizes by applying a structure-and-depth-preserving tree edit distance (SdTed) to unfolding trees before constructing feature buckets via Wasserstein k-means. This enables clustering of similar but non-identical local neighborhoods, thus capturing graded notions of subtree similarity. Empirical results show that these relaxed variants yield improved performance on structurally complex, dense datasets.
5. Expressiveness and Relations to Other Kernels
The expressivity of the WL subtree kernel is formally equivalent to counting rooted subtrees up to height $h$, which matches the distinguishing power of the 1-WL test. The kernel can also be interpreted as a special case of walk-based kernels, with iterated relabeling serving as a non-linear aggregation of walk counts. By tuning parameters in generalized walk kernels, one can interpolate between (a) strict matching (WL), (b) soft matching (relaxed WL), and (c) full walk-count similarity, spanning the spectrum from powers of adjacency matrices (random walk kernels) to structure-aware subtree aggregation (Kriege, 2022). The softness of label comparison and subtree weighting are independent axes of enhancement.
6. Empirical Performance and Application Domains
The WL subtree kernel and its extensions have demonstrated strong performance across molecular, biological, social network, and even point-cloud domains. For molecular graphs, learning subtree weights leads to statistically significant accuracy gains over non-weighted WWL and baseline kernels. Notably, the weighted WWL achieves 88.37% on MUTAG, 65.44% on PTC-MR, 75.73% on PROTEIN, and 86.45% on NCI1, each outperforming WWL by 0.5–1.5 points or more (Nguyen et al., 2021).
In chemical tagging, the WL kernel, coupled with Gaussian Process regression, achieves robust identification of cluster-mass-function parameters with training data reduced by two orders of magnitude compared to GNNs. The kernel also offers clear interpretability of features, which is advantageous in scientific applications (Ting et al., 2023).
7. Limitations and Future Directions
Although computationally efficient and empirically robust, classic WL subtree kernels do not natively handle continuous attributes or soft structural similarity. The strict isomorphism-based pattern comparison yields sparse feature overlaps for dense graphs, as nearly every local neighborhood is unique (Schulz et al., 2021). Weighted and tree-edit-based relaxations address some of these issues and offer promising avenues for further research. Future extensions target efficient handling of attributes, hybridizations with differentiable graph neural architectures, and further advances in interpretable graph learning.
Key References:
- "Learning subtree pattern importance for Weisfeiler-Lehman based graph kernels" (Nguyen et al., 2021)
- "Weisfeiler-Lehman Graph Kernel Method: A New Approach to Weak Chemical Tagging" (Ting et al., 2023)
- "Weisfeiler and Leman Go Walking: Random Walk Kernels Revisited" (Kriege, 2022)
- "A Generalized Weisfeiler-Lehman Graph Kernel" (Schulz et al., 2021)