Random Walk Graph Kernels
- Random Walk Graph Kernels are positive-definite functions defined as a weighted sum over matching labeled walks in two graphs, where longer walks are down-weighted.
- They underpin methods for graph classification and embedding while facing computational challenges that drive scalable approximations and algorithmic innovations.
- Recent advances include efficient solvers, explicit feature maps, and randomized approximations that enhance expressivity and make analysis feasible for large graphs.
A random walk graph kernel is a positive-definite function quantifying similarity between graphs by systematically counting pairs of walks with matching node and edge labels, typically down-weighting longer walks. This family of kernels is among the earliest developed for machine learning on graph-structured data and underpins multiple contemporary approaches to graph representation, classification, and embedding. Mathematically, a random walk graph kernel is defined as a weighted sum over all pairs of walks in two input graphs, capturing structural correspondences at multiple length scales. While theoretically appealing and expressive, classical random walk graph kernels exhibit significant computational and modeling limitations, which have inspired a broad array of algorithmic innovations, improvements, and efficient approximations.
1. Formal Definitions and Computation
Given two graphs $G = (V, E)$ and $G' = (V', E')$, possibly with node or edge labels and weights, the canonical random walk graph kernel takes the form
$$K(G, G') = \sum_{\ell = 0}^{\infty} \lambda_\ell \sum_{w \in \mathcal{W}_\ell(G)} \; \sum_{w' \in \mathcal{W}_\ell(G')} k_{\mathrm{walk}}(w, w'),$$
where:
- $\mathcal{W}_\ell(G)$ is the set of all walks of length $\ell$ in $G$,
- $(\lambda_\ell)_{\ell \ge 0}$ is a nonnegative length-weight sequence (commonly $\lambda_\ell = \lambda^\ell$, with $\lambda > 0$),
- $k_{\mathrm{walk}}(w, w')$ is a product of node/edge label kernels enforcing walk matching (e.g., Dirac delta for discrete labels).
This can be written more concisely using the direct (Kronecker) product graph $G_\times = G \times G'$ with adjacency $A_\times$ as
$$K(G, G') = \sum_{\ell = 0}^{\infty} \lambda_\ell \, \mathbf{1}^\top A_\times^\ell \, \mathbf{1} = \mathbf{1}^\top (I - \lambda A_\times)^{-1} \mathbf{1} \qquad \big(\lambda_\ell = \lambda^\ell,\ \lambda < 1/\lambda_{\max}(A_\times)\big),$$
where $A_\times = A \otimes A'$ is constructed so that walks in $G_\times$ correspond to simultaneous matching walks in $G$ and $G'$ (Nikolentzos et al., 2019, 0807.0093).
For attributed graphs, vertex and edge label similarity functions may be incorporated multiplicatively at each step (0807.0093, Kriege et al., 2017). Variants using weighted adjacency, contexts, or higher-order label enrichment (e.g., to counteract trivial "tottering" walks) are definitional extensions (Nikolentzos et al., 2019).
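For small, unlabeled graphs the closed form above can be evaluated directly. A minimal sketch, assuming the geometric weight sequence $\lambda_\ell = \lambda^\ell$ and no labels (the function name is illustrative, not from the cited works):

```python
import numpy as np

def random_walk_kernel(A1, A2, lam):
    """Geometric random walk kernel K = 1^T (I - lam * Ax)^{-1} 1,
    where Ax = A1 (x) A2 is the direct-product adjacency.
    Converges when lam < 1 / lambda_max(Ax)."""
    Ax = np.kron(A1, A2)                  # adjacency of the product graph
    n = Ax.shape[0]
    # Solve (I - lam * Ax) x = 1 instead of forming the inverse explicitly.
    x = np.linalg.solve(np.eye(n) - lam * Ax, np.ones(n))
    return float(np.ones(n) @ x)

# Example: two triangles. The product graph is 4-regular on 9 nodes,
# so K = sum_l 0.1^l * 9 * 4^l = 9 / (1 - 0.4) = 15.
triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
print(random_walk_kernel(triangle, triangle, 0.1))  # → 15.0
```

Note that $A_\times$ here has $nn'$ rows, so this direct route is only viable for very small graphs — precisely the bottleneck discussed next.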
2. Algorithms and Computational Complexity
The direct evaluation of random walk graph kernels requires either explicitly constructing $A_\times$ — an $nn' \times nn'$ matrix for graphs with $n$ and $n'$ nodes — and inverting or powering it (naively $O((nn')^3)$, i.e., $O(n^6)$ for $n = n'$, in the worst case), or using iterations or matrix equations to avoid full materialization.
Significant algorithmic contributions include:
- Reduction to Sylvester/Lyapunov equations exploiting Kronecker structure: complexity $O(n^3)$, with further gains for sparse graphs (0807.0093).
- Iterative/CG-based solvers: avoid constructing $A_\times$, instead computing the necessary matrix–vector products implicitly, often subcubic in practice (0807.0093, Kriege et al., 2017).
- Explicit feature maps for finite-length walk kernels: efficient if label diversity and walk length are small; become infeasible as either increases—empirically, there is a "phase transition" where implicit methods win for high label diversity/long walks (Kriege et al., 2017).
- Monte Carlo and random feature approximations: unbiased estimators for node-node or graph-graph kernels based on sampled random walks, with error controlled by sample size (Reid et al., 2023, Choromanski et al., 2024, Choromanski et al., 9 Oct 2025).
- Parallel and walk-stitching techniques: to mitigate sequential bottlenecks in long-walk Monte Carlo (Choromanski et al., 9 Oct 2025).
Despite these advances, explicitly constructing the product graph remains impractical for large graphs or datasets, motivating the development of scalable variants and approximations.
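The implicit-solver idea can be sketched with the standard Kronecker "vec trick": with row-major flattening, $(A \otimes A')\,v = (A\,X\,A'^\top)$ flattened, where $X$ is $v$ reshaped to an $n \times n'$ matrix. Conjugate gradients then solves $(I - \lambda A_\times)x = \mathbf{1}$ without ever materializing $A_\times$ (this is valid because the system matrix is symmetric positive definite for symmetric adjacencies and small $\lambda$; the function name is illustrative):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def rw_kernel_cg(A1, A2, lam):
    """Random walk kernel via an implicit CG solve of (I - lam*Ax) x = 1.
    The matvec uses the vec trick: Ax v = (A1 @ X @ A2.T).ravel(),
    so the nn' x nn' product matrix is never formed."""
    n1, n2 = A1.shape[0], A2.shape[0]
    N = n1 * n2

    def matvec(v):
        X = np.asarray(v).reshape(n1, n2)   # row-major: v[i*n2 + j] = X[i, j]
        return v - lam * (A1 @ X @ A2.T).ravel()

    M = LinearOperator((N, N), matvec=matvec)
    x, info = cg(M, np.ones(N), atol=1e-10)
    if info != 0:
        raise RuntimeError("CG did not converge")
    return float(np.ones(N) @ x)
```

On the two-triangle example this agrees with the exact value 15 up to solver tolerance, while each iteration costs only two small dense matrix products.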
3. Variants, Extensions, and Advances
A diverse suite of random walk kernel extensions address modeling and computational limitations:
- Finite-length/randomized walk kernels: Truncate the sum over walk length, or use stochastic termination to limit computation (Nikolentzos et al., 2019, Kriege et al., 2017).
- Label context and higher-order Markov kernels: Reduce tottering by encoding additional local context or restricting backtracks (Nikolentzos et al., 2019).
- Return probability kernels (RETGK, LDOS): Focus on multistep return probabilities for each node, known to be permutation-invariant and highly descriptive; scalable, support node attributes, and empirically competitive (Zhang et al., 2018, Huang et al., 2020).
- Walk-based graph embeddings: Instead of pairwise kernels, embed each graph into a finite-dimensional vector recording statistical features of random walks; inner products or downstream models operate on these embeddings (Li et al., 2017, Huang et al., 2020).
- General graph random features (u-GRF, GRFs++): Sample random walks with outputs tailored by learned or designed modulation functions, providing unbiased or low-variance approximations to arbitrary kernel functions (not only random walk kernels) (Reid et al., 2023, Choromanski et al., 9 Oct 2025).
- Anonymous walk kernels (AWGK): Map walk sequences to "anonymous" patterns distinguishing structural motifs, outperforming both classical random walk kernels and message passing GNNs on certain structural discrimination tasks (Long et al., 2021).
- Multiple kernel learning and convolutional methods: Combine random walk kernels with other base kernels or embed them as learnable graph-convolutional layers, achieving improved expressivity or differentiability (Lee et al., 2024, Celikkanat et al., 2021).
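The explicit feature map for a finite-length walk kernel can be sketched as label-sequence counting: each graph is mapped to a sparse histogram over the node-label sequences of its walks up to length $L$, and the kernel is a dot product of histograms. A minimal sketch for node-labeled graphs (the adjacency-dict format and function names are illustrative):

```python
from collections import Counter

def walk_features(adj, labels, L):
    """Explicit feature map: counts of node-label sequences over all
    walks of length 0..L. adj maps each node to its neighbor list."""
    # state[v] maps a label sequence to the number of walks ending at v.
    state = {v: Counter({(labels[v],): 1}) for v in adj}
    phi = Counter()
    for c in state.values():
        phi.update(c)
    for _ in range(L):                    # extend every walk by one edge
        new = {v: Counter() for v in adj}
        for v, c in state.items():
            for u in adj[v]:
                for seq, cnt in c.items():
                    new[u][seq + (labels[u],)] += cnt
        state = new
        for c in state.values():
            phi.update(c)
    return phi

def walk_kernel(phi1, phi2):
    """Kernel value as a sparse dot product of the two feature maps."""
    return sum(cnt * phi2.get(seq, 0) for seq, cnt in phi1.items())
```

The feature dimension grows with both label diversity and $L$, which is exactly the regime where Kriege et al. (2017) observe the crossover back to implicit computation.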
Table: Core Random Walk Kernel Constructions
| Approach | Complexity (Dense Case) | Key Feature |
|---|---|---|
| Direct product + exact inversion | $O(n^6)$ | All walks, most expressive, impractical size |
| Kronecker/Sylvester solver | $O(n^3)$ | Efficient for moderate graphs |
| Iterative/CG methods | $O(r n^3)$, $r$ iterations | Scales on sparse graphs |
| Explicit feature map (trunc. length $\ell$, low diversity $d$) | $O(n^2 \ell)$ (low $d$) | Fast for small label/length |
| Monte Carlo/stochastic (u-GRF) | $O(m n)$, $m$ sampled walks | Linear in $n$; error $O(1/\sqrt{m})$ |
4. Expressivity, Limitations, and Relationship to Other Kernels
Classical random walk kernels (RWGK) compare the cumulative structure of graphs by counting all matches of walks, but suffer from "tottering" (excess weight to trivial back-and-forth walks), diagonal dominance (many matching walk sequences are unique to individual graphs as length grows), and limitations in distinguishing certain non-isomorphic structures. Notably:
- RWGK is strictly weaker than the 1-WL (Weisfeiler-Leman) subtree kernel—there exist graphs not distinguished by RWGK but separable by WL (Long et al., 2021, Kriege, 2022).
- Slight modifications (e.g., node-centric walk grouping with Gaussian embedding) can interpolate between RWGK and WL, matching or exceeding WL expressiveness under sufficiently strict grouping (Kriege, 2022).
- Return-probability and LDOS embeddings encode rich local/global structural information, mitigating tottering and resolving isomorphism invariance issues (Zhang et al., 2018, Huang et al., 2020).
Contemporary methods often combine random walk kernels with higher expressivity kernels (WL, SP, GNN-based) or learn walk-weight parameters for improved discriminative power. Combinatorial approaches such as anonymous walk kernels achieve theoretical expressivity surpassing WL and message-passing GNNs under appropriate construction (Long et al., 2021).
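The return-probability idea behind RetGK-style kernels can be sketched directly: each node is embedded by its probabilities of returning to itself after $t = 1, \dots, S$ steps of the random walk with transition matrix $P = D^{-1}A$. These features are permutation-invariant by construction, which is what mitigates the isomorphism-invariance issues above (a minimal sketch; the function name is illustrative and graph-level aggregation is left out):

```python
import numpy as np

def return_prob_features(A, S):
    """Per-node return probabilities [P^t]_{vv} for t = 1..S, where
    P = D^{-1} A is the random walk transition matrix."""
    deg = A.sum(axis=1)
    P = A / deg[:, None]                 # row-stochastic transition matrix
    n = A.shape[0]
    feats = np.empty((n, S))
    Pt = np.eye(n)
    for t in range(S):
        Pt = Pt @ P
        feats[:, t] = np.diag(Pt)        # probability of walk returning in t+1 steps
    return feats
```

For a triangle, every node has return probability 0 after one step and 1/2 after two, so all rows of the embedding coincide, reflecting the graph's vertex-transitivity; graph-level kernels then compare these node-feature multisets (e.g., via a kernel on sorted features or mean embeddings).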
5. Applications, Empirical Behavior, and Scalability
Random walk graph kernels have been widely applied to graph classification, node embedding, link prediction, regression on graph-structured data, and graph model selection:
- Feature extraction: Graph representations for SVM/random forest; nodewise similarities for spectral clustering (Zhang et al., 2018, Li et al., 2017).
- Graph model selection: Walk2Vec-based approaches achieve information-theoretic performance in discriminating graph models (e.g., ER vs. SBM, planted clique) (Li et al., 2017).
- Learning curves: In Gaussian process regression, random walk kernels act as discrete analogues of squared-exponential kernels, with local normalization essential for sensible priors on non-homogeneous graphs (Urry et al., 2012).
- Approximate methods (u-GRF, GRFs++, Graph Voyagers) enable kernel computation on graphs orders of magnitude larger than feasible with exact algorithms (Reid et al., 2023, Choromanski et al., 2024, Choromanski et al., 9 Oct 2025).
- Multiple kernel representation learning integrates random walk proximity with diverse kernel functions and data-driven parameterization for improved node embedding and classification (Celikkanat et al., 2021).
Empirically, classical RWGK is now outperformed by subtree-based and attention-based kernels in benchmark tasks, particularly on large, irregular, or social networks (Nikolentzos et al., 2019, Huang et al., 2020). Scalable approximations match or exceed the original kernel's accuracy, often running in subquadratic time (Reid et al., 2023, Choromanski et al., 2024, Choromanski et al., 9 Oct 2025). Advanced variants (e.g., RetGK, DOS/LDOS) leverage return probabilities and spectral moments to avoid explicit product-graph enumeration while retaining expressive power (Zhang et al., 2018, Huang et al., 2020).
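The sampled-walk approximations referenced above can be illustrated with a toy Monte Carlo estimator for the truncated geometric kernel on unlabeled graphs: walk the product graph implicitly, and use the importance weight $n n' \prod_t \deg_\times$ to make each per-length term unbiased (a minimal sketch in the spirit of these estimators, not the cited algorithms; the function name and weighting are illustrative):

```python
import random

def mc_walk_kernel(adj1, adj2, lam, L, n_samples, seed=0):
    """Monte Carlo estimate of sum_{l<=L} lam^l * (#length-l walks in the
    product graph). Each sample simulates one product-graph walk implicitly;
    multiplying by the inverse sampling probabilities gives an unbiased
    estimate of the walk counts."""
    rng = random.Random(seed)
    nodes1, nodes2 = list(adj1), list(adj2)
    total = 0.0
    for _ in range(n_samples):
        i, j = rng.choice(nodes1), rng.choice(nodes2)
        w = len(nodes1) * len(nodes2)    # inverse start probability
        est = w                          # length-0 contribution
        coef = 1.0
        for _ in range(L):
            nbrs = [(u, v) for u in adj1[i] for v in adj2[j]]
            if not nbrs:                 # walk cannot be extended
                break
            coef *= lam
            w *= len(nbrs)               # inverse step probability
            i, j = rng.choice(nbrs)
            est += coef * w
        total += est
    return total / n_samples
```

On regular graphs (e.g., two triangles) the estimator has zero variance and recovers the exact truncated value; on irregular graphs the error decays as $O(1/\sqrt{m})$ in the number of samples, matching the rates quoted for the randomized methods above.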
6. Current Trends, Open Problems, and Future Directions
Recent developments center on:
- Scalable, unbiased, and low-variance approximations for general random walk kernels and their node-level analogues (e.g., GVoy, u-GRF, GRFs++) (Reid et al., 2023, Choromanski et al., 2024, Choromanski et al., 9 Oct 2025).
- Learnable and adaptive kernel construction: Explicit optimization of walk-length weights, modulation functions, and integration with deep learning layers (kernel-convolution, hybrid GNNs) (Huang et al., 2020, Lee et al., 2024).
- Open challenges: improved mitigation of tottering and diagonal dominance, adaptive context selection for expressive power, and further scalability gains for attributed and labeled graphs.
- Enhanced theoretical analysis of the phase transition between explicit and implicit feature map feasibility as a function of graph label diversity, walk length, and data regime (Kriege et al., 2017).
- Integration with multiple kernel learning and flexible kernel composition for heterogeneously labeled and attributed graphs (Celikkanat et al., 2021).
Random walk graph kernels remain a conceptual foundation for graph similarity measurement and offer a testbed for methodological advances in efficient kernel learning on structured data (Nikolentzos et al., 2019, Zhang et al., 2018, 0807.0093).