Random Walk with Jumps in Graph Sampling

Updated 4 June 2026

Random Walk with Jumps is a Markovian process that combines local transitions with random jumps to overcome network bottlenecks.
It improves network exploration and speeds up mixing, which makes it useful for sampling graphs with heavy-tailed degree distributions and sparse connectivity.
Variants such as RWE, WJRW, and multi-hopper models optimize the trade-off between local coverage and global sampling efficiency.

A random walk with jumps (RWJ) is a Markovian process on a graph where standard local transitions are augmented with a stochastic component allowing transitions (jumps) to remote or uniformly chosen nodes. This mechanism improves the exploration and mixing properties of random walks used for network sampling, estimation, statistical learning, and decentralized optimization on complex networks. RWJ and its numerous variants—random walk with escaping (RWE), random walk with weighted or indirect jumps, random multi-hopper, and combinations with Metropolis–Hastings (MH)—form a foundation for sampling and learning algorithms capable of efficiently traversing networks with pronounced bottlenecks, heavy-tailed degree distributions, and sparse connectivity.

1. Mathematical Formulation of Random Walk with Jumps

The canonical RWJ augments the adjacency structure of a graph $G=(V,E)$ with a jump component, parameterized by $\alpha>0$, that allows the walk to transition to any node with nonzero probability. In the simplest undirected setting, let $A=(a_{ij})$ be the (possibly weighted) adjacency matrix, $d_i=\sum_j a_{ij}$ the degree of node $i$, and $n=|V|$. For an unweighted graph, the transition kernel is

$P_{ij}(\alpha) = \begin{cases} \frac{1+\alpha/n}{d_i+\alpha}, & (i,j)\in E, \\ \frac{\alpha/n}{d_i+\alpha}, & (i,j)\notin E \ (\text{including } j=i). \end{cases}$

The stationary distribution is

$\pi_i(\alpha) = \frac{d_i + \alpha}{2m + \alpha n},$

where $m=|E|$ for undirected graphs. As $\alpha\to 0$ this reduces to the degree-proportional distribution of the simple random walk; as $\alpha\to\infty$, it approaches the uniform distribution. The spectral gap

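$\gamma(\alpha) = 1 - \lambda_2\bigl(P(\alpha)\bigr),$

with $\lambda_2$ the second-largest eigenvalue of $P(\alpha)$, quantifies relaxation and mixing; introducing jumps guarantees a lower bound on the gap, and thus an upper bound on the mixing time (Avrachenkov et al., 2018).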
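The kernel and stationary law above can be checked numerically. The following is a minimal sketch, assuming an unweighted undirected graph given as a dense NumPy adjacency matrix. The function names `rwj_kernel`, `rwj_stationary`, and `spectral_gap` are illustrative and not taken from the cited papers, and the gap is computed as one minus the second-largest eigenvalue.

```python
import numpy as np

def rwj_kernel(A: np.ndarray, alpha: float) -> np.ndarray:
    """Dense RWJ transition matrix: P_ij = (a_ij + alpha/n) / (d_i + alpha)."""
    n = A.shape[0]
    d = A.sum(axis=1)
    # alpha/n is added to every entry, including the diagonal (self-jumps).
    return (A + alpha / n) / (d + alpha)[:, None]

def rwj_stationary(A: np.ndarray, alpha: float) -> np.ndarray:
    """Stationary law pi_i = (d_i + alpha) / (2m + alpha * n)."""
    d = A.sum(axis=1)
    return (d + alpha) / (d.sum() + alpha * A.shape[0])

def spectral_gap(P: np.ndarray, pi: np.ndarray) -> float:
    """1 - lambda_2, using the symmetric matrix D^{1/2} P D^{-1/2}, D = diag(pi)."""
    s = np.sqrt(pi)
    S = (s[:, None] * P) / s[None, :]
    eig = np.linalg.eigvalsh((S + S.T) / 2)  # ascending order
    return 1.0 - eig[-2]

# Path graph on 4 nodes: 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
alpha = 1.0
P = rwj_kernel(A, alpha)
pi = rwj_stationary(A, alpha)

print("rows sum to 1:", np.allclose(P.sum(axis=1), 1.0))
print("pi is stationary:", np.allclose(pi @ P, pi))
print("gap with jumps:", spectral_gap(P, pi))
```

The symmetrization step relies on reversibility: $\pi_i P_{ij}(\alpha)$ is symmetric in $i$ and $j$, so the eigenvalues are real.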

In directed graphs, the analogous process has its own transition probabilities and stationary distribution, both given in Qi (2022).

These processes are aperiodic and irreducible for $\alpha>0$, with ergodic averages supporting unbiased estimation of node statistics via bias-correction.
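In practice the walk is usually simulated step by step rather than through a dense matrix. The sketch below assumes an unweighted undirected graph stored as a dictionary of neighbor lists and assumes that uniform node sampling is available. The name `rwj_walk` and its parameters are hypothetical. Jumping with probability $\alpha/(d_i+\alpha)$ and otherwise moving to a uniform neighbor reproduces the undirected kernel of Section 1.

```python
import random

def rwj_walk(neighbors, alpha, steps, start=0, seed=0):
    """Simulate an RWJ path on a graph given as {node: [neighbors]}."""
    rng = random.Random(seed)
    nodes = list(neighbors)
    current = start
    path = [current]
    for _ in range(steps):
        degree = len(neighbors[current])
        if rng.random() < alpha / (degree + alpha):
            # Uniform jump over all nodes (may land on the current node).
            current = rng.choice(nodes)
        else:
            # Ordinary random-walk step to a uniformly chosen neighbor.
            current = rng.choice(neighbors[current])
        path.append(current)
    return path

# Two disconnected edges: a simple random walk could never leave {0, 1}.
neighbors = {0: [1], 1: [0], 2: [3], 3: [2]}
path = rwj_walk(neighbors, alpha=1.0, steps=2000)
print(sorted(set(path)))
```

Because every node is reachable through a jump, the walk covers both components even though the graph itself is disconnected.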

2. Impact on Mixing, Spectral Properties, and Practical Tuning

Introducing jumps enlarges the spectral gap, especially for graphs with pronounced clustering or structural bottlenecks. For the standard RWE (random walk with escaping), the kernel given in Qi (2022) ensures a lower bound on the spectral gap, which in turn bounds the total-variation mixing time (Qi, 2022). Empirical studies report 5–10× faster mixing than the simple random walk on online social networks (OSNs) for the jump rates studied (Qi, 2022, Avrachenkov et al., 2018).

The optimal jump rate $\alpha$ is application- and topology-dependent. For degree-heterogeneous graphs, choosing $\alpha$ on the order of the mean or maximum degree balances local coverage and fast mixing (Avrachenkov et al., 2018, Qi, 2022). For WJRW, Qi (2022) reports the range of the jump weight that yields the best KL-divergence and coverage.
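The effect of the jump rate on a bottlenecked graph can be illustrated with a small sweep. This is a sketch under stated assumptions: it uses the undirected kernel of Section 1, a toy graph of two cliques joined by a single edge, and illustrative helper names (`rwj_gap`, `two_cliques`) that do not come from the cited works.

```python
import numpy as np

def rwj_gap(A: np.ndarray, alpha: float) -> float:
    """Spectral gap 1 - lambda_2 of the RWJ kernel P(alpha)."""
    n = A.shape[0]
    d = A.sum(axis=1)
    W = A + alpha / n                  # edge weights plus uniform jump mass
    s = 1.0 / np.sqrt(d + alpha)
    S = s[:, None] * W * s[None, :]    # symmetric, same spectrum as P(alpha)
    return 1.0 - np.linalg.eigvalsh(S)[-2]

def two_cliques(k: int) -> np.ndarray:
    """Two k-cliques joined by one bridge edge (a structural bottleneck)."""
    A = np.zeros((2 * k, 2 * k))
    A[:k, :k] = 1.0
    A[k:, k:] = 1.0
    np.fill_diagonal(A, 0.0)
    A[k - 1, k] = A[k, k - 1] = 1.0
    return A

A = two_cliques(10)
mean_degree = A.sum() / A.shape[0]
gaps = {alpha: rwj_gap(A, alpha) for alpha in (0.0, 0.1, 1.0, mean_degree)}
for alpha, gap in gaps.items():
    print(f"alpha = {alpha:5.2f}  gap = {gap:.4f}")
```

On this toy graph the gap grows with $\alpha$, at the price of fewer local steps per sample, which is the trade-off the tuning advice above describes.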

For nonuniform or weighted jumps, as in Ruelle–Bowen walks, continuous-time generators with maximal entropy rates (i.e., those equidistributing paths of a given length and end-points) provide alternative sampling processes, with invariant distributions related to Perron–Frobenius eigenvectors (Chen et al., 2018).

3. Advanced Variants: Weighted, Multi-hop, and Indirect Jumps

Weighted Jump Random Walk (WJRW)

The WJRW introduces a tunable jump weight to mediate between the standard random walk and uniform node sampling without degrading conductance. From the current node, the walk either moves to a neighbor or jumps to a low-degree node drawn from a designated candidate set, with probabilities governed by that weight (Qi, 2022).

The stationary distribution interpolates between degree-proportional and uniform, with mixing rate and practical performance peaking near the value of the weight reported by Qi (2022).

Indirect Jumps via Two-Layer Structures

When uniform node sampling is infeasible, a two-layer graph model with a target, auxiliary, and bipartite coupling graph enables indirect jumps. Here, jumps are realized by sampling on the auxiliary graph and mapping via the bipartite structure (Zhao et al., 2017). The stationary distribution and unbiased estimation are guaranteed by detailed balance, and performance matches or exceeds traditional RWJ if the coupling is sufficiently mixing.

Multi-hopper and Power-law Distance Kernels

In the random multi-hopper model, the walk jumps to any node with a probability that decays with the shortest-path distance to that node, for example as a power law in the distance (Estrada et al., 2016).

In the appropriate limit of the decay parameters, the process converges to the complete-graph RW, achieving optimal mixing uniformly over all node pairs. Intermediate values offer tunable trade-offs between locality and mixing (Estrada et al., 2016).
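A distance-decaying kernel of this kind can be sketched as follows. The power-law form $P_{ij} \propto d(i,j)^{-s}$, the exponent name `s`, and the function names are assumptions made for illustration. The exact parameterization in Estrada et al. (2016) is not reproduced here. The graph is assumed connected and unweighted.

```python
from collections import deque
import numpy as np

def bfs_distances(neighbors, source):
    """Shortest-path (hop) distances from source via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def multi_hopper_kernel(neighbors, s):
    """Row-stochastic matrix with P_ij proportional to d(i, j)^(-s), i != j."""
    nodes = sorted(neighbors)
    index = {u: k for k, u in enumerate(nodes)}
    P = np.zeros((len(nodes), len(nodes)))
    for u in nodes:
        for v, dist in bfs_distances(neighbors, u).items():
            if v != u:
                P[index[u], index[v]] = float(dist) ** (-s)
    return P / P.sum(axis=1, keepdims=True)

# Path graph 0 - 1 - 2 - 3.
path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(multi_hopper_kernel(path_graph, 0.0))   # uniform over the other nodes
print(multi_hopper_kernel(path_graph, 50.0))  # close to the simple random walk
```

Under this assumed form, $s=0$ gives every other node equal weight, which is the random walk on the complete graph, while a large $s$ concentrates the mass on direct neighbors.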

4. Random Walk with Jumps for Decentralized Learning and Metropolis–Hastings

Weighted transition matrices, e.g., via MH to match node-wise smoothness or data heterogeneity, enable importance sampling in decentralized stochastic gradient descent. However, such walks are susceptible to "entrapment": getting stuck on high-weight vertices, causing slow mixing and sample correlation (Liu et al., 14 Apr 2026, Liu et al., 2024).

Metropolis–Hastings with Lévy Jumps (MHLJ) addresses this by interleaving MH steps with random-length jump sequences whose length is sampled from a truncated geometric distribution on graph distance. The composite kernel retains the desired stationary law up to a bias term bounded in Liu et al. (14 Apr 2026), while enlarging the spectral gap and eliminating entrapment. Theoretical results give convergence bounds with mixing-time factors, together with a decay rate for the error gap induced by jumps (Liu et al., 14 Apr 2026).

Empirical and theoretical results give a recommended range for the jump parameter, with adaptive decay as sampling saturates. Long-range jumps implemented as short sequences of uniform neighbor-to-neighbor steps are both decentralization-compatible and effective under locality constraints (Liu et al., 14 Apr 2026, Liu et al., 2024).
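The interleaving of MH steps and jump sequences can be sketched as below. This is a hypothetical illustration, not the algorithm of Liu et al.: the parameter names `p_jump`, `q`, and `r_max`, their default values, and the way a jump is realized (a run of uniform neighbor steps whose length is drawn from a truncated geometric distribution) are all assumptions. No bias correction is applied.

```python
import random

def truncated_geometric(rng, q, r_max):
    """Draw a length r in {1, ..., r_max} with P(r) proportional to (1-q)^(r-1) * q."""
    lengths = range(1, r_max + 1)
    weights = [(1 - q) ** (r - 1) * q for r in lengths]
    return rng.choices(lengths, weights=weights)[0]

def mh_step(rng, neighbors, target, i):
    """One Metropolis-Hastings step with a uniform-neighbor proposal."""
    j = rng.choice(neighbors[i])
    ratio = (target[j] * len(neighbors[i])) / (target[i] * len(neighbors[j]))
    return j if rng.random() < min(1.0, ratio) else i

def mh_with_jumps(neighbors, target, steps, p_jump,
                  q=0.5, r_max=4, start=0, seed=0):
    """Interleave MH steps with occasional multi-hop jump sequences."""
    rng = random.Random(seed)
    i = start
    visits = [i]
    for _step in range(steps):
        if rng.random() < p_jump:
            # Jump: a short run of uniform neighbor-to-neighbor moves.
            for _hop in range(truncated_geometric(rng, q, r_max)):
                i = rng.choice(neighbors[i])
        else:
            i = mh_step(rng, neighbors, target, i)
        visits.append(i)
    return visits

# Ring of 6 nodes with one heavily weighted node that can trap a pure MH walk.
ring = {k: [(k - 1) % 6, (k + 1) % 6] for k in range(6)}
target = {k: 1000.0 if k == 0 else 1.0 for k in range(6)}
pure_mh = mh_with_jumps(ring, target, steps=200, p_jump=0.0)
with_jumps = mh_with_jumps(ring, target, steps=200, p_jump=0.2)
print(len(set(pure_mh)), len(set(with_jumps)))
```

Only neighbor-to-neighbor communication is used, so the sketch stays compatible with the locality constraint mentioned above. With `p_jump=0.0` it reduces to plain MH.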

5. Applications and Estimation Techniques

RWJ and its extensions are applied in:

Social network sampling: robust degree, motif, and order estimation via bias-corrected estimators, e.g., Horvitz–Thompson or Hansen–Hurwitz, with sample weights determined by stationary distribution (Qi, 2022, Murai et al., 2017).
Decentralized machine learning: token-based model propagation and data-parallel SGD under heterogeneity are accelerated via weighted RWJ or MHLJ to ensure representative and fast exploration (Liu et al., 14 Apr 2026).
Max-entropy path sampling and motif analysis: Ruelle–Bowen continuous-time models are essential when uniform path coverage is required (Chen et al., 2018).
Network exploration and crawling: multi-hopper models, indirect jumps, and WJRW provide approaches to fast coverage, especially when access to uniform sampling is limited or expensive (Estrada et al., 2016, Zhao et al., 2017, Qi, 2022).

6. Estimator Construction and Practical Guidelines

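Given the non-uniform stationary law, asymptotically unbiased estimators for any node function $f$ are constructed by weighting each sampled node $i$ by the inverse of its stationary probability $\pi_i$, where $\pi_i$ is specified by the transition kernel (e.g., $\pi_i \propto d_i+\alpha$ for the undirected RWJ of Section 1) (Qi, 2022, Zhao et al., 2017).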
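One common way to realize such a reweighting is a self-normalized (ratio) estimator, sketched below. This is a generic importance-weighting construction and is not claimed to be the exact estimator of the cited papers. The function name, the `burn_in` argument, and the toy data are illustrative assumptions. Because numerator and denominator share the same weights, $\pi$ only needs to be known up to a constant, so $d_i+\alpha$ can be used directly.

```python
def reweighted_mean(samples, f, pi, burn_in=0):
    """Estimate the uniform node average of f from samples drawn according to pi.

    samples: sequence of visited nodes
    f:       mapping node -> value
    pi:      mapping node -> stationary weight (normalization not required)
    burn_in: number of initial samples to discard
    """
    kept = samples[burn_in:]
    numerator = sum(f[x] / pi[x] for x in kept)
    denominator = sum(1.0 / pi[x] for x in kept)
    return numerator / denominator

# Toy check: visit counts exactly proportional to the weights 1 : 2 : 3.
pi = {0: 1.0, 1: 2.0, 2: 3.0}
f = {0: 10.0, 1: 20.0, 2: 60.0}
samples = [0, 1, 1, 2, 2, 2]

estimate = reweighted_mean(samples, f, pi)
true_mean = sum(f.values()) / len(f)
print(estimate, true_mean)
```

In the toy check the heavily visited node contributes no more than the rarely visited one after reweighting, so the estimate coincides with the plain average of `f` over nodes.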

For tuning:

Choose the jump parameter to balance estimator variance and bias; higher jump rates accelerate mixing but reduce locality.
When possible, apply hybrid initial sampling (multiple seeds) plus jumps to optimize estimator variance across degree ranges (Murai et al., 2017).
Burn-in: discard initial samples up to the empirical mixing time (Qi, 2022).
For highly clustered or large-diameter networks, prioritize higher jump rates or longer-range kernels to prevent correlated samples (Estrada et al., 2016, Qi, 2022).

RWJ connects to PageRank (with restart), continuous-time Markov jump processes (Ruelle–Bowen, maximal entropy), and multi-walker and multi-hopper models. Each imbues the random walk process with additional flexibility, adapting coverage, bias, and mixing rates to task and topology. DUFS and related multi-walker jump schemes further generalize the paradigm to allow stratification over initial placements, parallelization, and adaptive hybrid estimators (Murai et al., 2017).

Empirical evidence consistently documents the superior mixing, coverage, and estimator variance properties of RWJ and its variants, especially in power-law, clustered, or otherwise structurally heterogeneous graphs (Avrachenkov et al., 2018, Qi, 2022, Estrada et al., 2016). Constraints arise in situations disallowing uniform sampling; indirect or multi-hop jump methodologies mitigate these limitations at the expense of more intricate estimator structures (Zhao et al., 2017). In decentralized learning, RWJ combined with MH and Lévy-type jumps is currently the state-of-the-art solution for overcoming entrapment and heterogeneity-induced mixing bottlenecks (Liu et al., 14 Apr 2026, Liu et al., 2024).