
Random-Walk Positional Encoding

Updated 7 March 2026
  • Random-walk-based positional encoding is a method that leverages stochastic walks, spectral techniques, and higher-order Laplacians to capture local and global graph structure.
  • It generates node, edge, and simplicial features through random feature propagation and normalization, providing multi-scale positional insights for GNNs.
  • Empirical studies show these encodings improve performance in graph-level tasks, node classification, and substructure counting by significantly enhancing model expressivity.

Random-walk-based positional encoding refers to a class of node, edge, and higher-order simplex feature construction schemes that encode structural and positional information in graphs by leveraging properties of random walks and their spectral or spatial generalizations. These methods underpin recent advances in graph neural network (GNN) architectures by enhancing their ability to capture local and global positional information and substructure, directly impacting expressiveness and predictive performance (Eliasof et al., 2023, Zhou et al., 2023).

1. Theoretical Foundations of Random-Walk-Based Encoding

Random-walk-based encodings emerge from the observation that random walks traverse the underlying graph according to its adjacency or connectivity structure, inherently capturing positional correlations at multiple scales. At the node level (0-simplices), the standard random walk is defined via the transition matrix $P = D^{-1}A$, where $A$ is the adjacency matrix and $D$ is the degree matrix. Higher powers $P^t$ encode return and transition probabilities over $t$ steps, relating directly to local neighborhood structures.
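
The following is a minimal sketch of the node-level return-probability features $\operatorname{diag}(P^t)$ for $t = 1, \dots, K$, computed directly from the transition matrix. It assumes a small, unweighted, undirected graph stored as a dense NumPy array with no isolated nodes. The function name `rwse` and the toy graph are illustrative choices, not an API from the cited papers.

```python
import numpy as np

def rwse(A: np.ndarray, num_steps: int) -> np.ndarray:
    """Return-probability features: column t-1 holds diag(P^t) for t = 1..num_steps."""
    deg = A.sum(axis=1)
    P = A / deg[:, None]            # P = D^{-1} A (assumes every node has degree >= 1)
    n = A.shape[0]
    Pt = np.eye(n)
    feats = np.empty((n, num_steps))
    for t in range(num_steps):
        Pt = Pt @ P                 # Pt now equals P^(t+1)
        feats[:, t] = np.diag(Pt)   # probability that a walk from node i is back at i
    return feats

# Hypothetical example: a triangle (nodes 0, 1, 2) with a pendant node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
F = rwse(A, num_steps=3)
```

In this toy graph the $t = 3$ column is nonzero only for the three triangle nodes, because a closed walk of length 3 must traverse a triangle. This is a small instance of walk statistics revealing local substructure.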

Spectral encodings, such as those based on Laplacian eigenvectors or heat kernels, are recovered as specific limits or transformations of this process. The combinatorial Laplacian $L_0 = D - A$ admits the eigendecomposition $L_0 = \sum_{j=1}^n \lambda_j u_j u_j^\top$. Positional encoding schemes may use the leading (low-frequency) eigenvectors $\{u_j\}$, the heat kernel $\sum_j e^{-\beta \lambda_j} u_j u_j^\top$, or resistance-based metrics, all of which can be interpreted through the lens of random-walk-based diffusion (Zhou et al., 2023).
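
The spectral quantities named above can be sketched with a dense eigendecomposition. This is an illustration only, assuming a connected graph small enough for `numpy.linalg.eigh`. Two further choices are assumptions: "leading" is read here as the eigenvectors with the smallest nonzero eigenvalues, and the parameter names `num_vecs` and `beta` are illustrative.

```python
import numpy as np

def laplacian_pe(A: np.ndarray, num_vecs: int, beta: float):
    """Return (eigenvector PE, heat kernel) for the combinatorial Laplacian L0 = D - A."""
    L0 = np.diag(A.sum(axis=1)) - A
    lam, U = np.linalg.eigh(L0)              # eigenvalues ascending, orthonormal eigenvectors as columns
    eig_pe = U[:, 1:num_vecs + 1]            # drop the constant eigenvector (lambda = 0 on a connected graph)
    heat = (U * np.exp(-beta * lam)) @ U.T   # sum_j exp(-beta * lambda_j) u_j u_j^T
    return eig_pe, heat

# Hypothetical example: a path graph on 5 nodes.
n = 5
A = np.zeros((n, n))
A[np.arange(n - 1), np.arange(1, n)] = 1.0
A = A + A.T
eig_pe, heat = laplacian_pe(A, num_vecs=2, beta=0.5)
```

Because $L_0 \mathbf{1} = 0$, each row of the heat kernel sums to one and its entries are nonnegative. It is the transition matrix of a continuous-time random walk run for time $\beta$, which is one concrete sense in which this spectral encoding is a random-walk quantity. The eigenvector columns, by contrast, are only defined up to sign.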

For higher-order topological features (edges and $k$-simplices), the generalization proceeds through the incorporation of Hodge Laplacians. For instance, the Hodge $1$-Laplacian,

$$L_1 = B_1^\top B_1 + B_2 B_2^\top,$$

where $B_1$ is the node-edge incidence matrix and $B_2$ the edge-triangle incidence matrix, governs a random walk on the edges, and its spectrum discloses cycle and flow information not accessible by node-level approaches.
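
A small sketch makes the last claim concrete. It builds $B_1$ and $B_2$, forms $L_1$, and counts its zero eigenvalues. That count equals the number of independent cycles not filled in by triangles (the first Betti number). The sketch assumes edges and triangles are given as sorted node tuples and oriented by increasing node index, a common convention adopted here as an assumption. The helper names are hypothetical.

```python
import numpy as np

def incidence_matrices(n, edges, triangles):
    """B1: node-edge incidence (n x m); B2: edge-triangle incidence (m x t)."""
    edge_index = {e: i for i, e in enumerate(edges)}
    B1 = np.zeros((n, len(edges)))
    for j, (u, v) in enumerate(edges):          # edge oriented u -> v with u < v
        B1[u, j] = -1.0
        B1[v, j] = 1.0
    B2 = np.zeros((len(edges), len(triangles)))
    for j, (a, b, c) in enumerate(triangles):   # boundary of [a,b,c] = [b,c] - [a,c] + [a,b]
        B2[edge_index[(a, b)], j] = 1.0
        B2[edge_index[(a, c)], j] = -1.0
        B2[edge_index[(b, c)], j] = 1.0
    return B1, B2

def hodge1_laplacian(n, edges, triangles):
    B1, B2 = incidence_matrices(n, edges, triangles)
    return B1.T @ B1 + B2 @ B2.T                # L1 = B1^T B1 + B2 B2^T

def num_harmonic_edge_modes(L1, tol=1e-9):
    """Number of (numerically) zero eigenvalues of L1."""
    return int(np.sum(np.abs(np.linalg.eigvalsh(L1)) < tol))

square = hodge1_laplacian(4, [(0, 1), (1, 2), (2, 3), (0, 3)], [])
hollow = hodge1_laplacian(3, [(0, 1), (0, 2), (1, 2)], [])
filled = hodge1_laplacian(3, [(0, 1), (0, 2), (1, 2)], [(0, 1, 2)])
```

The square and the hollow triangle each have one zero-eigenvalue edge mode, and filling the triangle removes it. The hollow and filled triangles share the same node graph, so a node-level Laplacian cannot tell them apart.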

2. Construction and Variants of Random-Walk-Based Positional Encodings

Random-walk-based positional encodings can be systematized along both spatial and spectral principles:

  • Spatial Construction (Random Feature Propagation, RWSE, EdgeRWSE):

Positionally meaningful features are obtained by launching random features or one-hot indicators and propagating them across the graph via iterative application of a propagation operator $S$, typically a normalized adjacency or Laplacian. The propagation is formalized by

$$a^{(0)} = r, \qquad \hat{a}^{(p)} = S a^{(p-1)}, \qquad a^{(p)} = \begin{cases} N(\hat{a}^{(p)}) & p \equiv 0 \pmod{w}, \\ \hat{a}^{(p)} & \text{otherwise}, \end{cases}$$

where $N$ denotes normalization (either channel-wise $\ell_2$ or QR-based orthonormalization) and $w$ is the normalization frequency (Eliasof et al., 2023). The resulting trajectory $t = r \oplus a^{(1)} \oplus \dots \oplus a^{(P)}$, where $\oplus$ denotes concatenation along the feature dimension, augments node or edge features with multi-scale positional information.
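
The recursion above can be sketched in a few lines of NumPy. The sketch assumes a dense operator `S` and random features `r` with more rows than columns, so that the reduced QR factorization is well defined. The function names are hypothetical, and this is an illustrative reading of the formula rather than the reference implementation of Eliasof et al. (2023).

```python
import numpy as np

def normalize(X: np.ndarray, method: str) -> np.ndarray:
    """The normalization N: channel-wise l2 or QR orthonormalization."""
    if method == "l2":
        return X / np.linalg.norm(X, axis=0, keepdims=True)   # unit-norm columns (channels)
    if method == "qr":
        Q, _ = np.linalg.qr(X)                                # orthonormal basis of span(X), shape (n, k)
        return Q
    raise ValueError(f"unknown normalization: {method}")

def random_feature_propagation(S, r, num_steps, w=1, method="qr"):
    """Return the trajectory r (+) a^(1) (+) ... (+) a^(P) with shape (n, k * (P + 1))."""
    traj = [r]
    a = r
    for p in range(1, num_steps + 1):
        a_hat = S @ a                                          # \hat a^(p) = S a^(p-1)
        a = normalize(a_hat, method) if p % w == 0 else a_hat  # normalize only when p = 0 (mod w)
        traj.append(a)
    return np.concatenate(traj, axis=1)

# Hypothetical example: 6-cycle, S = D^{-1/2} A D^{-1/2} = A / 2, with k = 2 Gaussian channels.
n, k, P = 6, 2, 4
A = np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)
S = A / 2.0
r = np.random.default_rng(0).standard_normal((n, k))
traj = random_feature_propagation(S, r, num_steps=P, w=1, method="qr")
```

With `w=1` and `method="qr"`, every block of the trajectory is an orthonormal basis, so the loop is a form of subspace iteration on `S`. Larger `w` leaves intermediate steps unnormalized.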

  • Spectral Construction (Laplacian Eigenmaps, Hodge1Lap):

Eigenvector-based approaches select the dominant eigenvectors of graph-based operators (the Laplacian for nodes, the Hodge Laplacian for edges). Eigenvectors are defined only up to sign, and only up to an arbitrary orthonormal basis within degenerate eigenspaces. Robust features can therefore be constructed via projections onto the eigenspaces themselves, e.g., $P_{\text{proj},i} = U_i U_i^\top$, where the columns of $U_i$ form an orthonormal basis of the $i$-th eigenspace. The result is then post-processed with an injective function (e.g., an MLP) (Zhou et al., 2023).
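
The following sketch groups eigenvectors into eigenspaces and returns the projections $U_i U_i^\top$. It then checks that rotating and sign-flipping the basis of a degenerate eigenspace leaves the projection unchanged, which is the invariance the construction is meant to provide. It assumes a small symmetric operator and a fixed numerical tolerance for deciding when eigenvalues coincide; the tolerance and function name are assumptions.

```python
import numpy as np

def eigenspace_projections(M: np.ndarray, tol: float = 1e-8):
    """Return a list of (eigenvalue, U_i U_i^T) for the distinct eigenvalues of symmetric M."""
    lam, U = np.linalg.eigh(M)                 # eigenvalues in ascending order
    projections, start = [], 0
    for end in range(1, len(lam) + 1):
        # close the current group when the next eigenvalue differs by more than tol
        if end == len(lam) or lam[end] - lam[start] > tol:
            Ui = U[:, start:end]
            projections.append((lam[start], Ui @ Ui.T))
            start = end
    return projections

# Hypothetical example: Laplacian of the 4-cycle, eigenvalues 0, 2, 2, 4.
A = np.roll(np.eye(4), 1, axis=1) + np.roll(np.eye(4), -1, axis=1)
L0 = np.diag(A.sum(axis=1)) - A
projs = eigenspace_projections(L0)

# Any other orthonormal basis of the degenerate eigenspace (lambda = 2) gives the same projection.
_, U = np.linalg.eigh(L0)
U2 = U[:, 1:3]
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
V = U2 @ R
V[:, 0] *= -1.0                                 # also flip the sign of one basis vector
same = np.allclose(V @ V.T, U2 @ U2.T)
```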

Random-walk encodings generalize to $k$-simplices through the incidence (boundary) matrices $B_k$ and the Hodge $k$-Laplacians $L_k = B_k^\top B_k + B_{k+1} B_{k+1}^\top$, enabling $k$-RWSE encodings that reflect topology and connectivity at arbitrary dimension.
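
The general formula needs only the two incidence matrices adjacent to dimension $k$. Treating the missing boundary maps below dimension 0 and above the top dimension as empty matrices recovers $L_0 = B_1 B_1^\top = D - A$ and $L_2 = B_2^\top B_2$. The sketch below uses a single filled triangle and the same orientation convention as before (edges and triangles oriented by increasing node index), which is an assumption.

```python
import numpy as np

def hodge_laplacian(B_k: np.ndarray, B_k1: np.ndarray) -> np.ndarray:
    """L_k = B_k^T B_k + B_{k+1} B_{k+1}^T (down term + up term)."""
    return B_k.T @ B_k + B_k1 @ B_k1.T

# Filled triangle: nodes 0, 1, 2; edges (0,1), (0,2), (1,2); one triangle (0,1,2).
B1 = np.array([[-1.0, -1.0,  0.0],
               [ 1.0,  0.0, -1.0],
               [ 0.0,  1.0,  1.0]])    # node-edge incidence
B2 = np.array([[1.0], [-1.0], [1.0]])  # edge-triangle incidence
B0 = np.zeros((0, 3))                  # empty boundary map below dimension 0
B3 = np.zeros((1, 0))                  # empty boundary map above dimension 2

L0 = hodge_laplacian(B0, B1)           # reduces to B1 B1^T
L1 = hodge_laplacian(B1, B2)
L2 = hodge_laplacian(B2, B3)           # reduces to B2^T B2
```

Here $L_1 = 3I$ has no zero eigenvalue, consistent with the filled triangle having no unfilled cycle.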

3. Theoretical Properties and Expressiveness

Random-walk-based encodings occupy a provable niche between purely random and purely spectral encodings:

  • Universal Approximation:

Randomly initialized feature-and-propagation trajectories provide universal approximation for continuous functions on the space of finite graphs, given sufficient random features and propagation steps. With high probability, the concatenated trajectory is full-rank and thus captures a maximal diversity of positional signals (Proposition 4.3 in (Eliasof et al., 2023)).

  • Structural Counting:

Early propagation steps encode local, walk-derived substructure counts. Notably, the first two propagation steps $(a^{(1)}, a^{(2)})$, computed from multiple random initializations, exactly implement the randomized triangle-counting estimator $\operatorname{TraceTriangle}_R$ of Avron (2010). This estimator recovers $\operatorname{trace}(A^3)$, six times the number of triangles, via suitable averaging (Proposition 4.1 in (Eliasof et al., 2023)).
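
The following is a minimal sketch of this estimator. It assumes the raw adjacency $A$ as the propagation operator (rather than a normalized $S$) and Rademacher probe vectors. For symmetric $A$, $r^\top A^3 r = (Ar)^\top (A^2 r) = a^{(1)\top} a^{(2)}$, and averaging over probes estimates $\operatorname{trace}(A^3)$ without bias. The function names are hypothetical. The sketch follows the generic Hutchinson-style trace estimator, not necessarily the exact formulation in either cited paper.

```python
import numpy as np

def trace_A3_estimate(A: np.ndarray, R: np.ndarray) -> float:
    """Average r^T A^3 r over the probe columns r of R, using two propagation steps."""
    a1 = A @ R                                      # a^(1) = A r for every probe
    a2 = A @ a1                                     # a^(2) = A^2 r
    return float(np.mean(np.sum(a1 * a2, axis=0)))  # (A r)^T (A^2 r) = r^T A^3 r for symmetric A

def estimate_triangles(A: np.ndarray, num_probes: int, seed: int = 0) -> float:
    """Estimate the triangle count of a simple undirected graph from Rademacher probes."""
    rng = np.random.default_rng(seed)
    R = rng.choice([-1.0, 1.0], size=(A.shape[0], num_probes))
    return trace_A3_estimate(A, R) / 6.0            # each triangle yields 6 closed walks of length 3

# Hypothetical example: the complete graph K4, which has 4 triangles.
A = np.ones((4, 4)) - np.eye(4)
est = estimate_triangles(A, num_probes=2000)
```

Averaging over all $2^4$ sign patterns of a 4-node graph reproduces the expectation exactly. With sampled probes the estimate is correct only on average, and its variance shrinks as the number of probes grows.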

  • Hierarchical and Expressivity Relationship:

For node-level random-walk encoding (RWSE), expressivity is strictly lower than that of the $2$-Folklore Weisfeiler-Lehman test ($2$-FWL): $\mathrm{RWSE} \prec 2\text{-FWL}$ (Zhou et al., 2023). Edge-level encodings, however, are not bounded in this way: full ("up+down") EdgeRWSE, built from edge random walks and Hodge Laplacians, distinguishes some graphs that $2$-FWL cannot.

  • Basis and Sign Invariance:

Spectral encodings constructed via projections onto eigenspaces (as in Hodge1Lap) and processed with basis-invariant functions are invariant to sign flips and to basis changes within eigenspaces. This property is essential for geometric interpretability and stability (Zhou et al., 2023).

4. Random Feature Propagation: Workflow, Operators, and Learning

Random Feature Propagation (RFP) formalizes the random-walk-based positional encoding framework with the following workflow (Eliasof et al., 2023):

  1. Selection of Propagation Operator $S$: Choices include the symmetrically normalized adjacency or Laplacian with self-loops,

$$\hat{A} = \tilde D^{-\frac12} \tilde A \tilde D^{-\frac12}, \quad \hat{L} = \tilde D^{-\frac12} \tilde L \tilde D^{-\frac12},$$

with $\tilde A = A + I$, $\tilde D$ the degree matrix of $\tilde A$, and $\tilde L = \tilde D - \tilde A$.

  2. Random Feature Initialization: Instantiate random features $r \in \mathbb{R}^{n\times k}$ ($k$ channels per node) sampled i.i.d. from a simple fixed distribution (e.g., standard normal or Rademacher). Multiple ($B$) independent random initializations can be concatenated to trade off bias and variance.
  3. Iterative Propagation and Normalization: Features are propagated with $S$ for $P$ steps, with normalization applied every $w$ steps. Two normalizations are common: column-wise $\ell_2$ and QR-based orthonormalization.
  4. Trajectory Concatenation: The full trajectory, including the initial random features and all intermediate propagated steps, is concatenated to yield a positional encoding of dimension $k(P+1)$ per node for each random initialization.
  5. Learnable Propagation Operators: Beyond a fixed $S$, learnable propagation operators can be constructed using GNNs and multi-head self-attention, capturing higher-order or feature-based affinities beyond the static structure.
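
Step 1 can be sketched as follows for a dense, unweighted adjacency matrix. Dense storage is an assumption for brevity; a sparse implementation would be used on large graphs. The helper name is hypothetical. Adding self-loops guarantees that every degree in $\tilde D$ is at least one, so the inverse square root is always defined.

```python
import numpy as np

def propagation_operators(A: np.ndarray):
    """Return (A_hat, L_hat), the self-loop-augmented, symmetrically normalized adjacency and Laplacian."""
    n = A.shape[0]
    A_tilde = A + np.eye(n)                      # \tilde A = A + I
    d_tilde = A_tilde.sum(axis=1)                # diagonal of \tilde D (each entry >= 1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d_tilde))
    L_tilde = np.diag(d_tilde) - A_tilde         # \tilde L = \tilde D - \tilde A
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt
    L_hat = D_inv_sqrt @ L_tilde @ D_inv_sqrt
    return A_hat, L_hat

# Hypothetical example: a star with center 0 and three leaves.
A = np.zeros((4, 4))
A[0, 1:] = A[1:, 0] = 1.0
A_hat, L_hat = propagation_operators(A)
```

Since $\tilde L = \tilde D - \tilde A$, the two operators satisfy $\hat L = I - \hat A$. Polynomials in one are polynomials in the other, so propagating with either spans the same Krylov subspaces. They differ in which eigenvalues dominate after many normalized steps.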

The main parameter choices and their practical impact are summarized below:

| Parameter | Typical values | Empirical effect |
|---|---|---|
| Propagation steps $P$ | $8$–$32$ | Increasing $P$ approaches spectral PE |
| Feature dimension $k$ | $16$ (graph-level), $64$ (node-level) | Higher $k$ aids heterophilic graphs |
| Trajectories $B$ | $5$–$10$ | Improves coverage, stabilizes features |
| Normalization frequency $w$ | $1$ | $w = 1$ matches subspace iteration; larger $w$ improves efficiency |
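
The sketch below illustrates the first and last rows of the table. It runs QR-normalized propagation with $w = 1$ and a single channel ($k = 1$) on $\hat A$ for a small path graph, which is exactly power iteration. With enough steps the channel aligns with the dominant eigenvector of $\hat A$, which is proportional to $\tilde D^{1/2}\mathbf{1}$. The graph, step count, and seed are arbitrary assumptions. With $k > 1$ the same loop is subspace iteration toward the top-$k$ eigenspace when there is a spectral gap.

```python
import numpy as np

def qr_propagate(S: np.ndarray, r: np.ndarray, num_steps: int) -> np.ndarray:
    """Propagate with S and re-orthonormalize after every step (w = 1); return the last block a^(P)."""
    a = r
    for _ in range(num_steps):
        a, _ = np.linalg.qr(S @ a)
    return a

# Hypothetical example: path graph on 6 nodes, S = \hat A (self-loops + symmetric normalization).
n = 6
A = np.zeros((n, n))
A[np.arange(n - 1), np.arange(1, n)] = 1.0
A = A + A.T
A_tilde = A + np.eye(n)
d_tilde = A_tilde.sum(axis=1)
S = A_tilde / np.sqrt(np.outer(d_tilde, d_tilde))           # entrywise D~^{-1/2} A~ D~^{-1/2}

a_P = qr_propagate(S, np.random.default_rng(0).standard_normal((n, 1)), num_steps=200)
top = np.sqrt(d_tilde) / np.linalg.norm(np.sqrt(d_tilde))  # unit eigenvector of \hat A for eigenvalue 1
alignment = abs(float(top @ a_P[:, 0]))                     # |cosine|; approaches 1 as P grows
```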

5. Extensions to Edges, Simplices, and Inter-Level Diffusion

Recent work generalizes random-walk-based positional encodings to all dimensions of simplicial complexes (Zhou et al., 2023):

  • Edge-Level (1-Simplices):

The edge-level random walk is governed by the lifted operator associated with the Hodge 1-Laplacian. The diagonal entries of the powers $\widehat{P}^t$ of the resulting transition matrix $\widehat{P}$ encode return probabilities for edges, leading to edge structural encodings such as EdgeRWSE. Spectral edge encodings use the sign- and basis-invariant projections described earlier.

  • Higher-Order (k-Simplices):

For $k$-simplices, random walks are constructed from the corresponding Hodge $k$-Laplacians. Features are extracted analogously via spatial (diagonals of matrix powers) or spectral (eigenspace projection) methods.

  • Inter-Level Random Walks:

To enable cross-dimensional diffusion, a block adjacency matrix $\mathcal{A}_K$ is assembled from the Laplacians and incidence maps across dimensions. Its power $\mathcal{A}_K^r$ encodes the probability of moving up or down between simplex dimensions over $r$ steps, providing a positional encoding that spans the simplicial hierarchy.
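
The spatial edge-level features can be approximated with a deliberately simplified sketch. It uses an unsigned random walk on edges that moves to a uniformly chosen "down" neighbour (an edge sharing a node) or "up" neighbour (an edge sharing a triangle), and reads return probabilities off the diagonals of its powers. This ignores orientation. It is an assumption standing in for the lifted Hodge operator of Zhou et al. (2023) and does not reproduce it. All names are hypothetical.

```python
import numpy as np

def edge_adjacencies(edges, triangles):
    """Unsigned lower ('down') and upper ('up') adjacency between edges given as sorted node tuples."""
    m = len(edges)
    down = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            if set(edges[i]) & set(edges[j]):        # edges i and j share an endpoint
                down[i, j] = down[j, i] = 1.0
    up = np.zeros((m, m))
    index = {e: i for i, e in enumerate(edges)}
    for a, b, c in triangles:                         # triangle (a, b, c) with a < b < c
        face = [index[(a, b)], index[(a, c)], index[(b, c)]]
        for i in face:
            for j in face:
                if i != j:
                    up[i, j] = 1.0                    # edges i and j bound a common triangle
    return down, up

def edge_return_probabilities(adj, num_steps):
    """Column t-1 holds diag(P^t) for the row-normalized walk P on edges (every edge needs a neighbour)."""
    P = adj / adj.sum(axis=1, keepdims=True)
    Pt = np.eye(adj.shape[0])
    feats = []
    for _ in range(num_steps):
        Pt = Pt @ P
        feats.append(np.diag(Pt).copy())
    return np.stack(feats, axis=1)

# Hypothetical example: the 4-cycle has no triangles, so only the down walk is available.
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
down, up = edge_adjacencies(edges, triangles=[])
feats = edge_return_probabilities(down, num_steps=4)
```

A combined "up+down" variant in this simplified setting would pass `down + up` to `edge_return_probabilities` instead.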

6. Empirical Performance and Practical Considerations

Random-walk-based positional encodings have demonstrated substantial empirical advantages over spectral, random, and classical walk-based encodings:

  • Graph-Level Tasks:

On ZINC-12k, RFP-based encodings (particularly with QR orthonormalization and a DSS-GNN head) reduced MAE from $\sim 0.156$ (Laplacian eigenvectors) to $0.1117$. On OGBG-MOLHIV, they improved ROC-AUC from $\sim 78\%$ to $80.53\%$ (Eliasof et al., 2023). Augmenting GINE with 0-RWSE, 1-down EdgeRWSE, Hodge1Lap, and RWMP reduced MAE from $0.52$ to $0.066$ (Zhou et al., 2023).

  • Node-Level and Synthetic Substructure Counting:

On node classification for homophilic and heterophilic graphs, RFP-QR on $\hat A$ with $P = 16$, $k = 64$ improved performance by up to $10$ percentage points in heterophilic cases. For triangle and substructure counting tasks, RFP-QR matched specialized subgraph GNNs.

  • Edge and Higher-Order Positional Encodings:

EdgeRWSE broke the $2$-FWL expressivity barrier, perfectly distinguishing synthetic graph families that previous methods could not resolve. Hodge1Lap-based enrichments raised accuracy on cycle classification to $99\%$.

  • Computation:

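Random-walk PEs require only iterative propagation (no full eigendecomposition). With a sparse operator and a fixed number of channels, each propagation step costs time linear in the number of edges, so the encodings scale to large, sparse graphs. Spectral approaches for higher-order simplices incur $O(m_k^3)$ cost per dimension $k$, where $m_k$ is the number of $k$-simplices, but this cost is paid once, during pre-processing.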
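
The linear-cost claim can be illustrated with a sketch that applies $\hat A$ from an undirected edge list without ever forming a dense matrix. One propagation step of $k$ channels then costs $O((n + |E|)k)$ time. The sketch assumes each undirected edge appears once, with no self-loops or duplicates in the list. The function name is hypothetical, and in practice a sparse-matrix library would typically be used instead.

```python
import numpy as np

def propagate_a_hat(edges: np.ndarray, n: int, X: np.ndarray) -> np.ndarray:
    """Apply the self-loop-augmented normalized adjacency to X using only an (|E|, 2) edge list."""
    src, dst = edges[:, 0], edges[:, 1]
    deg = np.bincount(src, minlength=n) + np.bincount(dst, minlength=n) + 1.0  # degrees of A + I
    inv_sqrt = 1.0 / np.sqrt(deg)
    Y = X * inv_sqrt[:, None]           # D~^{-1/2} X
    out = Y.copy()                      # contribution of the self-loops (the I in A + I)
    np.add.at(out, src, Y[dst])         # for each edge (u, v): row u gains row v of Y
    np.add.at(out, dst, Y[src])         # and row v gains row u of Y
    return out * inv_sqrt[:, None]      # final left multiplication by D~^{-1/2}

# Hypothetical example: 4 nodes, 4 undirected edges, k = 3 random channels.
edges = np.array([[0, 1], [1, 2], [2, 3], [0, 2]])
X = np.random.default_rng(0).standard_normal((4, 3))
Y = propagate_a_hat(edges, 4, X)
```

Each of the $P$ steps calls this once, so the cost of a full trajectory grows linearly in both $P$ and $|E|$. QR normalization adds an $O(nk^2)$ term per normalized step.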

  • Robustness and Flexibility:

Random-walk-based schemes accommodate learnable operators, adapt to directed or weighted graphs, and unify local walk-derived statistics with global spectral structure.

7. Connections, Generalizations, and Future Directions

Random-walk-based positional encodings unify and extend the scope of prior approaches including random features, Weisfeiler-Lehman structure encodings, Laplacian eigenmaps, resistance-distance embeddings, and more. They serve as a principled bridge between spectral and stochastic representations, and their extension to simplicial complexes equips GNNs to capture multi-scale topological information and equivariant function classes.

Further generalization to dynamic, attributed, and temporal graphs, and integration with message-passing schemes that exploit trajectory information, are plausible avenues for future work. Empirical evidence suggests that the bias-variance trade-off, early-step versus late-step propagation, and learnable versus predefined propagation operators merit continued investigation for both expressivity and computational efficiency (Eliasof et al., 2023, Zhou et al., 2023).
