PPR Tokenization in Graph ML & Crypto

Updated 6 May 2026
  • PPR tokenization names two distinct approaches: one applies Personalized PageRank to generate node token sequences in graph transformers, and the other employs pseudorandom permutations to produce secure, format-preserving cryptographic tokens.
  • In graph machine learning, it compresses a node's local and global relationships into tunable token lists, enabling efficient mini-batch processing and expressivity akin to GCNs with Jumping Knowledge.
  • In cryptography, the method uses cycle-walking combined with a secure block cipher to produce IND-CPA secure tokens, ensuring privacy and format conformity for sensitive data.

PPR tokenization refers to two distinct, rigorously formalized methodologies in graph machine learning and cryptography. In graph learning, PPR tokenization leverages Personalized PageRank (PPR) to produce node-specific token sequences for use in transformer architectures on large-scale graphs, prominently in models such as VCR-Graphormer. In cryptographic tokenization, the PPR acronym references a pseudorandom-permutation-based (“Reversible Hybrid”) algorithm for format-preserving, IND-CPA secure token generation. The two share an acronym but target very different operational contexts and theoretical guarantees.

1. Formal Foundations of PPR Tokenization in Graph Transformers

Let $G=(V,E)$ denote an undirected graph with $n=|V|$ nodes, adjacency matrix $A\in\mathbb{R}^{n\times n}$, degree matrix $D$, and node feature matrix $X\in\mathbb{R}^{n\times d}$. Normalized adjacency $P$ can be column-stochastic, $P = AD^{-1}$, or symmetric, $P = D^{-1/2}AD^{-1/2}$.

Personalized PageRank for node $u$ solves $p_u = \alpha P p_u + (1-\alpha) e_u$, where $e_u$ is the canonical basis vector at $u$ and $\alpha \in (0,1)$ is the damping parameter, so that $1-\alpha$ is the teleport probability. The closed form is

$$p_u = (1-\alpha)\,(I - \alpha P)^{-1} e_u,$$

yielding a steady-state distribution over $V$ capturing the influence of $u$.
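
As a minimal sketch, the fixed-point equation and its closed form can be checked against each other on a toy graph. The example assumes NumPy, the column-stochastic normalization $P = AD^{-1}$, and an illustrative $\alpha = 0.85$; the graph, the function names, and the iteration count are hypothetical choices, not taken from the cited work.

```python
import numpy as np

def ppr_power_iteration(P, u, alpha=0.85, num_iters=200):
    """Iterate p <- alpha * P @ p + (1 - alpha) * e_u, starting from p = e_u."""
    n = P.shape[0]
    e_u = np.zeros(n)
    e_u[u] = 1.0
    p = e_u.copy()
    for _ in range(num_iters):
        p = alpha * (P @ p) + (1.0 - alpha) * e_u
    return p

def ppr_closed_form(P, u, alpha=0.85):
    """Solve (I - alpha * P) p = (1 - alpha) * e_u directly."""
    n = P.shape[0]
    e_u = np.zeros(n)
    e_u[u] = 1.0
    return np.linalg.solve(np.eye(n) - alpha * P, (1.0 - alpha) * e_u)

# Toy undirected path graph 0 - 1 - 2 - 3 (illustrative data only).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = A / A.sum(axis=0, keepdims=True)  # column j divided by its degree: A D^{-1}

p_iter = ppr_power_iteration(P, u=0)
p_exact = ppr_closed_form(P, u=0)
```

With a column-stochastic $P$ the entries of `p_exact` are non-negative and sum to one, which is what makes the vector usable as a ranking over candidate token nodes.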

PPR tokenization constructs, for each node $u$, a token list $T_u$ using $p_u$:

  • Aggregated form:

$$T_u = \{\, (P^{l} X)_{u,:} : l = 0, 1, \dots, L \,\}$$

for $L$ hops, optionally with step-dependent weights.

  • Discrete (top-$k$) form:
  1. Compute $p_u$ via power iteration or the Andersen-Chung-Lang “push” method.
  2. Extract indices $I_u$ of the top $k$ entries in $p_u$.
  3. Tokens are $T_u = \{\, X_{i,:} : i \in I_u \,\}$.
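
The two token-list forms can be sketched as follows. This is an illustration under stated assumptions rather than the VCR-Graphormer implementation: it assumes NumPy, unweighted hops in the aggregated form, and tokens that are plain feature rows in the discrete form. The function names and the sample PPR vector are hypothetical.

```python
import numpy as np

def aggregated_tokens(P, X, u, num_hops):
    """Stack row u of X, P X, ..., P^L X into an (L + 1, d) array."""
    tokens, H = [], X
    for _ in range(num_hops + 1):
        tokens.append(H[u])
        H = P @ H  # one more propagation step
    return np.stack(tokens)

def topk_tokens(p_u, X, k):
    """Return (indices, features) of the k nodes with the largest PPR scores."""
    idx = np.argsort(-p_u, kind="stable")[:k]
    return idx, X[idx]

# Toy path graph 0 - 1 - 2 - 3 and toy features (illustrative data only).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = A / A.sum(axis=0, keepdims=True)
X = np.arange(8, dtype=float).reshape(4, 2)

agg = aggregated_tokens(P, X, u=1, num_hops=2)   # shape (3, 2)
p_u = np.array([0.5, 0.1, 0.3, 0.1])             # hypothetical PPR scores
idx, feats = topk_tokens(p_u, X, k=2)            # nodes 0 and 2
```

Either form yields a fixed-length sequence per node, so the lists can be stored once and loaded by index during training.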

This construction decouples topology-aware feature construction from model training, so the token lists can be computed offline, facilitating efficient batching in transformer architectures as instantiated by VCR-Graphormer (Fu et al., 2024).

2. Offline Token List Construction and Complexity

PPR-induced token lists for all nodes are built offline and stored for later batch loading. The batched, parallelizable algorithm is:

  • For each node $u$:
    • 1. Initialize the estimate $p_u = 0$ and the residual $r_u = e_u$.
    • 2. While some node has a residual above the push tolerance:
      • Select $v$ with residual $r_u(v)$ above the tolerance.
      • Distribute $(1-\alpha)\, r_u(v)$ to $p_u(v)$ and $\alpha\, r_u(v)$ evenly to $v$’s out-neighbors, then reset $r_u(v) = 0$.
    • 3. Choose the $k$ largest entries in $p_u$ as $I_u$, store $T_u$.
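
The push steps above can be sketched in plain Python. The sketch assumes the convention $p_u = \alpha P p_u + (1-\alpha) e_u$ with $P = AD^{-1}$, an adjacency-list graph, and a degree-scaled tolerance `eps * deg(v)` in the style of Andersen-Chung-Lang. The parameter values, the queue discipline, and the function names are illustrative assumptions, not details from the cited work.

```python
from collections import deque

def forward_push(neighbors, u, alpha=0.85, eps=1e-4):
    """Approximate PPR for source u; returns sparse dicts (estimate p, residual r)."""
    p, r = {}, {u: 1.0}
    queue, queued = deque([u]), {u}
    while queue:
        v = queue.popleft()
        queued.discard(v)
        deg = len(neighbors[v])
        mass = r.get(v, 0.0)
        if deg == 0 or mass <= eps * deg:
            continue  # nothing to push from this node
        p[v] = p.get(v, 0.0) + (1.0 - alpha) * mass  # keep the teleport share
        r[v] = 0.0
        share = alpha * mass / deg                   # spread the rest evenly
        for w in neighbors[v]:
            r[w] = r.get(w, 0.0) + share
            if r[w] > eps * len(neighbors[w]) and w not in queued:
                queue.append(w)
                queued.add(w)
    return p, r

def top_k(p, k):
    """Indices of the k largest estimates, ties broken by node id."""
    return sorted(p, key=lambda v: (-p[v], v))[:k]

# Toy path graph 0 - 1 - 2 - 3 (illustrative data only).
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
p, r = forward_push(neighbors, u=0)
token_nodes = top_k(p, k=2)
```

Each push conserves total mass (the estimate and residual together always sum to one), and the loop stops once every residual is at or below its tolerance, so the work per source node is bounded by the tolerance and $\alpha$ rather than by the graph size.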

The offline cost per node is governed by the push tolerance and $\alpha$ rather than by $n$, so the total cost for the entire graph grows linearly in $n$; in practice, the per-node cost is amortized constant for fixed push parameters. This offline preprocessing ensures training time is independent of $n$ when using mini-batching (Fu et al., 2024).

3. Theoretical Equivalence to GCNs with Jumping Knowledge

PPR tokenization in transformers provably mirrors the function class admitted by polynomial-filter GCNs augmented with Jumping Knowledge (JK) pooling.

For an $L$-layer GCN with normalized Laplacian $\mathcal{L} = I - P$ (taking the symmetric $P$), feature propagation is $H^{(l+1)} = \sigma\big(P H^{(l)} W^{(l)}\big)$ with $H^{(0)} = X$. Neglecting nonlinearities, repeated propagation yields a polynomial in $P$ acting on $X$, i.e., $\sum_{l=0}^{L} \theta_l P^{l} X$. Aggregating $P^{l} X$ in $T_u$ and applying multi-head attention is equivalent to performing JK pooling across GCN outputs (see Theorem 3.1 and Appendix 6.1 of (Fu et al., 2024)). Under this identification, the transformer loses no expressivity for local structural encoding relative to such GCNs.
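
The linear part of this argument can be checked numerically. The sketch below does not reproduce Theorem 3.1; it only verifies, under the assumption that nonlinearities are dropped, that each layer output of a linear GCN equals a propagated token $P^{l}X$ followed by a fixed linear map, so a JK-style concatenation of layer outputs is a linear function of the token list. NumPy, the ring graph, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, num_layers = 5, 3, 3

# Ring graph on n nodes (toy data), symmetric normalization D^{-1/2} A D^{-1/2}.
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
deg = A.sum(axis=1)
P = A / np.sqrt(np.outer(deg, deg))

X = rng.normal(size=(n, d))
W = [rng.normal(size=(d, d)) for _ in range(num_layers)]

# Linear GCN: H^{(l+1)} = P H^{(l)} W^{(l)}, nonlinearity dropped.
H, layer_outputs = X, [X]
for W_l in W:
    H = P @ H @ W_l
    layer_outputs.append(H)

# Token view: P^l X, each followed by the product of the first l weight matrices.
tokens = [np.linalg.matrix_power(P, l) @ X for l in range(num_layers + 1)]
W_prod = [np.eye(d)]
for W_l in W:
    W_prod.append(W_prod[-1] @ W_l)
from_tokens = [T_l @ W_prod[l] for l, T_l in enumerate(tokens)]

jk_concat = np.concatenate(layer_outputs, axis=1)       # JK pooling by concatenation
jk_from_tokens = np.concatenate(from_tokens, axis=1)    # same quantity from tokens
```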

4. Practical Regimes: Mini-Batch Attention and Hyperparameterization

Dense attention scales quadratically ($O(n^2)$) with node count, which is prohibitive for large graphs. By restricting each node to a token list of size $k$ (with $k \ll n$), PPR tokenization yields a per-batch attention cost of $O(bk^2)$ for a batch of $b$ nodes, with memory likewise decoupled from the total number of nodes. This shift enables practical mini-batch training for transformers on large, real-world graphs (Fu et al., 2024).
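
A small sketch makes the decoupling concrete: attention is computed inside each node's own token list, so the score tensor has shape $(b, k, k)$ whatever the value of $n$. The code assumes NumPy, a single head with identity query/key/value projections, and randomly drawn token indices standing in for precomputed PPR token lists; all sizes and names are hypothetical.

```python
import numpy as np

def batched_self_attention(tokens):
    """Single-head softmax attention within each node's token list.

    tokens: (b, k, d) array. Returns (output, weights), weights of shape (b, k, k).
    """
    b, k, d = tokens.shape
    logits = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(d)
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ tokens, weights

rng = np.random.default_rng(0)
n, d, k, b = 10_000, 16, 8, 32
X = rng.normal(size=(n, d))
token_idx = rng.integers(0, n, size=(n, k))    # stand-in for stored token lists
batch = rng.choice(n, size=b, replace=False)   # one mini-batch of target nodes

out, weights = batched_self_attention(X[token_idx[batch]])
```

Here `weights` holds $b \cdot k^2 = 2048$ entries, compared with $n^2 = 10^8$ for dense attention over all nodes.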

Hyperparameters include:

  • Damping parameter $\alpha$ (teleport probability $1-\alpha$), controlling locality versus globality of tokens.
  • Token list length $k$.
  • For aggregated forms, hop count $L$.
  • Additional knobs arise when employing VCR-Graphormer-style virtual nodes.

Empirical tuning balances attention runtime, which grows with the token list length, against representational coverage.

5. Integration with Multi-Head Self-Attention

Each $T_u$ forms a token sequence analogous to word embeddings in NLP transformers. The model stack per node becomes:

  • $H_u^{(0)} = T_u$
  • With $M$ layers,

$$H_u^{(m)} = \mathrm{MHA}\big(H_u^{(m-1)}\big), \qquad m = 1, \dots, M,$$

where MHA is standard multi-head attention, and the PPR score constitutes a positional/bias identifier, optionally as a log-bias in the QK product. This design uniformly extends transformer semantics to the tokenized local graph neighborhood (Fu et al., 2024).
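
One way to read the optional log-bias is sketched below: the logarithm of each token's PPR score is added to the attention logits of the corresponding key, so that attention weights are rescaled in proportion to PPR mass. This is an assumption about how the bias could enter, not the VCR-Graphormer implementation; it uses NumPy, a single head, and hypothetical names, shapes, and scores.

```python
import numpy as np

def attention_with_ppr_bias(T, scores, W_q, W_k, W_v, tiny=1e-12):
    """Single-head attention over one token list with a log-PPR key bias.

    T: (k, d) token features; scores: (k,) PPR scores of those tokens.
    """
    Q, K, V = T @ W_q, T @ W_k, T @ W_v
    logits = Q @ K.T / np.sqrt(K.shape[1])
    logits = logits + np.log(scores + tiny)[None, :]   # bias every key column
    logits -= logits.max(axis=-1, keepdims=True)       # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
k, d = 3, 4
T = rng.normal(size=(k, d))
scores = np.array([0.5, 0.3, 0.2])                     # hypothetical PPR scores
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

out, weights = attention_with_ppr_bias(T, scores, W_q, W_k, W_v)
```

When the query-key term carries no information (for example, an all-zero `W_q`), the attention weights reduce to the normalized PPR scores, which shows the bias acting as a structural prior.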

6. Empirical Performance and Scaling

VCR-Graphormer using PPR tokenization achieves competitive or superior results across both small and large-scale node classification benchmarks:

  • Small graphs (e.g., PubMed): accuracy matching or exceeding contemporaries such as NAGphormer.
  • Large graphs (Reddit, Aminer, Amazon2M): Comparable or improved scores with no $O(n^3)$ eigen-decomposition cost, including on Reddit relative to NAGphormer.
  • Heterophilous graphs: Augmentation with virtual nodes, which addresses PPR's inductive bias limitations, achieves state-of-the-art or consistent performance improvements.

Runtime analysis indicates practical feasibility: on Amazon2M, PPR-based token sampling plus partitioning takes less time than eigen-decomposition in DGL (Fu et al., 2024).

7. PPR Tokenization in Cryptographic Tokenization

Longo–Aragona–Sala's "PPR" tokenization algorithm in cryptography is a format-preserving, cycle-walking construction built atop a secure block cipher and a collision-resistant tweak function. The algorithm concatenates a tweak-derived hash to the numeric input, encrypts under a block cipher (e.g., AES-256), and cycle-walks until the result fits the decimal range. Security formally reduces to the IND-CPA property of the underlying cipher, yielding strong privacy and non-forgeability guarantees (Longo et al., 2016). The expected computational requirement is "about two AES encryptions plus one SHA-256 hash" per token, supporting high-throughput payment workloads with the benefit of format preservation and PCI DSS compliance.
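
The cycle-walking idea can be illustrated with a standard-library sketch. It is not the Longo–Aragona–Sala construction: in place of AES it uses a toy balanced Feistel network with an HMAC-SHA256 round function as a stand-in keyed permutation, and it mixes the SHA-256 tweak digest into the round function instead of concatenating it to the input. The round count, the function names, and the sample values are hypothetical, and the code is not suitable for protecting real data.

```python
import hashlib
import hmac

ROUNDS = 10  # illustrative choice, not taken from the source

def _round(key, tweak_digest, i, half, half_bits):
    """Keyed round function: HMAC over (tweak digest, round index, half block)."""
    msg = tweak_digest + bytes([i]) + half.to_bytes(8, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % (1 << half_bits)

def _permute(key, tweak_digest, x, half_bits, inverse=False):
    """Toy Feistel permutation on integers of 2 * half_bits bits."""
    left, right = x >> half_bits, x & ((1 << half_bits) - 1)
    for i in (range(ROUNDS - 1, -1, -1) if inverse else range(ROUNDS)):
        if inverse:
            left, right = right ^ _round(key, tweak_digest, i, left, half_bits), left
        else:
            left, right = right, left ^ _round(key, tweak_digest, i, right, half_bits)
    return (left << half_bits) | right

def _walk(key, tweak, value, digits, inverse):
    if not (value.isdigit() and len(value) == digits):
        raise ValueError("expected a decimal string of the configured length")
    limit = 10 ** digits
    half_bits = ((limit - 1).bit_length() + 1) // 2
    tweak_digest = hashlib.sha256(tweak).digest()
    x = _permute(key, tweak_digest, int(value), half_bits, inverse)
    while x >= limit:  # cycle-walk until the result is back in the decimal range
        x = _permute(key, tweak_digest, x, half_bits, inverse)
    return str(x).zfill(digits)

def tokenize(key, tweak, number, digits=16):
    """Map a digits-long decimal string to another one, reversibly."""
    return _walk(key, tweak, number, digits, inverse=False)

def detokenize(key, tweak, token, digits=16):
    """Invert tokenize under the same key and tweak."""
    return _walk(key, tweak, token, digits, inverse=True)

key = b"k" * 32                      # placeholder key for the example
pan = "4111111111111111"             # well-known test card number
token = tokenize(key, b"merchant-1", pan)
restored = detokenize(key, b"merchant-1", token)
```

For 16 digits the permutation acts on 54-bit integers, so a little under two permutation calls are needed on average before the walk lands back in the decimal range, which is in line with the "about two encryptions" cost quoted above.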

8. Principal Applications and Theoretical Significance

In graph neural networks, PPR tokenization enables expressivity-preserving, scalable transformer training by marrying spectral random-walk locality with tokenization for efficient self-attention. It compresses a node's global relational context into a fixed, tunable memory footprint per node, proving essential for extending transformer-based architectures to massive and heterogeneous graphs.

In cryptographic contexts, PPR tokenization secures the reversible obfuscation of sensitive identifiers (e.g., payment card numbers) while maintaining essential format constraints and providing provably strong security guarantees under practical cost models.

Both applications compress, privatize, or operationalize domain-specific representations: one for learning on graphs, the other for compliant cryptographic tokenization (Fu et al., 2024, Longo et al., 2016).
