PPR Tokenization in Graph ML & Crypto

Updated 6 May 2026

PPR tokenization is a dual-approach that applies Personalized PageRank for generating node token sequences in graph transformers and employs pseudorandom permutations for secure, format-preserving cryptographic tokens.
In graph machine learning, it compresses a node's local and global relationships into tunable token lists, enabling efficient mini-batch processing and expressivity akin to GCNs with Jumping Knowledge.
In cryptography, the method uses cycle-walking combined with secure block ciphers to produce IND-CPA compliant tokens, ensuring privacy and format conformity for sensitive data.

PPR tokenization refers to two distinct, rigorously formalized methodologies in graph machine learning and cryptography. In graph learning, PPR tokenization leverages Personalized PageRank (PPR) to produce node-specific token sequences for use in transformer architectures on large-scale graphs, prominently in models such as VCR-Graphormer. In cryptographic tokenization, the PPR acronym references a pseudorandom-permutation-based (“Reversible Hybrid”) algorithm for format-preserving, IND-CPA secure token generation. Both employ PPR concepts but target very different operational contexts and theoretical guarantees.

1. Formal Foundations of PPR Tokenization in Graph Transformers

Let $G=(V,E)$ denote an undirected graph with $n=|V|$ nodes, adjacency matrix $A\in\mathbb{R}^{n\times n}$ , degree matrix $D$ , and node feature matrix $X\in\mathbb{R}^{n\times d}$ . Normalized adjacency $P$ can be column-stochastic, $P = AD^{-1}$ , or symmetric, $P = D^{-1/2}AD^{-1/2}$ .

Personalized PageRank for node $u$ solves: $p_u = \alpha P p_u + (1-\alpha) e_u$ where $e_u$ is the canonical basis vector at $u$ and $\alpha\in(0,1)$ is the teleport (damping) parameter. The closed form is

$p_u = (1-\alpha)\,(I - \alpha P)^{-1} e_u$

yielding a steady-state distribution over $V$ capturing the influence of $u$.
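
As a rough illustration of the fixed-point equation $p_u = \alpha P p_u + (1-\alpha) e_u$, the following sketch computes $p_u$ on a tiny path graph both by solving the linear system directly and by power iteration. The graph, the value `alpha=0.85`, and the function names are illustrative assumptions, not settings taken from the cited work.

```python
import numpy as np

def ppr_closed_form(P, u, alpha):
    """Solve (I - alpha P) p = (1 - alpha) e_u directly."""
    n = P.shape[0]
    e_u = np.zeros(n)
    e_u[u] = 1.0
    return (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * P, e_u)

def ppr_power_iteration(P, u, alpha, iters=200):
    """Iterate p <- alpha P p + (1 - alpha) e_u until (approximately) fixed."""
    n = P.shape[0]
    e_u = np.zeros(n)
    e_u[u] = 1.0
    p = e_u.copy()
    for _ in range(iters):
        p = alpha * (P @ p) + (1.0 - alpha) * e_u
    return p

# Hypothetical 4-node path graph 0-1-2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = A / A.sum(axis=0, keepdims=True)  # column-stochastic P = A D^{-1}

p_exact = ppr_closed_form(P, u=0, alpha=0.85)   # alpha is an assumed value
p_iter = ppr_power_iteration(P, u=0, alpha=0.85)
print(np.round(p_exact, 4), np.allclose(p_exact, p_iter))
```

With a column-stochastic $P$, the entries of the result are non-negative and sum to one, which is what makes it usable as a ranking over candidate token nodes.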

PPR tokenization constructs, for each node $u$, a token list $\mathcal{T}_u$ using $p_u$:

Aggregated form:

$\mathcal{T}_u = \{(P^{\ell} X)_{u,:}\}_{\ell=1}^{L}$

for $L$ hops, optionally with step-dependent weights.

Discrete (top-$k$) form:

Compute $p_u$ via power iteration or the Andersen-Chung-Lang “push” method.
Extract indices $i_1,\dots,i_k$ of the top $k$ entries in $p_u$.
Tokens are $\{X_{i_j,:}\}_{j=1}^{k}$.
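
The two token-list variants can be sketched as follows. This is a minimal illustration under assumptions: the function names are hypothetical, the aggregated form is taken to start at hop 1, and the PPR vector is supplied directly rather than computed.

```python
import numpy as np

def topk_token_list(p_u, X, k):
    """Discrete form: indices of the k largest PPR scores and their feature rows."""
    idx = np.argsort(-p_u, kind="stable")[:k]
    return idx, X[idx]

def aggregated_token_list(P, X, u, num_hops):
    """Aggregated form: row u of P^l X for l = 1..num_hops (assumed hop range)."""
    tokens = []
    H = X.astype(float)
    for _ in range(num_hops):
        H = P @ H              # one more propagation step
        tokens.append(H[u])
    return np.stack(tokens)    # shape (num_hops, d)

# Hypothetical inputs: a PPR vector for some node and a 4 x 2 feature matrix.
p_u = np.array([0.5, 0.3, 0.15, 0.05])
X = np.arange(8, dtype=float).reshape(4, 2)

idx, tokens = topk_token_list(p_u, X, k=2)
print(idx, tokens)
```

Either variant yields a fixed-length sequence per node, so token lists for different nodes can be stacked into a dense batch.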

This construction decouples topology-aware feature construction from model training by moving it offline, which facilitates efficient batching in transformer architectures such as VCR-Graphormer (Fu et al., 2024).

2. Offline Token List Construction and Complexity

PPR-induced token lists for all nodes are built offline and stored for later batch loading. The batched, parallelizable algorithm is:

For each node $u \in V$:
- Select a node $v$ whose residual $r(v)$ exceeds the push threshold.
- Distribute $(1-\alpha)\,r(v)$ to the estimate $p_u(v)$ and $\alpha\,r(v)$ to $v$’s out-neighbors, then reset $r(v)=0$.
- Choose the $k$ largest entries in $p_u$ as $i_1,\dots,i_k$, store $\mathcal{T}_u$.
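
The push steps above can be sketched as a small forward-push routine. This is a simplified illustration and not the authors' implementation: the threshold rule `r[v] > eps * degree(v)`, the defaults `alpha=0.85` and `eps=1e-6`, and all names are assumptions chosen to match the convention $p_u = \alpha P p_u + (1-\alpha) e_u$.

```python
from collections import deque

def forward_push(neighbors, u, alpha=0.85, eps=1e-6):
    """Approximate PPR for seed u; returns (estimate p, residual r) as dicts."""
    p = {}
    r = {u: 1.0}                 # all probability mass starts as residual at u
    queue = deque([u])
    queued = {u}
    while queue:
        v = queue.popleft()
        queued.discard(v)
        rho = r.get(v, 0.0)
        deg = len(neighbors[v])
        if deg == 0 or rho <= eps * deg:
            continue             # nothing worth pushing (or a dangling node)
        p[v] = p.get(v, 0.0) + (1.0 - alpha) * rho   # keep the teleport share
        r[v] = 0.0                                   # reset the residual
        share = alpha * rho / deg                    # spread the rest evenly
        for w in neighbors[v]:
            r[w] = r.get(w, 0.0) + share
            if r[w] > eps * len(neighbors[w]) and w not in queued:
                queue.append(w)
                queued.add(w)
    return p, r

def top_k(p, k):
    """Indices of the k largest estimates, ties broken by node id."""
    return sorted(p, key=lambda v: (-p[v], v))[:k]

# Hypothetical 4-node path graph as adjacency lists.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
p, r = forward_push(neighbors, u=0)
print(top_k(p, k=2))
```

Each push conserves total mass between the estimate and the residual, so the leftover residual gives a direct handle on the approximation error.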

Each node incurs its own offline cost, and the total cost for the entire graph is the sum over all $n$ nodes; in practice, the per-node cost is amortized constant when the hyperparameters are fixed. This offline preprocessing ensures training time is independent of $n$ when using mini-batching (Fu et al., 2024).

3. Theoretical Equivalence to GCNs with Jumping Knowledge

PPR tokenization in transformers provably mirrors the function class admitted by polynomial-filter GCNs augmented with Jumping Knowledge (JK) pooling.

For an $L$-layer GCN with Laplacian $\mathcal{L}$, feature propagation is: $H^{(\ell+1)} = \sigma(\mathcal{L} H^{(\ell)} W^{(\ell)})$. Neglecting nonlinearities, repeated propagation yields a polynomial in $\mathcal{L}$ acting on $X$, i.e., $\sum_{\ell=0}^{L} \theta_\ell \mathcal{L}^{\ell} X$. Aggregating $P^{\ell} X$ in $\mathcal{T}_u$ and applying multi-head attention is equivalent to performing JK pooling across GCN outputs (see Theorem 3.1 and Appendix 6.1 of (Fu et al., 2024)). This identification ensures no loss in the expressivity spectrum for local structural encoding in the transformer regime.
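
A small numerical check can make the correspondence concrete. The sketch below is a deliberately simplified stand-in for the theorem: it uses a single shared score per hop instead of learned query-key products, and shows that softmax-weighted pooling over hop-wise propagated features coincides with a polynomial filter whose coefficients are the attention weights. The operator `S`, the scores, and the function names are all assumptions.

```python
import numpy as np

def hop_tokens(S, X, num_hops):
    """Stack [X, S X, ..., S^L X] into an array of shape (L + 1, n, d)."""
    H, out = X, [X]
    for _ in range(num_hops):
        H = S @ H
        out.append(H)
    return np.stack(out)

def attention_pool(tokens, scores):
    """Softmax the per-hop scores, then take the weighted sum over hops."""
    w = np.exp(scores - scores.max())
    w = w / w.sum()
    return np.tensordot(w, tokens, axes=1), w

def polynomial_filter(S, X, theta):
    """Evaluate sum_l theta[l] * S^l X directly."""
    out = np.zeros_like(X, dtype=float)
    H = X.astype(float)
    for t in theta:
        out += t * H
        H = S @ H
    return out

# Hypothetical path graph with symmetric normalization as the propagation operator.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
S = A / np.sqrt(np.outer(d, d))
X = np.arange(8, dtype=float).reshape(4, 2)

pooled, w = attention_pool(hop_tokens(S, X, num_hops=3), np.array([0.2, 1.0, -0.5, 0.0]))
print(np.allclose(pooled, polynomial_filter(S, X, w)))
```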

4. Practical Regimes: Mini-Batch Attention and Hyperparameterization

Dense attention scales quadratically ($O(n^2)$) with node count, which is prohibitive for large graphs. By restricting each node to a token list of size $k$ (with $k \ll n$), PPR tokenization yields a per-batch attention cost of $O(bk^2)$ for a batch of $b$ nodes, with memory likewise decoupled from the total number of nodes. This shift enables practical mini-batch training for transformers on large, real-world graphs (Fu et al., 2024).
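
To give a sense of scale, the following back-of-the-envelope sketch counts attention score entries for dense attention over all nodes versus per-node attention over short token lists. The graph size, batch size, and token list length are hypothetical values chosen only for illustration, and the count assumes each node attends within its own token list.

```python
def dense_attention_entries(num_nodes):
    """Score-matrix entries when every node attends to every other node."""
    return num_nodes * num_nodes

def tokenized_attention_entries(batch_size, list_len):
    """Score-matrix entries when each node in a batch attends within its own token list."""
    return batch_size * list_len * list_len

# Hypothetical sizes: 2M nodes, batches of 1024 nodes, 32 tokens per node.
dense = dense_attention_entries(2_000_000)
tokenized = tokenized_attention_entries(1024, 32)
print(dense, tokenized, dense // tokenized)
```

The tokenized count does not involve the total node count at all, which is the property that makes mini-batching feasible.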

Hyperparameters include:

Teleport (damping) parameter $\alpha$, controlling locality versus globality of tokens.
Token list length $k$.
For aggregated forms, hop count $L$.
Additional knobs arise when employing VCR-Graphormer-style virtual nodes.

Empirical tuning balances runtime, which grows with the token list length, against representational coverage.

5. Integration with Multi-Head Self-Attention

Each $\mathcal{T}_u$ forms a token sequence analogous to word embeddings in NLP transformers. The model stack per node becomes:

$Z_u^{(0)} = \mathcal{T}_u$
With $L$ layers,

$Z_u^{(\ell)} = \mathrm{MHA}\big(Z_u^{(\ell-1)}\big), \quad \ell = 1,\dots,L$

where MHA is standard multi-head attention, and the PPR score constitutes a positional/bias identifier, optionally as a log-bias in the QK product. This design uniformly extends transformer semantics to the tokenized local graph neighborhood (Fu et al., 2024).
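
One way the log-bias option might look is sketched below for a single attention head over one node's token list. This is an assumption-laden illustration rather than the VCR-Graphormer layer: the bias is added per key token, the weight matrices are random placeholders, and `eps` only guards against `log(0)`.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def biased_self_attention(T, ppr_scores, Wq, Wk, Wv, eps=1e-12):
    """Single-head attention over tokens T (k x d) with a log-PPR bias on the keys."""
    Q, K, V = T @ Wq, T @ Wk, T @ Wv
    logits = Q @ K.T / np.sqrt(K.shape[-1])
    logits = logits + np.log(ppr_scores + eps)[None, :]  # same bias for every query row
    return softmax(logits, axis=-1) @ V

# Hypothetical token list of k = 3 tokens with d = 4 features each.
rng = np.random.default_rng(0)
T = rng.normal(size=(3, 4))
ppr_scores = np.array([0.6, 0.3, 0.1])
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))

out = biased_self_attention(T, ppr_scores, Wq, Wk, Wv)
print(out.shape)
```

When the query-key scores are all equal, the bias alone determines the weights and each output row reduces to a PPR-weighted average of the value vectors.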

6. Empirical Performance and Scaling

VCR-Graphormer using PPR tokenization achieves competitive or superior results across both small and large-scale node classification benchmarks:

Small graphs (e.g., PubMed): accuracy matching or exceeding contemporaries such as NAGphormer.
Large graphs (Reddit, Aminer, Amazon2M): Comparable or improved scores without the cost of eigen-decomposition, e.g., on Reddit relative to NAGphormer.
Heterophilous graphs: Augmentation with virtual nodes, which addresses PPR's inductive bias limitations, achieves state-of-the-art or near state-of-the-art performance.

Runtime analysis indicates practical feasibility: On Amazon2M, PPR-based token sampling plus partitioning requires less time than eigen-decomposition in DGL (Fu et al., 2024).

7. PPR Tokenization in Cryptographic Tokenization

Longo–Aragona–Sala's "PPR" tokenization algorithm in cryptography is a format-preserving, cycle-walking construction built atop a secure block cipher and a collision-resistant tweak function. The algorithm concatenates a tweak-derived hash to the numeric input, encrypts under a block cipher (e.g., AES-256), and cycle-walks until the result fits the decimal range. Security formally reduces to the IND-CPA property of the underlying cipher, yielding strong privacy and non-forgeability guarantees (Longo et al., 2016). The expected computational requirement is "about two AES encryptions plus one SHA-256 hash" per token, supporting high-throughput payment workloads with the benefit of format preservation and PCI DSS compliance.
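
The cycle-walking idea on its own can be sketched as follows. This is not the Longo et al. algorithm and carries no security claim: it swaps AES for a toy Feistel permutation keyed with HMAC-SHA256 from the Python standard library, and it mixes the tweak into the round function instead of concatenating a tweak hash to the input. All names, the round count, and the key and tweak values are hypothetical.

```python
import hashlib
import hmac

def _round(key, tweak, rnd, half, bits):
    """Toy round function: HMAC-SHA256 truncated to `bits` bits."""
    msg = bytes([rnd]) + tweak + half.to_bytes((bits + 7) // 8, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)

def feistel(key, tweak, x, bits, rounds=8, decrypt=False):
    """Balanced Feistel permutation on integers in [0, 2^(2*bits))."""
    mask = (1 << bits) - 1
    left, right = x >> bits, x & mask
    order = range(rounds - 1, -1, -1) if decrypt else range(rounds)
    for rnd in order:
        if decrypt:
            left, right = right ^ _round(key, tweak, rnd, left, bits), left
        else:
            left, right = right, left ^ _round(key, tweak, rnd, right, bits)
    return (left << bits) | right

def _walk(key, tweak, number, digits, decrypt):
    limit = 10 ** digits
    if not 0 <= number < limit:
        raise ValueError("number does not fit in the requested digit count")
    bits = (limit.bit_length() + 1) // 2  # half-width so 2^(2*bits) >= limit
    y = feistel(key, tweak, number, bits, decrypt=decrypt)
    while y >= limit:                      # cycle-walk back into the decimal range
        y = feistel(key, tweak, y, bits, decrypt=decrypt)
    return y

def tokenize(key, tweak, number, digits):
    return _walk(key, tweak, number, digits, decrypt=False)

def detokenize(key, tweak, token, digits):
    return _walk(key, tweak, token, digits, decrypt=True)

key, tweak = b"demo-key-not-for-production", b"merchant-42"
token = tokenize(key, tweak, 123, digits=3)
print(f"{token:03d}", detokenize(key, tweak, token, digits=3))
```

Because the underlying map is a permutation of a slightly larger domain, repeatedly applying it from an in-range input must eventually land back in range, and walking the inverse permutation retraces the same path.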

8. Principal Applications and Theoretical Significance

In graph neural networks, PPR tokenization enables expressivity-preserving, scalable transformer training by marrying spectral random-walk locality with tokenization for efficient self-attention. It compresses a node's global relational context into a fixed, tunable memory footprint per node, proving essential for extending transformer-based architectures to massive and heterogeneous graphs.

In cryptographic contexts, PPR tokenization secures the reversible obfuscation of sensitive identifiers (e.g., payment card numbers) while maintaining essential format constraints and providing provably strong security guarantees under practical cost models.

Both applications exemplify the utility of the PPR paradigm in compressing, privatizing, and operationalizing domain-specific representations—one for learning on graphs, another for compliant cryptographic tokenization (Fu et al., 2024, Longo et al., 2016).


References

Fu et al. (2024). VCR-Graphormer: A Mini-batch Graph Transformer via Virtual Connections.

Longo et al. (2016). Several Proofs of Security for a Tokenization Algorithm.
