QG-PPR: Personalized PageRank for Logic
- The paper introduces QG-PPR as a scalable framework that leverages personalized PageRank for efficient, query-guided inference in first-order logic.
- It constructs a localized proof graph using restart edges and probabilistic transitions to bias the search toward short, high-probability proofs.
- Empirical results show improved mean average precision and AUC over MLNs with significant gains in inference speed and scalability.
Query-Guided Personalized PageRank (QG-PPR) is a framework for efficient probabilistic inference in first-order logic representations, formulated to enable scalable, locally groundable reasoning over large databases. QG-PPR, as implemented in ProPPR, interprets query answering as a personalized PageRank process over a query-induced proof graph, leveraging local search and restart mechanisms to bias inference toward short proofs and high-probability answers. The approach supports efficient, parallelizable inference and learning, with empirical performance advantages over Markov Logic Networks (MLNs) on entity resolution tasks (Wang et al., 2013).
1. Formal Foundations and Semantics
QG-PPR is built atop a definite-clause logic program LP and a database DB of unit facts. A query Q is represented as a conjunction of literals G1 ∧ … ∧ Gk. The proof state at any step is encoded as a pair (Q', R), where Q' is the query with the substitutions applied so far, and the subgoal list R records the remaining goals to prove.
The initial or start node is v0 = (Q, Q), while a solution node has an empty subgoal list and is denoted by the symbol □. The SLD proof graph, potentially infinite in size, captures the space of all proofs of Q using LP and DB. QG-PPR extends this graph by adding restart edges to create the query-induced grounding graph G_Q.
Inference is defined as a random walk with restarts, seeded at v0, over G_Q. Personalized PageRank computes a probability distribution over solution nodes (ground answers Qθ), structurally favoring nodes closer to v0 through the restart mechanism.
2. Query-Induced Grounding Graph Construction
Each node u in G_Q is a proof state of the form (Q', R). For each node u = (Q', G1 ∧ … ∧ Gk) and each clause H :- B1, …, Bm in LP, if the leftmost subgoal G1 unifies with the head H via most general unifier θ, a proof edge u → v is created:
- v = (Q'θ, (B1 ∧ … ∧ Bm ∧ G2 ∧ … ∧ Gk)θ), where θ = mgu(G1, H),
- Each edge u → v is annotated by a feature vector φ_{u→v}, reflecting the user-defined feature literals of the clause instantiated under θ.
Additionally, each node receives a restart edge to v0 with feature annotation {restart}, biasing the walk toward short proofs. Database facts (unit clauses) act as degenerate clauses with a dedicated database feature.
This query-guided construction ensures that only those nodes reachable from v0 (i.e., relevant to the query) are included in the grounding, promoting scalability.
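As a concrete illustration of the construction above, the following sketch grounds a tiny propositional program (no variables, so unification reduces to atom equality; `CLAUSES`, `ground`, and the example atoms are hypothetical illustrations, not ProPPR's API):

```python
from collections import deque

# Toy propositional program: each head atom maps to a list of clause bodies;
# unit facts have empty bodies. Real ProPPR performs full first-order
# unification; here "unify" is just atom equality.
CLAUSES = {
    "grandparent": [["parent_ab", "parent_bc"]],   # rule
    "parent_ab":   [[]],                           # fact
    "parent_bc":   [[]],                           # fact
}

def ground(query):
    """Build the query-induced proof graph: a state is the tuple of open
    subgoals; the empty tuple is a solution node."""
    v0 = (query,)
    edges, frontier, seen = [], deque([v0]), {v0}
    while frontier:
        state = frontier.popleft()
        if state == ():                  # solution node: nothing left to prove
            continue
        head, rest = state[0], state[1:]
        for body in CLAUSES.get(head, []):
            child = tuple(body) + rest   # leftmost subgoal replaced by body
            edges.append((state, child, "clause:" + head))
            if child not in seen:
                seen.add(child)
                frontier.append(child)
        edges.append((state, v0, "restart"))   # restart edge back to v0
    return v0, edges

v0, edges = ground("grandparent")
```

Only states reachable from v0 are ever created, which is the locality property described above.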
3. Personalized PageRank on Proof Graphs
Transitions within G_Q are governed by a row-stochastic matrix W, with transitions parameterized as:
- Pr(v|u) ∝ f(w, φ_{u→v}), typically with f(w, φ) = w · φ (or exp(w · φ)),
- Transition probabilities for each neighbor are normalized such that Σ_{v ∈ N(u)} Pr(v|u) = 1,
- The restart edge from u to v0 is assigned probability α, with the remaining mass 1 − α distributed over proof edges.
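A minimal sketch of this normalization, assuming the exponential weighting f(w, φ) = exp(w · φ) over binary features (the function and feature names are illustrative):

```python
import math

def transition_probs(weights, proof_edges, alpha=0.1):
    """Restart gets fixed mass alpha; the remaining 1 - alpha is split over
    proof edges in proportion to f(w, phi) = exp(sum of feature weights).
    proof_edges: {neighbor: list of feature names on that edge}."""
    scores = {v: math.exp(sum(weights.get(f, 0.0) for f in phi))
              for v, phi in proof_edges.items()}
    z = sum(scores.values())
    probs = {v: (1 - alpha) * s / z for v, s in scores.items()}
    probs["restart"] = alpha             # edge back to v0
    return probs

# The edge carrying the higher-weighted feature gets more of the 1 - alpha mass.
probs = transition_probs({"clause:title": 1.0},
                         {"a": ["clause:title"], "b": ["clause:venue"]})
```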
The personalized PageRank vector p is defined as the stationary distribution:

p = α · χ_{v0} + (1 − α) · p W

where χ_{v0} is a unit vector at the start node and W ranges over proof-edge transitions (the restart edges of G_Q realize the α · χ_{v0} term). Power iteration is used in practice for convergence:

p^{(t+1)} = α · χ_{v0} + (1 − α) · p^{(t)} W

iterated until ‖p^{(t+1)} − p^{(t)}‖₁ falls below a tolerance.
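The recursion can be sketched directly, with W given as a dict of proof-edge transition probabilities (a toy two-node graph; the fixed iteration count is an illustrative stand-in for a convergence test):

```python
def personalized_pagerank(W, v0, alpha=0.1, iters=300):
    """Power iteration for p = alpha * e_v0 + (1 - alpha) * p W.
    W: {u: {v: Pr(v|u)}}, row-stochastic over proof edges only; the alpha
    term plays the role of the restart edges. Assumes every neighbor also
    appears as a key of W."""
    p = {u: 0.0 for u in W}
    p[v0] = 1.0
    for _ in range(iters):
        nxt = {u: 0.0 for u in W}
        nxt[v0] = alpha
        for u, row in W.items():
            for v, pr in row.items():
                nxt[v] += (1 - alpha) * p[u] * pr
        p = nxt
    return p

# Two-node cycle: the exact fixed point is p[v0] = alpha / (1 - (1 - alpha)^2).
p = personalized_pagerank({"v0": {"a": 1.0}, "a": {"v0": 1.0}}, "v0")
```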
4. Local Inference via PageRank-Nibble-Prove
To achieve localized, query-specific inference, ProPPR employs the Andersen–Chung–Lang “PageRank-Nibble” method. This procedure simultaneously approximates for the seed and enumerates a compact subgraph sufficient for inference within controlled error.
A high-level pseudocode for PageRank-Nibble-Prove is as follows:
define PageRank-Nibble-Prove(Q, α', ε):
    let v0 = (Q, Q)
    initialize residual r[v0] = 1, estimate p[v] = 0 for all v, Ĝ ← ∅
    while ∃u with r[u]/|N(u)| > ε do
        push(u)
    return (p, Ĝ)

define push(u):
    p[u] ← p[u] + α'·r[u]
    δ ← (1 − α')·r[u]
    r[u] ← 0
    for each neighbor v ∈ N(u):
        Ĝ.addEdge(u → v)
        r[v] ← r[v] + Pr(v|u)·δ
Here, α' is a lower bound on the restart probability (typically set to α), and ε specifies the error tolerance. The algorithm ensures that after each push, p plus the exact PPR of the residual r remains an exact PPR vector for v0, and when the loop terminates, p approximates the PPR vector with error at most ε·|N(v)| per node. The constructed subgraph Ĝ contains only visited edges—providing a "local grounding" for Q.
5. Theoretical Properties
The Andersen–Chung–Lang theorem asserts that if u1, …, uk are the nodes successively pushed in PageRank-Nibble-Prove, then:

Σ_{i=1}^{k} |N(u_i)| ≤ 1/(α'ε)

Hence, the number of edges in Ĝ is at most 1/(α'ε). Both inference time and grounding size are thus O(1/(α'ε)), independent of the database size or the full proof graph's size. This establishes rigorous scalability guarantees.
6. Weight Learning and Parallelization
Supervised learning is supported using triples (Q, P, N), where P and N are the sets of correct and incorrect answers for Q. After running PageRank-Nibble-Prove to obtain p, pairwise learning examples are collected to impose p[s+] > p[s−] for all pairs s+ ∈ P, s− ∈ N.
The pairwise squared-hinge loss for a pair (s+, s−) is:

ℓ(w; s+, s−) = max(0, m − (p[s+] − p[s−]))²

for a margin m ≥ 0, with total objective

L(w) = Σ_Q Σ_{s+ ∈ P, s− ∈ N} ℓ(w; s+, s−) + μ‖w‖²₂

using L2 regularization with parameter μ. Gradients w.r.t. w are computed by backpropagating through power iteration, in the style of Backstrom & Leskovec. Stochastic gradient descent is applied, with learning rate η.
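The pairwise loss is easy to sketch; `margin` here is an illustrative hyperparameter, not a value from the paper:

```python
def pairwise_squared_hinge(p, pos, neg, margin=0.05):
    """Sum of max(0, margin - (p[s_plus] - p[s_minus]))^2 over all
    positive/negative answer pairs for one query."""
    return sum(max(0.0, margin - (p[sp] - p[sn])) ** 2
               for sp in pos for sn in neg)

# The near-tie (a vs. b) is penalized; the well-separated pair (a vs. c) is not.
scores = {"a": 0.50, "b": 0.49, "c": 0.30}
loss = pairwise_squared_hinge(scores, pos=["a"], neg=["b", "c"])
```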
Parallelization is realized by running independent threads over separate queries , grounding and updating asynchronously ("Hogwild!" style). Since each local grounding is small, the per-thread computational cost is low, and wall-clock speedup is nearly linear in thread count.
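A minimal sketch of this lock-free scheme, with a stub per-query gradient standing in for the real backpropagated PPR gradient (the names and the gradient itself are hypothetical):

```python
import collections
from concurrent.futures import ThreadPoolExecutor

w = collections.defaultdict(float)        # shared weight vector, no locks

def sgd_on_query(q, lr=0.1):
    """Ground q locally, compute its gradient, and update the shared weights
    Hogwild!-style (updates are racy by design; because local groundings are
    small and sparse, conflicting updates are rare in practice)."""
    grad = {"feat:" + q: 1.0}             # stub gradient touching one feature
    for k, g in grad.items():
        w[k] -= lr * g
    return q

queries = ["q%d" % i for i in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    done = list(pool.map(sgd_on_query, queries))
```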
7. Empirical Results and Comparison
On the CORA citation entity resolution task (1,295 citations, 132 ground-truth papers), queries ask whether two citations refer to the same underlying paper. The applied ProPPR program employs approximately 14 clauses over four matching predicates (citation, author, title, and venue, including transitive-closure rules) with feature annotations.
Performance metrics include:
- Mean average precision (MAP): Untrained ProPPR achieves higher MAP than the MLN baseline, with roughly 8× faster inference.
- After learning, AUCs for matching cite/author/venue/title attributes improve substantially over the untrained model, outperforming the corresponding MLN results.
- Inference time for ProPPR remains essentially constant as the database grows, while MLN inference time increases substantially.
- Learning scales nearly linearly with the number of threads; substantial speedups are observed with 16 cores.
All aspects of QG-PPR—graph construction, inference, and learning—are query-guided, ensuring the computation remains focused on those portions of the logic program and database relevant to Q, with a strong theoretical guarantee that the resulting computational cost is independent of database size (Wang et al., 2013).