
Graph Neural Networks for Link Prediction with Subgraph Sketching (2209.15486v3)

Published 30 Sep 2022 in cs.LG and cs.IR

Abstract: Many Graph Neural Networks (GNNs) perform poorly compared to simple heuristics on Link Prediction (LP) tasks. This is due to limitations in expressive power such as the inability to count triangles (the backbone of most LP heuristics) and because they can not distinguish automorphic nodes (those having identical structural roles). Both expressiveness issues can be alleviated by learning link (rather than node) representations and incorporating structural features such as triangle counts. Since explicit link representations are often prohibitively expensive, recent works resorted to subgraph-based methods, which have achieved state-of-the-art performance for LP, but suffer from poor efficiency due to high levels of redundancy between subgraphs. We analyze the components of subgraph GNN (SGNN) methods for link prediction. Based on our analysis, we propose a novel full-graph GNN called ELPH (Efficient Link Prediction with Hashing) that passes subgraph sketches as messages to approximate the key components of SGNNs without explicit subgraph construction. ELPH is provably more expressive than Message Passing GNNs (MPNNs). It outperforms existing SGNN models on many standard LP benchmarks while being orders of magnitude faster. However, it shares the common GNN limitation that it is only efficient when the dataset fits in GPU memory. Accordingly, we develop a highly scalable model, called BUDDY, which uses feature precomputation to circumvent this limitation without sacrificing predictive performance. Our experiments show that BUDDY also outperforms SGNNs on standard LP benchmarks while being highly scalable and faster than ELPH.

Citations (66)

Summary

  • The paper proposes ELPH and BUDDY frameworks that use subgraph sketches to overcome GNN limitations in link prediction.
  • It employs hashing techniques like HyperLogLog and MinHash to capture key structural features without full subgraph construction.
  • Empirical results show these models outperform state-of-the-art methods in speed, accuracy, and scalability on large graphs.

Link Prediction with Graph Neural Networks and Subgraph Sketching

The paper "Graph Neural Networks for Link Prediction with Subgraph Sketching" addresses the expressiveness and efficiency shortcomings of existing Graph Neural Networks (GNNs) on link prediction tasks. The authors improve on both fronts by replacing explicit subgraph construction with subgraph sketches.

Analysis and Challenges in Link Prediction

Link Prediction (LP) in graph structures is a computational task with significant industrial applications, such as recommender systems and knowledge graph construction. The commonly used GNNs, specifically Message Passing Neural Networks (MPNNs), often falter in LP tasks due to inherent expressiveness limitations. These limitations include the inability to distinguish automorphic nodes and to count graph substructures like triangles, which are core to many LP heuristics.
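For reference, the heuristics alluded to here are simple neighbourhood-overlap scores. A minimal sketch of the classic common-neighbours heuristic (the toy graph and function names are illustrative, not from the paper):

```python
from collections import defaultdict

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def common_neighbours(adj, u, v):
    """Score candidate link (u, v) by |N(u) & N(v)|; each common
    neighbour closes a triangle through the predicted edge -- exactly
    the substructure MPNNs cannot count."""
    return len(adj[u] & adj[v])

edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
adj = build_adjacency(edges)
print(common_neighbours(adj, 0, 3))  # nodes 1 and 2 are shared, so 2
```

A node-wise MPNN cannot reproduce this score because it computes representations of u and v independently rather than of the pair (u, v).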

To tackle these inadequacies of MPNNs, the paper scrutinizes subgraph-based GNNs (SGNNs), which achieve better LP performance by encoding the subgraph enclosing each candidate link. However, constructing a subgraph for every potential link carries heavy computational overhead, making SGNNs inefficient for large-scale applications. The paper dissects the key components of SGNNs, notably their structural feature encodings, assesses each component's contribution to LP performance, and finds that explicit subgraph construction is not always necessary for effective link prediction.
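The redundancy is easy to see concretely: the enclosing subgraphs of nearby candidate links share most of their nodes, so per-link extraction repeats work. A small sketch (the toy graph and hop count are illustrative):

```python
from collections import defaultdict

def build_adjacency(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def k_hop_nodes(adj, roots, k):
    """Nodes of the k-hop enclosing subgraph around a candidate link."""
    seen = set(roots)
    frontier = set(roots)
    for _ in range(k):
        frontier = {w for u in frontier for w in adj[u]} - seen
        seen |= frontier
    return seen

edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
adj = build_adjacency(edges)
sub_a = k_hop_nodes(adj, {0, 3}, 1)  # enclosing subgraph of link (0, 3)
sub_b = k_hop_nodes(adj, {1, 3}, 1)  # enclosing subgraph of link (1, 3)
print(len(sub_a & sub_b) / len(sub_a | sub_b))  # overlap fraction
```

On this tiny graph the two subgraphs coincide entirely; on dense real graphs the overlap between neighbouring links' subgraphs is similarly high, which is the inefficiency ELPH is designed to avoid.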

Introduction to ELPH and BUDDY

In light of this analysis, the paper introduces a novel framework called ELPH (Efficient Link Prediction with Hashing). Rather than constructing subgraphs explicitly, ELPH propagates subgraph sketches as node-wise messages, distilling the critical structural features of enclosing subgraphs. This lets the network capture intricate local topology, retaining expressive power without incurring excessive computational cost.

ELPH’s central innovation is its use of hashing techniques, specifically HyperLogLog and MinHash, to approximate key structural quantities such as common-neighbour counts and related subgraph patterns. These sketches make ELPH provably more expressive than conventional MPNNs: it runs with efficiency comparable to simpler GCN models while surpassing them in predictive accuracy, in part because it can distinguish automorphic nodes that MPNNs cannot.
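A minimal illustration of the sketching idea (this is a generic MinHash, not ELPH's implementation; the signature length and salting scheme are arbitrary choices). MinHash signatures estimate the Jaccard similarity of two neighbourhoods; combined with a HyperLogLog estimate of the union's cardinality, this recovers approximate common-neighbour counts, since |A ∩ B| ≈ J(A, B) · |A ∪ B|:

```python
import random

def minhash(items, num_perm=128, seed=0):
    """MinHash signature: for each salted hash, keep the minimum value."""
    salts = random.Random(seed).sample(range(1 << 32), num_perm)
    return [min(hash((salt, x)) for x in items) for salt in salts]

def jaccard_estimate(sig_a, sig_b):
    """Fraction of matching signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

n_u = set(range(0, 100))    # neighbourhood of node u
n_v = set(range(50, 150))   # neighbourhood of node v; true Jaccard = 1/3
j = jaccard_estimate(minhash(n_u), minhash(n_v))
approx_common = j * len(n_u | n_v)  # |A & B| ~ J * |A | B|, true value 50
```

Crucially, the signatures are fixed-size regardless of neighbourhood size, so they can be merged and passed as ordinary GNN messages.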

In addition, recognizing the scalability limitations of ELPH when dealing with datasets that exceed GPU memory limits, the authors introduce BUDDY. This model preprocesses data to compute hashed node features and employs a straightforward MLP-based link predictor, achieving remarkable scalability and inference speed. With these precomputed features, BUDDY obviates the need for batch subgraph construction, offering a pragmatic approach to large-scale LP challenges.
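The precompute-then-predict split can be sketched as follows. The features here (degree and an approximate two-hop count) are stand-ins for BUDDY's actual sketch-derived features; the point is only that everything graph-dependent happens once, offline, and the per-link predictor then works from table lookups alone:

```python
from collections import defaultdict

def precompute_node_features(edges):
    """One offline pass over the graph: per-node structural features."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    feats = {}
    for node, nbrs in adj.items():
        two_hop = set().union(*(adj[n] for n in nbrs)) - {node}
        feats[node] = (len(nbrs), len(two_hop))  # degree, 2-hop reach
    return feats

def link_features(feats, u, v):
    """Per-link input for a downstream MLP: pure lookups, no graph access."""
    return feats[u] + feats[v]

edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
feats = precompute_node_features(edges)  # done once, can live on disk
x = link_features(feats, 0, 3)           # cheap per-candidate feature vector
```

Because the expensive pass never touches candidate links, training and inference scale with the number of queried links rather than with repeated subgraph extraction.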

Empirical Evaluation and Results

The models were evaluated across a range of standard benchmarks, demonstrating superior performance over existing state-of-the-art techniques, notably outclassing models like SEAL and NBFNet in terms of both speed and accuracy. BUDDY, in particular, showcases scalability and efficiency on larger datasets where other methods falter.

The research provides rigorous empirical validation, with results showing the efficacy of these approaches in improving LP tasks. The findings support the claim that efficient and scalable link prediction can be achieved without sacrificing performance, opening avenues for GNN applications in resource-constrained environments or large-scale networks.

Implications and Future Directions

The introduction of ELPH and BUDDY represents an advancement in incorporating expressive yet efficient strategies for LP in graph neural networks. Practically, these methods facilitate the handling of extensive and dense graphs, broadening the applicability of GNN frameworks across various domains requiring link prediction.

Theoretically, the use of subgraph sketches invites further exploration into graph representation learning, particularly in reducing computational complexity while retaining the depth of structural understanding. Future research may delve into extensions for directed and temporal graphs, augmenting the current models with additional dimensions of graph data.

In conclusion, this research presents a substantive contribution to the field of graph machine learning, particularly in the context of optimizing link prediction performance while addressing scalability and expressiveness issues endemic to existing methodologies.