Link Prediction Based on Graph Neural Networks (1802.09691v3)

Published 27 Feb 2018 in cs.LG and stat.ML

Abstract: Link prediction is a key problem for network-structured data. Link prediction heuristics use some score functions, such as common neighbors and Katz index, to measure the likelihood of links. They have obtained wide practical uses due to their simplicity, interpretability, and for some of them, scalability. However, every heuristic has a strong assumption on when two nodes are likely to link, which limits their effectiveness on networks where these assumptions fail. In this regard, a more reasonable way should be learning a suitable heuristic from a given network instead of using predefined ones. By extracting a local subgraph around each target link, we aim to learn a function mapping the subgraph patterns to link existence, thus automatically learning a "heuristic" that suits the current network. In this paper, we study this heuristic learning paradigm for link prediction. First, we develop a novel $\gamma$-decaying heuristic theory. The theory unifies a wide range of heuristics in a single framework, and proves that all these heuristics can be well approximated from local subgraphs. Our results show that local subgraphs reserve rich information related to link existence. Second, based on the $\gamma$-decaying theory, we propose a new algorithm to learn heuristics from local subgraphs using a graph neural network (GNN). Its experimental results show unprecedented performance, working consistently well on a wide range of problems.

Summary

The paper introduces SEAL, a novel framework that learns link prediction heuristics using local subgraph extraction and Graph Neural Networks.
It develops the γ-decaying heuristic theory, showing that high-order heuristics can be effectively approximated using local subgraphs.
Empirical results on multiple datasets reveal SEAL's significant AUC improvements, underscoring its practical value in diverse network applications.

Link Prediction Based on Graph Neural Networks

The paper "Link Prediction Based on Graph Neural Networks" by Zhang and Chen addresses link prediction in network-structured data: predicting whether two nodes in a graph are likely to be connected. Traditional link prediction heuristics, such as common neighbors and the Katz index, are efficient and interpretable, but each one encodes a fixed assumption about when two nodes should link, and it performs poorly on networks where that assumption does not hold. The authors propose instead to learn a link prediction heuristic directly from the graph, which removes the dependence on a predefined scoring rule.

In particular, the paper introduces a heuristic learning framework that combines local subgraphs around candidate links with Graph Neural Networks (GNNs). For each target link, an enclosing subgraph is extracted, and a GNN is trained to map subgraph patterns to link existence. The resulting function acts as a learned heuristic fitted to the specific network.
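To make the predefined heuristics named above concrete, here is a minimal sketch of common neighbors and a truncated Katz index on a toy graph. The function names, the adjacency-set representation, the damping value `beta=0.05`, and the cutoff `max_len=10` are illustrative assumptions, not taken from the paper or its code.

```python
def common_neighbors(adj, x, y):
    """First-order heuristic: number of nodes adjacent to both x and y."""
    return len(adj[x] & adj[y])


def katz_truncated(adj, x, y, beta=0.05, max_len=10):
    """Katz index truncated at max_len (assumed cutoff):
    sum over l = 1..max_len of beta**l * (number of length-l walks x -> y)."""
    # walks[v] = number of walks of the current length from x to v
    walks = {v: 0 for v in adj}
    walks[x] = 1
    score = 0.0
    for length in range(1, max_len + 1):
        nxt = {v: 0 for v in adj}
        for u, count in walks.items():
            if count:
                for v in adj[u]:
                    nxt[v] += count
        walks = nxt
        score += beta ** length * walks[y]
    return score


# Toy undirected graph as adjacency sets (hypothetical data).
adj = {
    0: {1, 2},
    1: {0, 2, 3},
    2: {0, 1, 3},
    3: {1, 2, 4},
    4: {3},
}

cn_03 = common_neighbors(adj, 0, 3)   # nodes 1 and 2 are shared
cn_04 = common_neighbors(adj, 0, 4)   # no shared neighbor
katz_03 = katz_truncated(adj, 0, 3)
katz_04 = katz_truncated(adj, 0, 4)
print(cn_03, cn_04, katz_03 > katz_04)
```

Common neighbors assigns the pair (0, 4) a score of zero because it only looks one hop out, while Katz still gives it a small positive score through longer walks. This is the kind of built-in assumption that a learned heuristic is meant to avoid.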

Theoretical Foundation

A major theoretical contribution of the paper is the $\gamma$-decaying heuristic theory, which unifies a variety of existing heuristics under a single framework. The theory shows that many popular heuristics, traditionally assumed to require network-wide information, can be approximated well from local subgraphs. Specifically, the authors show that the high-order heuristics Katz, rooted PageRank, and SimRank are all $\gamma$-decaying: the contribution of longer-range structure to the score shrinks exponentially with distance, so the error of approximating them from an $h$-hop subgraph decreases exponentially in $h$. A small local subgraph can therefore capture most of the information needed, which reduces computational cost and improves scalability.
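The decay argument can be written out as follows. This is a sketch in notation chosen here, and the paper's exact statement should be checked against the original: $\eta$ is a positive constant (or a bounded positive function of $\gamma$), $f$ is a nonnegative function of the node pair and the path length $l$, and the truncation length $L$ and growth bound $\lambda$ are symbols introduced for this illustration.

```latex
\begin{align}
% General form of a gamma-decaying heuristic for the node pair (x, y)
\mathcal{H}(x,y) &= \eta \sum_{l=1}^{\infty} \gamma^{l} f(x,y,l),
  \qquad 0 < \gamma < 1 \\[4pt]
% Katz index as an instance: eta = 1, gamma = beta, f = number of length-l walks
\mathrm{Katz}_{x,y} &= \sum_{l=1}^{\infty} \beta^{l}
  \bigl|\mathrm{walks}^{\langle l \rangle}(x,y)\bigr|
  = \sum_{l=1}^{\infty} \beta^{l} \, [A^{l}]_{x,y} \\[4pt]
% Truncation error, assuming f(x,y,l) <= lambda^l with gamma * lambda < 1
\Bigl| \mathcal{H}(x,y) - \eta \sum_{l=1}^{L} \gamma^{l} f(x,y,l) \Bigr|
  &\le \eta \sum_{l=L+1}^{\infty} (\gamma\lambda)^{l}
  = \eta \, \frac{(\gamma\lambda)^{L+1}}{1 - \gamma\lambda}
\end{align}
```

If the first $L$ terms can be computed from the $h$-hop enclosing subgraph and $L$ grows linearly with $h$, the bound in the last line shrinks geometrically as $h$ increases.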

SEAL Framework

The empirical centerpiece of the paper is SEAL (learning from Subgraphs, Embeddings, and Attributes for Link prediction), which implements the aforementioned theoretical insights using GNNs. The framework consists of three principal steps:

Subgraph Extraction: For each target link $(x, y)$, SEAL extracts an $h$-hop enclosing subgraph, that is, the subgraph induced by all nodes within $h$ hops of $x$ or $y$.
Node Information Matrix Construction: This matrix integrates structural node labels, node embeddings, and node attributes, providing a comprehensive input representation for the GNN.
GNN Learning: The GNN processes the node information matrix and learns discriminative features tailored to the link prediction task.
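The subgraph extraction step above can be sketched with a bounded breadth-first search. This is a minimal illustration in plain Python, not the authors' implementation. The names `hop_distances` and `enclosing_subgraph`, the adjacency-set representation, and the `hide_target_link` flag (which removes the edge between the two target nodes so that a model cannot read the answer off the subgraph) are assumptions made for this example.

```python
from collections import deque


def hop_distances(adj, source, max_hops):
    """BFS from source, returning {node: distance} for nodes within max_hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue  # do not expand beyond the hop limit
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def enclosing_subgraph(adj, x, y, h=1, hide_target_link=True):
    """Subgraph induced by all nodes within h hops of x or y."""
    nodes = set(hop_distances(adj, x, h)) | set(hop_distances(adj, y, h))
    sub = {u: {v for v in adj[u] if v in nodes} for u in nodes}
    if hide_target_link:  # assumed option, see lead-in
        sub[x].discard(y)
        sub[y].discard(x)
    return sub


# Toy path graph 0-1-2-3-4-5 (hypothetical data).
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}

sub1 = enclosing_subgraph(adj, 2, 3, h=1)
sub2 = enclosing_subgraph(adj, 2, 3, h=2)
print(sorted(sub1), sorted(sub2))
```

The node information matrix of the second step would then be built with one row per node of `sub1` or `sub2`, and the GNN of the third step would consume that matrix together with the subgraph's adjacency.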

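A notable aspect of SEAL is Double-Radius Node Labeling (DRNL), which assigns each node in the subgraph an integer label determined by its pair of distances to the two target nodes. The two target nodes share a distinguished label, and nodes with the same distance pair receive the same label, so the labels tell the GNN where each node sits relative to the target link. This positional information is important for accurate predictions.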
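A short sketch of the labeling function may help. The closed form below is the DRNL hashing formula from the paper as best reproduced here and should be checked against the original before being relied on. The function name `drnl_label` and the use of `None` for an unreachable node are assumptions made for this example.

```python
def drnl_label(dx, dy):
    """Double-Radius Node Labeling for one node.

    dx, dy: shortest-path distances from the node to the target nodes x and y
            (None means unreachable; an assumed convention for this sketch).
    """
    if dx is None or dy is None:
        return 0  # node cannot reach one of the targets
    if dx == 0 or dy == 0:
        return 1  # the two target nodes share label 1
    d = dx + dy
    half, rem = divmod(d, 2)
    # Smaller total distance gives a smaller label; ties are ordered by min(dx, dy).
    return 1 + min(dx, dy) + half * (half + rem - 1)


# Labels for a few distance pairs.
for pair in [(1, 1), (1, 2), (1, 3), (2, 2), (1, 4), (2, 3)]:
    print(pair, drnl_label(*pair))
```

The paper additionally computes the distance to one target with the other target temporarily removed from the subgraph. That detail is omitted here because the function takes the distances as given.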

Experimental Evaluation

Evaluations on diverse datasets (e.g., USAir, NS, PB, Yeast) show that SEAL outperforms both traditional heuristic-based methods and recent network embedding techniques. SEAL learns effective first- and second-order heuristics and can also approximate high-order heuristics, which explains its robustness across different network types.

The numerical results show significant AUC improvements (e.g., SEAL achieves an AUC of 98.85% on the NS dataset, versus 97.64% for the best heuristic ensemble). SEAL consistently outperforms other methods in terms of both AUC and average precision (AP).
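For readers unfamiliar with the metric, AUC in this setting is the probability that a randomly chosen true link receives a higher score than a randomly chosen non-link, with ties counted as half. The sketch below computes it by direct pairwise comparison. The function name `auc` and the scores are made up for illustration and are not results from the paper.

```python
def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores for held-out true links and sampled non-links.
pos = [0.9, 0.8, 0.4]
neg = [0.5, 0.3, 0.1]
value = auc(pos, neg)  # 8 of the 9 pairs are ordered correctly
print(round(value, 4))
```

The pairwise loop is quadratic in the number of test links. It is fine for a small illustration, while a rank-based formula would be the usual choice at scale.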

Implications and Future Developments

The implications of this research are twofold:

Practical Utilization: Practitioners can leverage SEAL for more accurate link prediction in applications like social network analysis, recommendation systems, and bioinformatics. Its ability to integrate explicit and latent features alongside structural information adds versatility.
Theoretical Advancement: The $\gamma$-decaying heuristic theory paves the way for future research in network representation learning, potentially extending beyond link prediction to other relational learning tasks like community detection and network reconstruction.

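Because SEAL operates on local enclosing subgraphs rather than on the whole network, it is a plausible candidate for large-scale networks. The paper also describes training tricks such as "negative injection", in which negative training links are temporarily added to the network before node embeddings are computed, so that the embeddings do not simply encode which training links exist. Prospective research could explore further optimizations and adapt SEAL for real-time predictions in evolving networks.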

In conclusion, the paper presents a substantial advancement in the field of link prediction by combining rigorous theoretical groundwork with practical, high-performing implementations. SEAL exemplifies the integration of deep learning with graph theory, providing a robust method for predicting links in complex networks.

GitHub

GitHub - muhanzhang/SEAL: SEAL (learning from Subgraphs, Embeddings, and Attributes for Link prediction). "M. Zhang, Y. Chen, Link Prediction Based on Graph Neural Networks, NeurIPS 2018 spotlight". (657 stars)