Scaling Graph Neural Networks with Approximate PageRank (2007.01570v2)

Published 3 Jul 2020 in cs.LG, cs.SI, and stat.ML

Abstract: Graph neural networks (GNNs) have emerged as a powerful approach for solving many network mining tasks. However, learning on large graphs remains a challenge - many recently proposed scalable GNN approaches rely on an expensive message-passing procedure to propagate information through the graph. We present the PPRGo model which utilizes an efficient approximation of information diffusion in GNNs resulting in significant speed gains while maintaining state-of-the-art prediction performance. In addition to being faster, PPRGo is inherently scalable, and can be trivially parallelized for large datasets like those found in industry settings. We demonstrate that PPRGo outperforms baselines in both distributed and single-machine training environments on a number of commonly used academic graphs. To better analyze the scalability of large-scale graph learning methods, we introduce a novel benchmark graph with 12.4 million nodes, 173 million edges, and 2.8 million node features. We show that training PPRGo from scratch and predicting labels for all nodes in this graph takes under 2 minutes on a single machine, far outpacing other baselines on the same graph. We discuss the practical application of PPRGo to solve large-scale node classification problems at Google.

Citations (345)

Summary

The paper introduces a novel GNN architecture, PPRGo, that precomputes approximate PageRank vectors to bypass iterative message-passing.
It reduces computational demands by focusing on top-k influential nodes, achieving competitive accuracy with rapid training times.
PPRGo demonstrates efficient distributed training and sparse inference, processing millions of nodes in minutes.

Large-Scale Graph Neural Networks with Efficient Information Diffusion: An Analysis of the PPRGo Model

Graph neural networks (GNNs) have become instrumental in handling a wide variety of network mining tasks, leveraging their ability to model complex relationships and dependencies in data structured as graphs. Despite their growing popularity, the application of GNNs to large-scale graphs remains challenging due to the computational burden associated with the recursive neighborhood expansions inherent in message-passing procedures. Many contemporary strategies addressing the scalability of GNNs focus on single-machine scenarios, leaving the broader challenges of distributed computation across larger datasets less explored.

The paper introduces the PPRGo model, a GNN architecture designed for efficiency and scalability that uses an approximation of personalized PageRank (PPR) to make fast predictions on large graphs. Unlike traditional message-passing GNNs, PPRGo precomputes the PPR vectors and so avoids iterative message-passing during training. This both speeds up training and makes the model easier to scale across distributed systems.

The core innovation of PPRGo lies in its reliance on precomputed sparse approximations of personalized PageRank vectors for information diffusion. For each node, only the top-k most influential nodes are kept, so PPRGo maintains prediction accuracy with reduced computational demand. This is crucial in industrial applications, where node classification tasks often involve graphs with millions or billions of nodes.
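To make the idea of a sparse, top-k PPR approximation concrete, here is a minimal sketch of a push-style approximation for a single source node. It is an illustration under stated assumptions rather than the paper's implementation: the function name `approx_ppr_topk`, the adjacency-dict input, and the default values of `alpha`, `eps`, and `k` are all hypothetical choices made for this example.

```python
from collections import defaultdict


def approx_ppr_topk(adj, source, alpha=0.25, eps=1e-4, k=2):
    """Push-style approximate personalized PageRank for one source node.

    adj:    dict mapping node -> list of neighbours (every node has degree >= 1)
    source: node whose personalized PageRank vector is approximated
    alpha:  teleport (restart) probability (assumed value)
    eps:    residual threshold per unit of degree (assumed value)
    k:      number of highest-scoring entries to keep
    """
    p = defaultdict(float)  # approximate PPR scores
    r = defaultdict(float)  # residual mass not yet distributed
    r[source] = 1.0
    queue = [source]
    while queue:
        u = queue.pop()
        deg = len(adj[u])
        if r[u] < eps * deg:
            continue  # stale queue entry: residual is already small enough
        mass = r[u]
        p[u] += alpha * mass  # keep a fraction alpha at u
        r[u] = 0.0
        share = (1.0 - alpha) * mass / deg
        for v in adj[u]:  # push the remainder to the neighbours
            r[v] += share
            if r[v] >= eps * len(adj[v]):
                queue.append(v)
    # Sparsify: keep only the k largest scores.
    top = sorted(p.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return dict(top)
```

Because each source node is processed independently, calls like this can be spread across workers, which is the property the summary points to when it describes the approach as parallelizable.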

Key Findings and Performance Metrics

PPRGo exhibits significant advantages in runtime and memory usage on large-scale graphs. In experiments on the MAG-Scholar dataset (12.4 million nodes, 173 million edges), PPRGo reaches competitive accuracy while running substantially faster than baseline GNN models. The efficiency gains hold in both single-machine and distributed training environments, where the model benefits from the parallelizable computations enabled by sparse PPR approximations.

One of the standout results is that training PPRGo from scratch and predicting labels for all nodes in the MAG-Scholar graph takes under 2 minutes on a single machine. This speed is achieved without sacrificing predictive performance, highlighting the efficacy of decoupling feature transformation from graph propagation.
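The decoupling of feature transformation from graph propagation can be sketched as follows: each node's features are transformed independently, and the results are then combined using precomputed sparse PPR scores as weights. This is a simplified illustration, not the paper's code. The function name `pprgo_forward` is hypothetical, and a single weight matrix stands in for a trained neural network.

```python
import numpy as np


def pprgo_forward(features, topk_ppr, weight):
    """Predict class probabilities from sparse, precomputed PPR scores.

    features: (n_nodes, n_features) array of node features
    topk_ppr: dict node -> dict neighbour -> PPR score (top-k entries per node)
    weight:   (n_features, n_classes) array, a stand-in for a trained network
    """
    probs = {}
    for node, scores in topk_ppr.items():
        idx = np.fromiter(scores.keys(), dtype=np.int64)
        pi = np.fromiter(scores.values(), dtype=np.float64)
        # Step 1: transform each selected node's features on its own.
        logits_per_neighbour = features[idx] @ weight
        # Step 2: aggregate with the PPR scores as weights (no message passing).
        logits = pi @ logits_per_neighbour
        # Numerically stable softmax over classes.
        exp = np.exp(logits - logits.max())
        probs[node] = exp / exp.sum()
    return probs
```

In this sketch the cost per node depends on k rather than on the size of the node's multi-hop neighborhood, which is one way to see why the runtime stays low on large graphs.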

Methodological Contributions

The paper introduces several methodological advancements that contribute to the broader field of scalable machine learning on graphs:

Scalable Information Propagation: By approximating each node's personalized PageRank vector and keeping only its top-k most influential nodes, PPRGo avoids the recursive neighborhood expansion bottleneck typical of message-passing GNNs, achieving scalability without significant accuracy degradation.
Distributed Training Implementation: The model uses a distributed data processing pipeline akin to MapReduce to parallelize the computation of PPR vectors across multiple machines, which improves scalability for extremely large datasets.
Sparse Inference Approach: PPRGo's inference scheme computes predictions for only a subset of nodes and relies on graph homophily to extend them to the remaining nodes, reducing the number of computation-intensive inference steps.
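One way to read the sparse inference idea is sketched below: logits are computed for a few nodes only, and a small number of power-iteration steps spread them to the other nodes. The summary does not spell out the exact procedure, so everything here is an assumption made for illustration, including the function name `smooth_sparse_predictions`, the dense adjacency matrix, the update rule, and the defaults for `alpha` and `n_steps`.

```python
import numpy as np


def smooth_sparse_predictions(adj_matrix, sparse_logits, alpha=0.25, n_steps=2):
    """Spread logits computed for a few nodes to all nodes by power iteration.

    adj_matrix:    (n, n) dense symmetric 0/1 array with no isolated nodes
    sparse_logits: (n, c) array; rows are zero for nodes that were not evaluated
    alpha:         weight given to the original sparse logits (assumed value)
    n_steps:       number of propagation steps (assumed value)
    """
    deg = adj_matrix.sum(axis=1, keepdims=True)
    transition = adj_matrix / deg  # row-normalised adjacency, D^{-1} A
    q = sparse_logits.copy()
    for _ in range(n_steps):
        # Mix neighbours' current estimates with the original sparse logits.
        q = (1.0 - alpha) * (transition @ q) + alpha * sparse_logits
    return q
```

The sketch relies on homophily: a node with no computed logits inherits an estimate from nearby nodes, which is only sensible when neighbours tend to share labels.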

Practical Implications and Future Perspectives

PPRGo's design and performance metrics mark a significant step towards applying GNNs in real-world industrial contexts, particularly in settings that demand scalable solutions for rapid inference on evolving large-scale graphs.

Looking forward, the extension of PPRGo to handle dynamic graphs and streaming data represents a promising direction for future research. Additionally, further optimizations in personalized PageRank approximation techniques could unlock even greater efficiencies, enabling the application of GNNs to an expanded array of domains. The challenges of balancing model complexity with computational demands continue to drive research in large-scale GNN design, with PPRGo adding a substantial contribution to this ongoing dialogue.
