Towards Efficient Large-Scale Graph Neural Network Computing (1810.08403v1)

Published 19 Oct 2018 in cs.DC and cs.LG

Abstract: Recent deep learning models have moved beyond low-dimensional regular grids such as image, video, and speech, to high-dimensional graph-structured data, such as social networks, brain connections, and knowledge graphs. This evolution has led to large graph-based irregular and sparse models that go beyond what existing deep learning frameworks are designed for. Further, these models are not easily amenable to efficient, at scale, acceleration on parallel hardwares (e.g. GPUs). We introduce NGra, the first parallel processing framework for graph-based deep neural networks (GNNs). NGra presents a new SAGA-NN model for expressing deep neural networks as vertex programs with each layer in well-defined (Scatter, ApplyEdge, Gather, ApplyVertex) graph operation stages. This model not only allows GNNs to be expressed intuitively, but also facilitates the mapping to an efficient dataflow representation. NGra addresses the scalability challenge transparently through automatic graph partitioning and chunk-based stream processing out of GPU core or over multiple GPUs, which carefully considers data locality, data movement, and overlapping of parallel processing and data movement. NGra further achieves efficiency through highly optimized Scatter/Gather operators on GPUs despite its sparsity. Our evaluation shows that NGra scales to large real graphs that none of the existing frameworks can handle directly, while achieving up to about 4 times speedup even at small scales over the multiple-baseline design on TensorFlow.

Citations (15)

Summary

  • The paper introduces NGra, a framework that integrates the SAGA-NN model for efficient parallel processing of large-scale graph neural networks.
  • It employs automatic graph partitioning, chunk-based stream processing, and a ring-based GPU streaming mechanism to overcome memory limitations.
  • Performance results show up to 6.3× speed improvements over existing sparse execution methods, enabling scalable training on real-world graph data.

Towards Efficient Large-Scale Graph Neural Network Computing

The paper "Towards Efficient Large-Scale Graph Neural Network Computing" introduces NGra, a novel computational framework designed to facilitate efficient parallel processing of Graph Neural Networks (GNNs) at scale. The research addresses the limitations of existing deep learning frameworks, such as TensorFlow and PyTorch, which are not inherently equipped to handle the irregular and sparse data structures presented by graph-based models on parallel hardware like GPUs.

Introduction to NGra and SAGA-NN Model

The core contribution of this work is the NGra system, which integrates GNN computation into a parallel framework tailored for large-scale graphs. At the heart of NGra is the SAGA-NN programming model, an extension of the conventional GAS (Gather-Apply-Scatter) model, which lets users express GNN operations as vertex programs divided into well-defined stages: Scatter, ApplyEdge, Gather, and ApplyVertex. This model facilitates a direct mapping of GNN operations to a dataflow representation, which is crucial for computational efficiency and scalability.
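To make the stage decomposition concrete, the following is a minimal sketch, not NGra's actual API, of how a GCN-style layer breaks into the four SAGA-NN stages. The function and variable names are hypothetical, and dense NumPy arrays stand in for NGra's dataflow operators:

```python
import numpy as np

# Hypothetical sketch of one GCN-style layer in SAGA-NN's four stages.
# `edges` is an array of (src, dst) pairs; H is the vertex feature matrix.
# Stage names follow the paper; the concrete code below is illustrative only.

def saga_nn_layer(edges, H, W):
    src, dst = edges[:, 0], edges[:, 1]

    # Scatter: each edge reads the feature vector of its source vertex.
    edge_inputs = H[src]                      # shape: (num_edges, feat)

    # ApplyEdge: per-edge neural computation (identity for plain GCN).
    edge_outputs = edge_inputs

    # Gather: accumulate edge outputs at each destination vertex.
    accum = np.zeros_like(H)
    np.add.at(accum, dst, edge_outputs)

    # ApplyVertex: per-vertex neural computation (linear layer + ReLU).
    return np.maximum(accum @ W, 0.0)

# Example: 3 vertices, 4 edges, 2-dimensional features.
edges = np.array([[0, 1], [1, 2], [2, 0], [0, 2]])
H = np.random.rand(3, 2).astype(np.float32)
W = np.random.rand(2, 2).astype(np.float32)
H_next = saga_nn_layer(edges, H, W)
```

In NGra, each such stage becomes part of a dataflow graph rather than an eager loop, which is what opens the door to the system-level optimizations described next.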

Technical Innovation and GPU Utilization

NGra leverages automatic graph partitioning and chunk-based stream processing to overcome the constraints of GPU memory, supporting computation that streams through a single GPU or scales across multiple GPUs. Its scheduling strategies minimize data movement between host and GPU memory and overlap transfers with computation, while highly optimized Scatter/Gather operators handle the sparse access patterns that are notably challenging on data-parallel architectures.
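As a rough illustration of chunk-based processing, the Gather stage can be computed one destination-vertex interval at a time, so that only a single chunk's working set must reside in GPU memory at once. The CPU-side SciPy sketch below shows the chunking logic only; the chunk granularity and CSR layout are assumptions, not NGra's actual scheme:

```python
import numpy as np
from scipy.sparse import random as sparse_random

# Illustrative chunk-based Gather: the adjacency matrix is split into
# row chunks (destination-vertex intervals) so that only one chunk's
# worth of data needs to be resident at a time.

def chunked_gather(A_csr, H, chunk_rows):
    out = np.empty((A_csr.shape[0], H.shape[1]), dtype=H.dtype)
    for start in range(0, A_csr.shape[0], chunk_rows):
        stop = min(start + chunk_rows, A_csr.shape[0])
        # In NGra, this chunk would be copied to the GPU while the
        # previous chunk is still being processed, overlapping data
        # movement with computation.
        out[start:stop] = A_csr[start:stop] @ H
    return out

A = sparse_random(10_000, 10_000, density=1e-3, format="csr", dtype=np.float32)
H = np.random.rand(10_000, 16).astype(np.float32)
result = chunked_gather(A, H, chunk_rows=2_000)
assert np.allclose(result, A @ H, atol=1e-4)  # matches the unchunked product
```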

The system's efficiency is further bolstered by a ring-based streaming mechanism that enables efficient data exchange across multiple GPUs, bypassing the bottlenecks that typically arise when all traffic flows through shared PCIe links in multi-GPU setups. NGra thereby achieves significant performance improvements over traditional methods, with up to a fourfold speedup over a TensorFlow-based baseline even at small scales.
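One way to picture a ring schedule, a simplification of the mechanism described in the paper: with G GPUs and G chunks, GPU g processes chunk (g + step) mod G at each step, so every chunk is needed by exactly one GPU at a time and data can be forwarded around the ring rather than repeatedly pulled across a shared link. The names below are hypothetical:

```python
# Hypothetical illustration of a ring schedule over G GPUs and G chunks.
# At each step, each chunk is wanted by exactly one GPU, so transfers
# can be forwarded GPU-to-GPU instead of contending on one PCIe link.

def ring_schedule(num_gpus, num_steps):
    schedule = []
    for step in range(num_steps):
        # (gpu, chunk) assignment at this step
        schedule.append([(gpu, (gpu + step) % num_gpus)
                         for gpu in range(num_gpus)])
    return schedule

for step, assignment in enumerate(ring_schedule(num_gpus=4, num_steps=4)):
    print(f"step {step}: " + ", ".join(
        f"GPU{g}<-chunk{c}" for g, c in assignment))
```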

Performance Evaluation and Results

Extensive evaluation on popular GNN architectures (e.g., GCN, GG-NN) and datasets demonstrates NGra’s capacity to handle graphs with millions of vertices and edges efficiently. In these tests, NGra succeeds in scenarios where traditional frameworks fall short, particularly where GPU memory limits and irregular data structures are significant barriers. Compared with sparse execution built directly on libraries such as cuSPARSE, NGra’s tailored optimizations yield speedups by factors of up to 6.3.

Implications and Future Directions

Practically, NGra makes GNN training feasible on the expansive datasets typical of real-world applications, such as social networks or bioinformatics graphs, which demand substantial computational resources. Theoretically, its integration of dataflow paradigms with graph-centric computation may inspire further research into hybrid models accommodating diverse data types and processing needs.

Future developments in NGra might focus on expanding its capabilities with enhanced multi-node support, exploring integration with emerging AI hardware accelerators beyond GPUs, and incorporating automated tuning for optimal resource allocation. Advancements in these domains could facilitate more widespread GNN applications, pushing the boundaries of what can be achieved with deep learning on graph-structured data.
