
DistDGL: Distributed Graph Neural Network Training for Billion-Scale Graphs (2010.05337v3)

Published 11 Oct 2020 in cs.LG and cs.DC

Abstract: Graph neural networks (GNN) have shown great success in learning from graph-structured data. They are widely used in various applications, such as recommendation, fraud detection, and search. In these domains, the graphs are typically large, containing hundreds of millions of nodes and several billions of edges. To tackle this challenge, we develop DistDGL, a system for training GNNs in a mini-batch fashion on a cluster of machines. DistDGL is based on the Deep Graph Library (DGL), a popular GNN development framework. DistDGL distributes the graph and its associated data (initial features and embeddings) across the machines and uses this distribution to derive a computational decomposition by following an owner-compute rule. DistDGL follows a synchronous training approach and allows ego-networks forming the mini-batches to include non-local nodes. To minimize the overheads associated with distributed computations, DistDGL uses a high-quality and light-weight min-cut graph partitioning algorithm along with multiple balancing constraints. This allows it to reduce communication overheads and statically balance the computations. It further reduces the communication by replicating halo nodes and by using sparse embedding updates. The combination of these design choices allows DistDGL to train high-quality models while achieving high parallel efficiency and memory scalability. We demonstrate our optimizations on both inductive and transductive GNN models. Our results show that DistDGL achieves linear speedup without compromising model accuracy and requires only 13 seconds to complete a training epoch for a graph with 100 million nodes and 3 billion edges on a cluster with 16 machines. DistDGL is now publicly available as part of DGL: https://github.com/dmlc/dgl/tree/master/python/dgl/distributed.

Citations (220)

Summary

  • The paper introduces a distributed GNN training framework that scales to billion-scale graphs with linear speedup, reducing training to 13 seconds per epoch on 16 machines.
  • It employs novel mini-batch training and graph partitioning techniques, using METIS to minimize inter-machine communication and balance workloads.
  • The framework achieves a 2.2× speedup over previous systems, paving the way for efficient large-scale graph analytics in practical applications.

DistDGL: Distributed Graph Neural Network Training for Billion-Scale Graphs

The paper "DistDGL: Distributed Graph Neural Network Training for Billion-Scale Graphs" introduces a system designed to train Graph Neural Networks (GNNs) on exceedingly large graphs, achieving scalability and efficiency with notable computational performance. DistDGL is built upon the Deep Graph Library (DGL), leveraging its existing framework while introducing distributed training capabilities that address the challenges associated with billion-scale graph data.

Key Innovations and Results

DistDGL tackles the difficulty of training GNNs on massive graphs by distributing both the graph data and the computation across a cluster of machines. The system adopts mini-batch training, in contrast to the full-graph training commonly used for GNNs on smaller datasets. Because the nodes in a graph are not independent samples, constructing mini-batches requires dedicated sampling and partitioning strategies.
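To make the mini-batch approach concrete, below is a minimal single-machine sketch of neighbor-sampling-based training in DGL, in the spirit of what DistDGL distributes. The toy graph, the two-layer GraphSAGE model, and the fan-out values are illustrative placeholders, and the sampler/loader names follow DGL releases contemporary with the paper (e.g., MultiLayerNeighborSampler, NodeDataLoader), which differ in newer versions; this is not the paper's exact code.

```python
import dgl
import dgl.nn as dglnn
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy graph, features, and labels stand in for a real dataset.
g = dgl.rand_graph(1000, 5000)
g.ndata['feat'] = torch.randn(1000, 16)
g.ndata['label'] = torch.randint(0, 3, (1000,))
train_nids = torch.arange(800)

class SAGE(nn.Module):
    """Two-layer GraphSAGE model operating on sampled blocks."""
    def __init__(self, in_feats, hidden, n_classes):
        super().__init__()
        self.layers = nn.ModuleList([
            dglnn.SAGEConv(in_feats, hidden, 'mean'),
            dglnn.SAGEConv(hidden, n_classes, 'mean'),
        ])

    def forward(self, blocks, x):
        for i, (layer, block) in enumerate(zip(self.layers, blocks)):
            x = layer(block, x)
            if i != len(self.layers) - 1:
                x = F.relu(x)
        return x

model = SAGE(16, 32, 3)
opt = torch.optim.Adam(model.parameters())

# Sample a fixed fan-out of neighbors per layer instead of using the full graph.
sampler = dgl.dataloading.MultiLayerNeighborSampler([10, 25])
dataloader = dgl.dataloading.NodeDataLoader(
    g, train_nids, sampler, batch_size=256, shuffle=True, drop_last=False)

for input_nodes, output_nodes, blocks in dataloader:
    x = blocks[0].srcdata['feat']     # inputs of the sampled ego-networks
    y = blocks[-1].dstdata['label']   # labels of the seed nodes
    loss = F.cross_entropy(model(blocks, x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

DistDGL keeps this style of training loop largely intact; its contribution is to partition the graph, run sampling and feature lookups as distributed services, and synchronize gradients across trainers.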

Numerical Results:

  1. DistDGL achieves linear speedup without compromising model accuracy. Specifically, an epoch takes merely 13 seconds for a graph with 100 million nodes and 3 billion edges using a 16-machine cluster.
  2. The system demonstrates a 2.2× speedup over existing frameworks such as Euler on various large graphs.

System Design and Optimizations

The design of DistDGL revolves around several key architectural components to ensure both computational efficiency and balanced workload distribution:

  • Graph Partitioning: Using METIS to minimize edge cuts and improve data locality, DistDGL partitions the graph so that inter-machine communication overhead is reduced (see the usage sketch after this list).
  • Distributed Components: The framework brings together samplers, KVStores, and trainers in a synchronous training environment, optimally co-locating data and computation.
  • Load Balancing and Optimization: Multi-constraint partitioning and other load balancing methods are employed to distribute workload evenly across the cluster, ensuring efficient resource utilization.
  • Efficient Communication: DistDGL employs shared memory for local access and a highly optimized RPC framework to handle network communications, particularly benefitting from fast networking environments.
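The components above correspond to a short end-to-end sketch, assuming DGL's public distributed API (partition_graph, DistGraph, node_split, and distributed neighbor sampling). Graph sizes, file names, and fan-outs are placeholders; in a real deployment the partitioning and training stages are separate scripts launched across the cluster by DGL's tooling, so treat this as an outline rather than the paper's exact code.

```python
import dgl
import torch

# ---- Offline: METIS-based partitioning, run once before training. ----
g = dgl.rand_graph(10_000, 100_000)        # stands in for a real dataset
g.ndata['feat'] = torch.randn(10_000, 16)
g.ndata['train_mask'] = torch.rand(10_000) < 0.8

dgl.distributed.partition_graph(
    g, graph_name='toy', num_parts=4, out_path='partitions/',
    balance_ntypes=g.ndata['train_mask'],  # balance training nodes per part
    balance_edges=True)                    # balance edge counts per part

# ---- Training time, executed on every machine in the cluster. ----
# 'ip_config.txt' lists the machines; DGL's launch tooling is assumed to start
# the graph servers and set up the process-group environment.
dgl.distributed.initialize('ip_config.txt')
torch.distributed.init_process_group(backend='gloo')

g = dgl.distributed.DistGraph('toy', part_config='partitions/toy.json')

# Each trainer works on the training nodes owned by its local partition.
train_nids = dgl.distributed.node_split(
    g.ndata['train_mask'], g.get_partition_book())

for seeds in torch.split(train_nids[torch.randperm(len(train_nids))], 1024):
    # Sample ego-networks; remote neighbors are fetched over the RPC layer.
    frontier = dgl.distributed.sample_neighbors(g, seeds, 10)
    block = dgl.to_block(frontier, seeds)
    # Pull the needed input features from the distributed KVStore.
    feats = g.ndata['feat'][block.srcdata[dgl.NID]]
    # Forward/backward with a DistributedDataParallel model would follow here.
```

Because trainers are co-located with their partitions, most of these lookups hit local shared memory, which is where the reported parallel efficiency comes from.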

Implications and Future Directions

Practical Implications: DistDGL is a pivotal tool for domains that analyze large-scale graph data, such as social networks, recommendation systems, and fraud detection. By making GNN training on large datasets efficient, it makes it practical to apply sophisticated graph analysis to real-world problems.

Theoretical Implications: From a theoretical perspective, DistDGL provides a strong case for the scalability of GNN models. The system shows that distributed mini-batch training can preserve model accuracy while processing large-scale graph data in parallel.

Conclusion

DistDGL represents a significant advance in distributed GNN training, integrating state-of-the-art partitioning and data co-location strategies to achieve strong scalability and speed. It opens promising pathways for research and development in scalable graph learning systems. Moving forward, further optimizations in network communication and load balancing could yield even more robust solutions for distributed neural network training.