FastSample: Accelerating Distributed Graph Neural Network Training for Billion-Scale Graphs (2311.17847v1)
Abstract: Training Graph Neural Networks (GNNs) on a large monolithic graph presents unique challenges: the graph cannot fit within a single machine, and it cannot be decomposed into smaller disconnected components. Distributed sampling-based training distributes the graph across multiple machines and trains the GNN on small parts of the graph that are randomly sampled at every training iteration. We show that in a distributed environment, the sampling overhead is a significant component of the training time for large-scale graphs. We propose FastSample, which is composed of two synergistic techniques that greatly reduce the distributed sampling time: 1) a new graph partitioning method that eliminates most of the communication rounds in distributed sampling, and 2) a novel, highly optimized sampling kernel that reduces memory movement during sampling. We test FastSample on large-scale graph benchmarks and show that it speeds up distributed sampling-based GNN training by up to 2x with no loss in accuracy.
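The sampling overhead the abstract refers to arises in the per-iteration minibatch loop of distributed sampling-based training. The sketch below is a generic illustration of that loop, not the paper's implementation; `partitioned_graph`, `sample_neighbors`, `gather_remote_features`, `gather_labels`, and `src_nodes` are hypothetical placeholders standing in for the distributed-graph API of a framework such as DistDGL.

```python
# Minimal sketch (assumed API, not FastSample's code) of one epoch of
# distributed sampling-based GNN training on a partitioned graph.
import torch
import torch.nn.functional as F

def train_epoch(partitioned_graph, model, optimizer, seed_nodes,
                fanouts=(10, 5), batch_size=1024):
    for batch in torch.split(seed_nodes, batch_size):
        # 1) Multi-layer neighbor sampling: for each GNN layer, sample a fixed
        #    fanout of neighbors around the current frontier. Any neighbor that
        #    lives on a remote partition triggers a communication round -- the
        #    cost FastSample's partitioning method aims to eliminate.
        blocks, frontier = [], batch
        for fanout in fanouts:
            sampled = partitioned_graph.sample_neighbors(frontier, fanout)  # hypothetical call
            blocks.append(sampled)
            frontier = sampled.src_nodes()  # hypothetical call

        # 2) Fetch input features for the outermost sampled nodes
        #    (these may also live on remote partitions).
        x = partitioned_graph.gather_remote_features(frontier)  # hypothetical call

        # 3) Standard forward/backward pass on the sampled subgraph.
        y_hat = model(blocks, x)
        labels = partitioned_graph.gather_labels(batch)  # hypothetical call
        loss = F.cross_entropy(y_hat, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

In this loop, each `sample_neighbors` call whose frontier touches remote partitions incurs network round trips, and the local sampling itself moves large amounts of adjacency data through memory; the paper's two techniques target these two costs respectively.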
- Hesham Mostafa
- Adam Grabowski
- Md Asadullah Turja
- Juan Cervino
- Alejandro Ribeiro
- Nageen Himayat