BatchGNN: Efficient CPU-Based Distributed GNN Training on Very Large Graphs (2306.13814v1)
Abstract: We present BatchGNN, a distributed CPU system that showcases techniques for efficiently training GNNs on terabyte-sized graphs. It reduces communication overhead with macrobatching, in which subgraph sampling and feature fetching for multiple minibatches are combined into a single communication relay, eliminating redundant feature fetches when input features are static. BatchGNN provides integrated graph partitioning and native GNN layer implementations to improve runtime, and it can cache aggregated input features to further reduce sampling overhead. BatchGNN achieves an average $3\times$ speedup over DistDGL on three GNN models trained on OGBN graphs, outperforms the runtimes reported by the distributed GPU systems $P3$ and DistDGLv2, and scales to a terabyte-sized graph.
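The macrobatching idea described in the abstract can be illustrated with a minimal sketch: the node IDs needed by several minibatches are collected, deduplicated, and fetched in one remote request instead of one request per minibatch. This is an assumption-laden illustration, not BatchGNN's actual implementation; the function and parameter names (`macrobatch_fetch`, `fetch_remote`) are hypothetical.

```python
# Hypothetical sketch of macrobatching: feature fetches for K minibatches
# are coalesced into a single deduplicated communication relay. This mirrors
# the idea in the abstract, not BatchGNN's real code.
def macrobatch_fetch(minibatch_node_ids, fetch_remote):
    """minibatch_node_ids: list of node-ID lists, one list per minibatch.
    fetch_remote: callable that takes a sorted list of unique node IDs and
    returns {id: feature}; it stands in for one remote communication."""
    # Deduplicate across all minibatches so each feature is fetched once;
    # this is valid when input features are static, as the abstract notes.
    unique_ids = sorted({nid for batch in minibatch_node_ids for nid in batch})
    features = fetch_remote(unique_ids)  # one communication relay for K minibatches
    # Serve each minibatch locally from the shared fetch result.
    return [[features[nid] for nid in batch] for batch in minibatch_node_ids]
```

With two minibatches that share node 2, a naive per-minibatch fetch would transfer node 2's features twice; the sketch above fetches it once.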
- A Graph Neural Network Approach for Product Relationship Prediction. In International Design Engineering Technical Conferences and Computers and Information in Engineering Conference (Aug. 2021), vol. 3A: 47th Design Automation Conference (DAC), V03AT03A036.
- Alibaba. Euler, Distributed Graph Deep Learning Framework.
- The LDBC Social Network Benchmark.
- Molecular generative Graph Neural Networks for Drug Discovery. Neurocomputing 450 (2021), 242–252.
- FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling. In International Conference on Learning Representations (2018).
- Enhancing Graph Neural Network-Based Fraud Detectors against Camouflaged Fraudsters. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management (New York, NY, USA, 2020), CIKM ’20, Association for Computing Machinery, pp. 315–324.
- Fast Graph Representation Learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds (2019).
- GNNAutoScale: Scalable and Expressive Graph Neural Networks via Historical Embeddings. In Proceedings of the 38th International Conference on Machine Learning (18–24 Jul 2021), M. Meila and T. Zhang, Eds., vol. 139 of Proceedings of Machine Learning Research, PMLR, pp. 3294–3304.
- P3: Distributed Deep Graph Learning at Scale. In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21) (July 2021), USENIX Association, pp. 551–568.
- Influence-Based Mini-Batching for Graph Neural Networks. In The First Learning on Graphs Conference (2022).
- Inductive Representation Learning on Large Graphs. In NIPS (2017), pp. 1024–1034.
- Efficient Distribution for Deep Learning on Graphs. In First MLSys Workshop on Graph Neural Networks and Systems (GNNSys21) (2021).
- CuSP: A Customizable Streaming Edge Partitioner for Distributed Graph Analytics. In Proceedings of the 33rd IEEE International Parallel and Distributed Processing Symposium (2019), IPDPS 2019, pp. 439–450.
- Open Graph Benchmark: Datasets for Machine Learning on Graphs. In Advances in Neural Information Processing Systems (2020), H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, Eds., vol. 33, Curran Associates, Inc., pp. 22118–22133.
- Intel. Intel oneAPI Math Kernel Library.
- Improving the Accuracy, Scalability, and Performance of Graph Neural Networks with Roc. In Proceedings of Machine Learning and Systems (2020), I. Dhillon, D. Papailiopoulos, and V. Sze, Eds., vol. 2, pp. 187–198.
- Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining. In Proceedings of Machine Learning and Systems (2022), D. Marculescu, Y. Chi, and C. Wu, Eds., vol. 4, pp. 172–189.
- A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs. SIAM J. Sci. Comput. 20, 1 (Dec. 1998), 359–392.
- On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016.
- Semi-Supervised Classification with Graph Convolutional Networks. In ICLR (2017).
- NeuGraph: Parallel Deep Neural Network Computation on Large Graphs. In 2019 USENIX Annual Technical Conference (USENIX ATC 19) (Renton, WA, July 2019), USENIX Association, pp. 443–458.
- DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (New York, NY, USA, 2021), SC ’21, Association for Computing Machinery.
- A Lightweight Infrastructure for Graph Analytics. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (New York, NY, USA, 2013), SOSP ’13, ACM, pp. 456–471.
- Ginex: SSD-enabled Billion-scale Graph Neural Network Training on a Single Machine via Provably Optimal In-memory Caching. In Proceedings of the VLDB Endowment (2022), vol. 15.
- PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems (2019), H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, Eds., vol. 32, Curran Associates, Inc., pp. 8026–8037.
- Reducing Communication in Graph Neural Network Training. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (2020), SC ’20, IEEE Press.
- Graph Attention Networks. In ICLR (2018).
- BNS-GCN: Efficient Full-Graph Training of Graph Convolutional Networks with Partition-Parallelism and Random Boundary Node Sampling. In Proceedings of Machine Learning and Systems (2022), D. Marculescu, Y. Chi, and C. Wu, Eds., vol. 4, pp. 673–693.
- Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs. In International Conference on Learning Representations (2019), ICLR ’19.
- GraphScope: A One-Stop Large Graph Processing System. Proc. VLDB Endow. 14, 12 (Oct. 2021), 2703–2706.
- How Powerful are Graph Neural Networks? In International Conference on Learning Representations (2019), ICLR ’19.
- AliGraph: A Comprehensive Graph Neural Network Platform. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (New York, NY, USA, 2019), KDD ’19, Association for Computing Machinery, pp. 3165–3166.
- Accurate, Efficient and Scalable Graph Embedding. In 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (May 2019), pp. 462–471.
- ByteGNN: Efficient Graph Neural Network Training at Large Scale. Proc. VLDB Endow. 15, 6 (Feb. 2022), 1228–1242.
- DistDGL: Distributed Graph Neural Network Training for Billion-Scale Graphs, 2021.
- Distributed Hybrid CPU and GPU Training for Graph Neural Networks on Billion-Scale Heterogeneous Graphs. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (New York, NY, USA, 2022), KDD ’22, Association for Computing Machinery, pp. 4582–4591.
- Layer-Dependent Importance Sampling for Training Deep and Large Graph Convolutional Networks. In Advances in Neural Information Processing Systems (2019), H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, Eds., vol. 32, Curran Associates, Inc.
- Loc Hoang
- Rita Brugarolas Brufau
- Ke Ding
- Bo Wu