CATGNN: Cost-Efficient and Scalable Distributed Training for Graph Neural Networks (2404.02300v1)
Abstract: Graph neural networks (GNNs) have proven successful in recent years. While various GNN architectures and training systems have been developed, training GNNs on large-scale real-world graphs remains challenging. Existing distributed systems load the entire graph into memory for graph partitioning, requiring huge memory to process large graphs and thus hindering GNN training on such graphs with commodity workstations. In this paper, we propose CATGNN, a cost-efficient and scalable distributed GNN training system that focuses on scaling GNN training to billion-scale or larger graphs under limited computational resources. Among other features, it takes a stream of edges as input, instead of loading the entire graph into memory, for partitioning. We also propose a novel streaming partitioning algorithm named SPRING for distributed GNN training. We verify the correctness and effectiveness of CATGNN with SPRING on 16 open datasets. In particular, we demonstrate that CATGNN can handle the largest publicly available dataset with limited memory, which would otherwise be infeasible without increasing the memory capacity. SPRING also significantly outperforms state-of-the-art partitioning algorithms, with a 50% reduction in replication factor on average.
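To make the abstract's two key ideas concrete, below is a minimal sketch of streaming edge partitioning: edges are consumed one at a time, so the full graph never has to sit in memory, and quality is measured by the replication factor (the average number of partitions holding a copy of each vertex) that SPRING is reported to cut by roughly 50%. The abstract does not describe SPRING's internals, so this is a generic greedy heuristic in the spirit of HDRF-style streaming partitioners, not the paper's algorithm; the names `stream_partition` and `alpha` are illustrative assumptions.

```python
from collections import defaultdict

def stream_partition(edge_stream, num_parts, alpha=0.5):
    """One pass over an edge stream: assign each edge to the partition with
    the best locality-vs-load score, never materializing the whole graph."""
    part_load = [0] * num_parts          # edges assigned to each partition
    vertex_parts = defaultdict(set)      # vertex -> partitions holding a replica

    for u, v in edge_stream:
        def score(p):
            # Reward partitions that already hold an endpoint (fewer replicas),
            # penalize heavily loaded partitions (balance); alpha trades the two.
            locality = (p in vertex_parts[u]) + (p in vertex_parts[v])
            return locality - alpha * part_load[p]

        p = max(range(num_parts), key=score)
        part_load[p] += 1
        vertex_parts[u].add(p)
        vertex_parts[v].add(p)

    # Replication factor: average number of partition copies per vertex.
    rf = sum(len(s) for s in vertex_parts.values()) / max(len(vertex_parts), 1)
    return part_load, rf

if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 0)]
    loads, rf = stream_partition(iter(edges), num_parts=2)
    print(loads, rf)  # [3, 3] with a replication factor of 1.4 on this toy graph
```

A replication factor of 1.0 would mean no vertex is copied across partitions; lower values mean less cross-machine communication and feature duplication during distributed GNN training, which is why the metric is the headline number in the abstract.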
Authors: Xin Huang, Weipeng Zhuo, Minh Phu Vuong, Shiju Li, Jongryool Kim, Bradley Rees, Chul-Ho Lee