NeutronOrch: Rethinking Sample-based GNN Training under CPU-GPU Heterogeneous Environments (2311.13225v2)

Published 22 Nov 2023 in cs.DC and cs.LG

Abstract: Graph Neural Networks (GNNs) have demonstrated outstanding performance in various applications. Existing frameworks utilize CPU-GPU heterogeneous environments to train GNN models and integrate mini-batch and sampling techniques to overcome the GPU memory limitation. In CPU-GPU heterogeneous environments, we can divide sample-based GNN training into three steps: sample, gather, and train. Existing GNN systems use different task orchestrating methods to execute each step on the CPU or GPU. After extensive experiments and analysis, we find that existing task orchestrating methods fail to fully utilize the heterogeneous resources, limited by inefficient CPU processing or GPU resource contention. In this paper, we propose NeutronOrch, a system for sample-based GNN training that incorporates a layer-based task orchestrating method and ensures balanced utilization of the CPU and GPU. NeutronOrch decouples the training process by layer and pushes down the training task of the bottom layer to the CPU. This significantly reduces the computational load and memory footprint of GPU training. To avoid inefficient CPU processing, NeutronOrch only offloads the training of frequently accessed vertices to the CPU and lets the GPU reuse their embeddings with bounded staleness. Furthermore, NeutronOrch provides a fine-grained pipeline design for the layer-based task orchestrating method, fully overlapping different tasks on heterogeneous resources while strictly guaranteeing bounded staleness. The experimental results show that, compared with state-of-the-art GNN systems, NeutronOrch achieves up to an 11.51x performance speedup.
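
The central mechanism described in the abstract is reusing CPU-computed bottom-layer embeddings of frequently accessed ("hot") vertices on the GPU, subject to a staleness bound. The PyTorch sketch below is a minimal illustration of that idea, not NeutronOrch's actual implementation: the cache class `HotEmbeddingCache`, the staleness bound `S`, the hot-vertex set, and the toy two-layer model (with neighbor aggregation omitted) are all assumptions made for the example.

```python
import torch
import torch.nn as nn

S = 3  # assumed staleness bound: a cached embedding may lag at most S iterations

class HotEmbeddingCache:
    """Illustrative cache of bottom-layer embeddings for frequently accessed vertices.

    Mirrors the layer-based orchestration sketched in the abstract: the CPU computes
    bottom-layer embeddings for hot vertices, and the GPU reuses them as long as
    they are no more than S iterations stale.
    """
    def __init__(self):
        self.emb = {}       # vertex id -> embedding tensor (computed on CPU)
        self.version = {}   # vertex id -> iteration at which it was computed

    def put(self, vid, h, it):
        self.emb[vid] = h
        self.version[vid] = it

    def get(self, vid, it):
        # Return the cached embedding only if it satisfies the staleness bound.
        if vid in self.emb and it - self.version[vid] <= S:
            return self.emb[vid]
        return None

# Toy two-layer model split by layer: bottom layer stays on the CPU, top layer on
# the GPU (falls back to CPU if no GPU is available).
gpu = torch.device("cuda" if torch.cuda.is_available() else "cpu")
bottom = nn.Linear(16, 32)          # bottom-layer transform, kept on CPU
top = nn.Linear(32, 8).to(gpu)      # top-layer transform, trained on GPU

cache = HotEmbeddingCache()
feats = torch.randn(100, 16)        # raw vertex features, resident in CPU memory
hot = {0, 1, 2}                     # assumed set of frequently sampled vertices

for it in range(10):
    batch = torch.randint(0, 100, (8,)).tolist()   # stand-in for a sampled mini-batch
    h1 = []
    for v in batch:
        cached = cache.get(v, it) if v in hot else None
        if cached is None:
            # Gather features and run the bottom layer on the CPU
            # (neighbor aggregation omitted for brevity).
            cached = torch.relu(bottom(feats[v]))
            if v in hot:
                cache.put(v, cached.detach(), it)
        h1.append(cached)
    h1 = torch.stack(h1).to(gpu)    # only layer-1 embeddings are transferred to the GPU
    out = top(h1)                   # top-layer computation proceeds on the GPU
```

In this sketch, vertices outside the hot set are always recomputed, while hot vertices pay the CPU cost at most once every `S` iterations; the actual system additionally pipelines these CPU and GPU tasks, which is not modeled here.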
