
Helios: An Efficient Out-of-core GNN Training System on Terabyte-scale Graphs with In-memory Performance

Published 2 Oct 2023 in cs.DC | (2310.00837v1)

Abstract: Training graph neural networks (GNNs) on large-scale graph data holds immense promise for numerous real-world applications but remains a great challenge. Several disk-based GNN systems have been built to train on large-scale graphs using a single machine. However, they often fall short in performance, especially when training on terabyte-scale graphs. This is because existing disk-based systems either focus too heavily on minimizing the number of SSD accesses or do not fully overlap SSD accesses with GNN training. The result is substantial unnecessary CPU-side overhead and, in turn, low GPU utilization. To this end, we propose Helios, a system that can train GNNs on terabyte-scale graphs on a single machine while achieving throughput comparable to in-memory systems. To achieve this, we first present a GPU-initiated asynchronous disk IO stack that allows the GPU to access graph data on SSD directly. This design requires only about 30% of the GPU cores to reach nearly maximal disk IO throughput and wastes no GPU cores between IO submission and IO completion, leaving the majority of GPU cores available for other GNN kernels. Second, we design a GPU-managed heterogeneous cache that extends the cache hierarchy across heterogeneous CPU and GPU memory and significantly increases cache lookup throughput through GPU parallelism. Finally, we build a deep GNN-aware pipeline that seamlessly integrates the computation and communication phases of the entire GNN training process, maximizing the use of GPU computation cycles. Experimental results demonstrate that Helios can match the training throughput of in-memory GNN systems, even on terabyte-scale graphs. Remarkably, Helios outperforms state-of-the-art GPU-managed baselines by up to 6.43x and CPU-managed baselines by over 182x on all terabyte-scale graphs.
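To make the heterogeneous cache idea more concrete, here is a minimal Python sketch that assumes a two-tier feature cache (a small GPU tier and a larger CPU tier) backed by an SSD store. The class name `TieredFeatureCache`, the two tiers, and the fall-through lookup order are hypothetical illustrations, not Helios's actual design or API. Plain dicts stand in for each memory level, and the loop is a sequential stand-in for the per-ID, GPU-parallel lookup the abstract describes.

```python
from typing import Dict, List, Sequence

Feature = List[float]


class TieredFeatureCache:
    """Hypothetical cache hierarchy: GPU tier -> CPU tier -> SSD store.

    Plain dicts simulate each memory level; this is an illustration only.
    """

    def __init__(
        self,
        gpu_tier: Dict[int, Feature],
        cpu_tier: Dict[int, Feature],
        ssd_store: Dict[int, Feature],
    ) -> None:
        self.gpu_tier = gpu_tier
        self.cpu_tier = cpu_tier
        self.ssd_store = ssd_store
        # Count which level served each requested node ID.
        self.stats = {"gpu": 0, "cpu": 0, "ssd": 0}

    def gather(self, node_ids: Sequence[int]) -> List[Feature]:
        """Return features for node_ids, served from the fastest tier holding each ID.

        On a GPU, each ID could be resolved by its own thread; here we loop.
        """
        out: List[Feature] = []
        for nid in node_ids:
            if nid in self.gpu_tier:
                self.stats["gpu"] += 1
                out.append(self.gpu_tier[nid])
            elif nid in self.cpu_tier:
                self.stats["cpu"] += 1
                out.append(self.cpu_tier[nid])
            else:
                # Miss in both cache tiers: fall back to the simulated SSD read.
                self.stats["ssd"] += 1
                out.append(self.ssd_store[nid])
        return out


# Usage: node i has the one-element feature vector [float(i)].
ssd = {i: [float(i)] for i in range(10)}
cache = TieredFeatureCache({0: ssd[0], 1: ssd[1]}, {2: ssd[2], 3: ssd[3]}, ssd)
print(cache.gather([0, 2, 7, 1]))  # [[0.0], [2.0], [7.0], [1.0]]
print(cache.stats)  # {'gpu': 2, 'cpu': 1, 'ssd': 1}
```

The point of the tiering is that only IDs missing from both memory tiers reach the slow SSD path. In a real system, those misses would be the requests handed to the asynchronous IO stack.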
