CATGNN: Cost-Efficient and Scalable Distributed Training for Graph Neural Networks (2404.02300v1)
Abstract: Graph neural networks (GNNs) have proven successful in recent years. While various GNN architectures and training systems have been developed, training GNNs on large-scale real-world graphs remains challenging. Existing distributed systems load the entire graph into memory for graph partitioning, requiring huge memory to process large graphs and thus hindering GNN training on such graphs with commodity workstations. In this paper, we propose CATGNN, a cost-efficient and scalable distributed GNN training system that focuses on scaling GNN training to billion-scale or larger graphs under limited computational resources. Among other features, it takes a stream of edges as input, instead of loading the entire graph into memory, for partitioning. We also propose a novel streaming partitioning algorithm named SPRING for distributed GNN training. We verify the correctness and effectiveness of CATGNN with SPRING on 16 open datasets. In particular, we demonstrate that CATGNN can handle the largest publicly available dataset with limited memory, which would otherwise be infeasible without increasing the memory capacity. SPRING also significantly outperforms state-of-the-art partitioning algorithms, with a 50% reduction in replication factor on average.
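To make the abstract's two key ideas concrete, below is a minimal sketch of streaming edge partitioning: edges are consumed one at a time, so the full graph never has to sit in memory, and quality is measured by the replication factor (the average number of partitions holding a copy of each vertex) that SPRING is reported to cut by roughly 50%. The abstract does not describe SPRING's internals, so this is a generic greedy heuristic in the spirit of HDRF-style streaming partitioners, not the paper's algorithm; the names `stream_partition` and `alpha` are illustrative assumptions.

```python
from collections import defaultdict

def stream_partition(edge_stream, num_parts, alpha=0.5):
    """One pass over an edge stream: assign each edge to the partition with
    the best locality-vs-load score, never materializing the whole graph."""
    part_load = [0] * num_parts          # edges assigned to each partition
    vertex_parts = defaultdict(set)      # vertex -> partitions holding a replica

    for u, v in edge_stream:
        def score(p):
            # Reward partitions that already hold an endpoint (fewer replicas),
            # penalize heavily loaded partitions (balance); alpha trades the two.
            locality = (p in vertex_parts[u]) + (p in vertex_parts[v])
            return locality - alpha * part_load[p]

        p = max(range(num_parts), key=score)
        part_load[p] += 1
        vertex_parts[u].add(p)
        vertex_parts[v].add(p)

    # Replication factor: average number of partition copies per vertex.
    rf = sum(len(s) for s in vertex_parts.values()) / max(len(vertex_parts), 1)
    return part_load, rf

if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 0)]
    loads, rf = stream_partition(iter(edges), num_parts=2)
    print(loads, rf)  # [3, 3] with a replication factor of 1.4 on this toy graph
```

A replication factor of 1.0 would mean no vertex is copied across partitions; lower values mean less cross-machine communication and feature duplication during distributed GNN training, which is why the metric is the headline number in the abstract.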
Authors: Xin Huang, Weipeng Zhuo, Minh Phu Vuong, Shiju Li, Jongryool Kim, Bradley Rees, Chul-Ho Lee