RSC: Accelerating Graph Neural Networks Training via Randomized Sparse Computations (2210.10737v2)
Abstract: Training graph neural networks (GNNs) is extremely time-consuming because sparse graph-based operations are hard to accelerate on hardware. Prior art explores trading off computational precision for lower time complexity via sampling-based approximation. Building on this idea, previous works successfully accelerate dense-matrix operations (e.g., convolution and linear layers) with negligible accuracy drop. However, unlike dense matrices, sparse matrices are stored in irregular data formats in which each row/column may have a different number of non-zero entries. Thus, compared to their dense counterparts, approximating sparse operations poses two unique challenges: (1) we cannot directly control the efficiency of an approximated sparse operation, since the computation is only executed on non-zero entries; (2) sub-sampling sparse matrices is much less efficient due to the irregular data format. To address these issues, our key idea is to control the accuracy-efficiency trade-off by optimizing the allocation of computation resources layer-wise and epoch-wise. Specifically, for the first challenge, we customize the computation resources assigned to different sparse operations while keeping the total resource usage below a given budget. For the second challenge, we cache previously sampled sparse matrices to reduce the epoch-wise sampling overhead. Finally, we propose a switching mechanism to improve the generalization of GNNs trained with approximated operations. Putting these together, we propose Randomized Sparse Computation (RSC), which for the first time demonstrates the potential of training GNNs with approximated operations. In practice, RSC achieves up to an $11.6\times$ speedup for a single sparse operation and a $1.6\times$ end-to-end wall-clock speedup with negligible accuracy drop.
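To make the sampling-based approximation concrete, below is a minimal sketch of column-row sampling for a sparse-dense product, in the spirit of the Monte-Carlo matrix multiplication the abstract refers to, together with a toy cache of the sampled index set to illustrate the epoch-wise reuse idea. The function `approx_spmm` and its caching interface are illustrative assumptions, not the authors' implementation or the RSC resource-allocation scheme.

```python
# Sketch: approximate A @ B (A sparse, B dense) by sampling k column/row pairs
# with probabilities proportional to ||A[:, i]|| * ||B[i, :]|| and rescaling so
# the estimator is unbiased. The optional cache mimics reusing a previously
# sampled index set across epochs to amortize the sampling overhead.
import numpy as np
import scipy.sparse as sp

def approx_spmm(A, B, k, rng, cached_idx=None):
    """Approximate A @ B with k sampled column/row pairs (hypothetical helper)."""
    if cached_idx is None:
        col_norms = np.sqrt(A.multiply(A).sum(axis=0)).A1   # column norms of A
        row_norms = np.linalg.norm(B, axis=1)               # row norms of B
        probs = col_norms * row_norms
        probs = probs / probs.sum()
        idx = rng.choice(A.shape[1], size=k, replace=True, p=probs)
        scale = 1.0 / (k * probs[idx])                      # unbiased rescaling
    else:
        idx, scale = cached_idx                             # reuse cached sample
    A_sub = A[:, idx].multiply(scale)                       # rescaled sampled columns
    return A_sub @ B[idx, :], (idx, scale)

# Usage: compare the exact product with its sampled approximation.
rng = np.random.default_rng(0)
A = sp.random(2000, 1000, density=0.01, format="csr", random_state=0)
B = rng.standard_normal((1000, 64))
exact = A @ B
approx, cache = approx_spmm(A, B, k=300, rng=rng)               # sample 300 of 1000 columns
approx2, _ = approx_spmm(A, B, k=300, rng=rng, cached_idx=cache)  # "next epoch": reuse cache
print(np.linalg.norm(exact - approx) / np.linalg.norm(exact))
```

The sample size k plays the role of the per-operation computation budget: smaller k means fewer column/row pairs touched (faster, less accurate), and reusing the cached index set skips the norm computation and resampling on subsequent calls.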