GraphScale: A Framework to Enable Machine Learning over Billion-node Graphs (2407.15452v1)
Abstract: Graph Neural Networks (GNNs) have emerged as powerful tools for supervised machine learning over graph-structured data, while sampling-based node representation learning is widely used for unsupervised learning. However, scalability remains a major challenge in both supervised and unsupervised learning on large graphs (e.g., those with over 1 billion nodes). The scalability bottleneck largely stems from the mini-batch sampling phase in GNNs and the random-walk sampling phase in unsupervised methods. These processes often require storing features or embeddings in memory, and in distributed training they require frequent, inefficient random access to data stored across different workers. This repeated inter-worker communication for each mini-batch leads to high communication overhead and computational inefficiency. We propose GraphScale, a unified framework for both supervised and unsupervised learning that stores and processes large graph data in a distributed manner. The key insight in our design is the separation of workers that store data from workers that perform the training. This separation lets us decouple compute and storage in graph training, effectively building a pipeline in which data fetching and computation overlap asynchronously. Our experiments show that GraphScale outperforms state-of-the-art methods for distributed training of both GNNs and node embeddings. We evaluate GraphScale on both public and proprietary graph datasets and observe a reduction of at least 40% in end-to-end training time compared to popular distributed frameworks, with no loss in performance. Whereas most existing methods do not support training node embeddings on billion-node graphs, GraphScale is deployed in production at TikTok, enabling efficient learning over such graphs.
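The abstract's central design point, separating the workers that store graph data from the trainers so that data fetching overlaps asynchronously with computation, is at heart a producer-consumer pipeline. The following is a minimal sketch of that pattern only; it is not GraphScale's actual API, and every name in it (`fetch_batch`, `prefetcher`, the simulated latencies) is a hypothetical stand-in.

```python
# Minimal sketch of a decoupled storage/compute training pipeline.
# NOT GraphScale's real API: fetch_batch, prefetcher, and the sleeps
# are hypothetical stand-ins for remote storage access and training.
import queue
import threading
import time

def fetch_batch(storage_workers, batch_id):
    """Stand-in for pulling a sampled mini-batch from remote storage workers."""
    time.sleep(0.01)  # simulates network latency to the storage tier
    return {"batch_id": batch_id, "features": [0.0] * 128}

def prefetcher(storage_workers, num_batches, out_queue):
    """Background producer: keeps the queue full so fetching overlaps
    with the trainer's computation on earlier batches."""
    for batch_id in range(num_batches):
        out_queue.put(fetch_batch(storage_workers, batch_id))
    out_queue.put(None)  # sentinel: no more batches

def train(storage_workers, num_batches):
    batches = queue.Queue(maxsize=4)  # bounded buffer caps memory use
    threading.Thread(
        target=prefetcher,
        args=(storage_workers, num_batches, batches),
        daemon=True,
    ).start()
    # Consumer: computes on the current batch while the next ones are fetched.
    while (batch := batches.get()) is not None:
        time.sleep(0.02)  # simulates the forward/backward pass on the batch
        print(f"trained on batch {batch['batch_id']}")

if __name__ == "__main__":
    train(storage_workers=["worker-0", "worker-1"], num_batches=8)
```

The bounded queue is the load-bearing design choice in this sketch: it lets the prefetcher run just far enough ahead of the trainer to hide fetch latency, while capping how many buffered batches sit in memory at once.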