FlowWalker: A Memory-efficient and High-performance GPU-based Dynamic Graph Random Walk Framework (2404.08364v3)
Abstract: Dynamic graph random walk (DGRW) emerges as a practical tool for capturing structural relations within a graph. Effectively executing DGRW on GPU presents certain challenges. First, existing sampling methods demand a pre-processing buffer, causing substantial space complexity. Moreover, the power-law distribution of graph vertex degrees introduces workload imbalance issues, rendering DGRW embarrassed to parallelize. In this paper, we propose FlowWalker, a GPU-based dynamic graph random walk framework. FlowWalker implements an efficient parallel sampling method to fully exploit the GPU parallelism and reduce space complexity. Moreover, it employs a sampler-centric paradigm alongside a dynamic scheduling strategy to handle the huge amounts of walking queries. FlowWalker stands as a memory-efficient framework that requires no auxiliary data structures in GPU global memory. We examine the performance of FlowWalker extensively on ten datasets, and experiment results show that FlowWalker achieves up to 752.2x, 72.1x, and 16.4x speedup compared with existing CPU, GPU, and FPGA random walk frameworks, respectively. Case study shows that FlowWalker diminishes random walk time from 35% to 3% in a pipeline of ByteDance friend recommendation GNN training.
- Paolo Boldi and Marco Rosa. 2012. Arc-community detection via triangular random walks. In 2012 Eighth Latin American Web Congress. IEEE, 48–56.
- Layered Label Propagation: A MultiResolution Coordinate-Free Ordering for Compressing Social Networks. In Proceedings of the 20th international conference on World Wide Web, Sadagopan Srinivasan, Krithi Ramamritham, Arun Kumar, M. P. Ravindra, Elisa Bertino, and Ravi Kumar (Eds.). ACM Press, 587–596.
- Paolo Boldi and Sebastiano Vigna. 2004. The WebGraph Framework I: Compression Techniques. In Proc. of the Thirteenth International World Wide Web Conference (WWW 2004). ACM Press, Manhattan, USA, 595–601.
- Min-Te Chao. 1982. A general purpose unequal probability sampling plan. Biometrika 69, 3 (1982), 653–656.
- Thomas H Cormen and Michael T Goodrich. 1996. A bridging model for parallel computation, communication, and I/O. ACM Computing Surveys (CSUR) 28, 4es (1996), 208–es.
- Finding overlapping communities based on Markov chain and link clustering. Peer-to-peer Networking and Applications 10 (2017), 411–420.
- metapath2vec: Scalable representation learning for heterogeneous networks. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. 135–144.
- Towards scaling fully personalized pagerank: Algorithms, lower bounds, and experiments. Internet Mathematics 2, 3 (2005), 333–358.
- Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. 855–864.
- HeteSpaceyWalk: A Heterogeneous Spacey Random Walk for Heterogeneous Information Network Embedding. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management (Beijing, China) (CIKM ’19). Association for Computing Machinery, New York, NY, USA, 639–648. https://doi.org/10.1145/3357384.3358061
- W. Daniel Hillis and Guy L. Steele. 1986. Data Parallel Algorithms. Commun. ACM 29, 12 (dec 1986), 1170–1183. https://doi.org/10.1145/7902.7903
- Lorenz Hübschle-Schneider and Peter Sanders. 2022. Parallel weighted random sampling. ACM Transactions on Mathematical Software (TOMS) 48, 3 (2022), 1–40.
- Accelerating graph sampling for graph machine learning using GPUs. In Proceedings of the Sixteenth European Conference on Computer Systems. 311–326.
- A High-Performance Graph Engine for Efficient Social Network Analysis. In Companion of the The Web Conference 2018 on The Web Conference 2018, WWW 2018, Lyon , France, April 23-27, 2018, Pierre-Antoine Champin, Fabien Gandon, Mounia Lalmas, and Panagiotis G. Ipeirotis (Eds.). ACM, 61–62. https://doi.org/10.1145/3184558.3186929
- Peter M Kogge and Harold S Stone. 1973. A parallel algorithm for the efficient solution of a general class of recurrence equations. IEEE transactions on computers 100, 8 (1973), 786–793.
- Aapo Kyrola. 2013a. DrunkardMob: billions of random walks on just a PC. Proceedings of the 7th ACM conference on Recommender systems (2013).
- Aapo Kyrola. 2013b. Drunkardmob: billions of random walks on just a pc. In Proceedings of the 7th ACM conference on Recommender systems. 257–264.
- GraphChi: Large-Scale Graph Computation on Just a PC. In 10th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2012, Hollywood, CA, USA, October 8-10, 2012. 31–46. https://www.usenix.org/conference/osdi12/technical-sessions/presentation/kyrola
- Ni Lao and William W Cohen. 2010. Relational retrieval using a combination of path-constrained random walks. Machine learning 81 (2010), 53–67.
- Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data.
- Second-Order CoSimRank for Similarity Measures in Social Networks. In ICC 2019 - 2019 IEEE International Conference on Communications (ICC). 1–6. https://doi.org/10.1109/ICC.2019.8761899
- Adapting Meta Knowledge Graph Information for Multi-Hop Reasoning over Few-Shot Relations. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, Kentaro Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan (Eds.). Association for Computational Linguistics, 3374–3379. https://doi.org/10.18653/v1/D19-1334
- Algorithmic strategies for optimizing the parallel reduction primitive in CUDA. In 2012 International Conference on High Performance Computing & Simulation (HPCS). 511–519. https://doi.org/10.1109/HPCSim.2012.6266966
- NVIDIA. 2022. CUB Documentation. https://nvlabs.github.io/cub/index.html, Last accessed on 2023-6-25.
- NVIDIA. 2023. CUDA Toolkit Documentation, cuRAND. https://docs.nvidia.com/cuda/curand/index.html, Last accessed on 2023-6-25.
- S. Olver and A. Townsend. 2013. Fast inverse transform sampling in one and two dimensions. arXiv: Numerical Analysis (2013).
- C-SAW: A framework for graph sampling and random walk on GPUs. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 1–15.
- Space-efficient random walks on streaming graphs. Proceedings of the VLDB Endowment 16, 2 (2022), 356–368.
- Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 701–710.
- Christian Robert and George Casella. 2013. Monte Carlo statistical methods. In Springer Science & Business Media.
- The Graph Neural Network Model. IEEE Transactions on Neural Networks 20, 1 (2009), 61–80. https://doi.org/10.1109/TNN.2008.2005605
- A work-efficient step-efficient prefix sum algorithm. (2006).
- Memory-aware framework for efficient second-order random walk on large graphs. In Proceedings of the 2020 ACM SIGMOD international conference on management of data. 1797–1812.
- Baoxu Shi and Tim Weninger. 2016. Discriminative predicate path mining for fact checking in knowledge graphs. Knowledge-based systems 104 (2016), 123–133.
- ThunderRW: An In-Memory Graph Random Walk Engine. Proc. VLDB Endow. 14, 11 (2021), 1992–2005. http://www.vldb.org/pvldb/vol14/p1992-sun.pdf
- Yizhou Sun and Jiawei Han. 2012. Mining heterogeneous information networks: a structural analysis approach. SIGKDD Explor. 14, 2 (2012), 20–28. https://doi.org/10.1145/2481244.2481248
- LightRW: FPGA Accelerated Graph Dynamic Random Walks. Proc. ACM Manag. Data 1, 1, Article 90 (may 2023), 27 pages. https://doi.org/10.1145/3588944
- Distributed Matrix-Based Sampling for Graph Neural Network Training. arXiv preprint arXiv:2311.02909 (2023).
- Jeffrey S Vitter. 1985. Random sampling with a reservoir. ACM Transactions on Mathematical Software (TOMS) 11, 1 (1985), 37–57.
- Alastair J. Walker. 1977. An Efficient Method for Generating Discrete Random Variables with General Distributions. ACM Trans. Math. Softw. 3, 3 (1977), 253–256. https://doi.org/10.1145/355744.355749
- Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2018).
- Deep graph library: Towards efficient and scalable deep learning on graphs. arXiv preprint arXiv:1909.01315 (2019).
- Skywalker: Efficient Alias-Method-Based Graph Sampling and Random Walk on GPUs. In 30th International Conference on Parallel Architectures and Compilation Techniques, PACT 2021, Atlanta, GA, USA, September 26-29, 2021. IEEE, 304–317. https://doi.org/10.1109/PACT52795.2021.00029
- Optimizing GPU-based Graph Sampling and Random Walk for Efficiency and Scalability. IEEE Trans. Comput. (2023).
- {{\{{GraphWalker}}\}}: An {{\{{I/O-Efficient}}\}} and {{\{{Resource-Friendly}}\}} Graph Analytic System for Fast and Scalable Random Walks. In 2020 USENIX Annual Technical Conference (USENIX ATC 20). 559–571.
- NosWalker: A Decoupled Architecture for Out-of-Core Random Walk Processing. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3. 466–482.
- Remember where you came from: on the second-order random walk based proximity measures. Proceedings of the VLDB Endowment 10, 1 (2016), 13–24.
- Random walks on huge graphs at cache efficiency. In Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles. 311–326.
- KnightKing: a fast distributed graph random walk engine. In Proceedings of the 27th ACM Symposium on Operating Systems Principles, SOSP 2019, Huntsville, ON, Canada, October 27-30, 2019, Tim Brecht and Carey Williamson (Eds.). ACM, 524–537. https://doi.org/10.1145/3341301.3359634
- Scalable Graph Sampling on GPUs with Compressed Graph. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management (Atlanta, GA, USA) (CIKM ’22). Association for Computing Machinery, New York, NY, USA, 2383–2392. https://doi.org/10.1145/3511808.3557443
- AGL: A Scalable System for Industrial-Purpose Graph Machine Learning. Proc. VLDB Endow. 13, 12 (aug 2020), 3125–3137. https://doi.org/10.14778/3415478.3415539
- AliGraph: a comprehensive graph neural network platform. Proceedings of the VLDB Endowment 12, 12 (2019), 2094–2105.