MSPipe: Efficient Temporal GNN Training via Staleness-Aware Pipeline (2402.15113v2)

Published 23 Feb 2024 in cs.LG and cs.DC

Abstract: Memory-based Temporal Graph Neural Networks (MTGNNs) are a class of temporal graph neural networks that use a node memory module to capture and retain long-term temporal dependencies, yielding superior performance over memory-less counterparts. However, to keep the memory up to date, the iterative reading and updating of the memory module must follow the temporal dependencies, which introduces significant overhead and limits training throughput. Existing optimizations for static GNNs are not directly applicable to MTGNNs because of differences in training paradigm and model architecture, and because static GNNs lack a memory module; moreover, they do not address the challenges posed by temporal dependencies, making them ineffective for MTGNN training. In this paper, we propose MSPipe, a general and efficient framework for MTGNNs that maximizes training throughput while maintaining model accuracy. Our design addresses the unique challenges of fetching and updating node memory states in MTGNNs by integrating staleness into the memory module. However, simply imposing a predefined staleness bound on the memory module to break temporal dependencies may lead to suboptimal performance and poor generalizability across models and datasets. To solve this, we introduce an online pipeline scheduling algorithm in MSPipe that strategically breaks temporal dependencies with minimal staleness and delays memory fetching to obtain fresher memory states. Moreover, we design a staleness mitigation mechanism to enhance training convergence and model accuracy. We provide a convergence analysis and prove that MSPipe maintains the same convergence rate as vanilla sample-based GNN training. Experimental results show that MSPipe achieves up to a 2.45x speed-up without sacrificing accuracy, making it a promising solution for efficient MTGNN training.
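
To make the bounded-staleness idea concrete, below is a minimal Python sketch of a two-stage training pipeline in which the memory fetch for batch i waits only for the memory update of batch i - s (for a staleness bound s), rather than for batch i - 1. This is an illustrative assumption-laden toy, not MSPipe's actual implementation: the names (run_pipeline, fetch_and_compute, apply_updates, STALENESS_BOUND), the semaphore-based mechanism, and the sleep-based stand-ins for compute are all hypothetical.

```python
import threading
import queue
import time

STALENESS_BOUND = 3  # hypothetical bound s: a fetch may run at most s batches
                     # ahead of the latest applied memory update

def run_pipeline(num_batches=10, staleness_bound=STALENESS_BOUND):
    """Toy two-stage pipeline: fetch/compute overlapped with memory updates,
    with staleness bounded by a counting semaphore."""
    budget = threading.Semaphore(staleness_bound)  # each release lets one more fetch run ahead
    pending_updates = queue.Queue()                # batches whose memory write-back is pending

    def fetch_and_compute():
        for i in range(num_batches):
            budget.acquire()        # block if we are more than s batches ahead,
                                    # i.e. until the update of batch i - s is applied
            # ... fetch (possibly stale) node memory, sample neighbors,
            #     run the forward/backward pass for batch i (omitted) ...
            time.sleep(0.01)        # stand-in for GPU compute
            pending_updates.put(i)  # hand the new memory states to the update stage
        pending_updates.put(None)   # sentinel: no more batches

    def apply_updates():
        while True:
            i = pending_updates.get()
            if i is None:
                break
            # ... write the refreshed memory states of batch i back to the store ...
            time.sleep(0.005)       # stand-in for the memory update
            budget.release()        # batch i's update is visible; later fetches may advance

    producer = threading.Thread(target=fetch_and_compute)
    consumer = threading.Thread(target=apply_updates)
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()

if __name__ == "__main__":
    run_pipeline()
```

Per the abstract, MSPipe goes further than a fixed bound like this: its online scheduler picks the minimal staleness needed to keep the pipeline busy, delays memory fetches to read fresher states, and adds a staleness mitigation mechanism, all of which the sketch omits.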
