SUREL+: Moving from Walks to Sets for Scalable Subgraph-based Graph Representation Learning (2303.03379v3)
Abstract: Subgraph-based graph representation learning (SGRL) has recently emerged as a powerful tool for many prediction tasks on graphs, owing to its advantages in model expressiveness and generalization ability. However, most previous SGRL models face computational challenges due to the high cost of extracting a subgraph for each training or test query. Recently, SUREL was proposed to accelerate SGRL: it samples random walks offline and joins these walks online as a proxy for subgraphs in representation learning. Because sampled walks are reusable across different queries, SUREL achieves state-of-the-art scalability and prediction accuracy. However, SUREL still suffers from high computational overhead caused by node duplication within the sampled walks. In this work, we propose SUREL+, a novel framework that upgrades SUREL by representing subgraphs with node sets instead of walks. This set-based representation eliminates repeated nodes by definition, but the resulting sets can be irregular in size. To address this issue, we design a customized sparse data structure that stores and accesses node sets efficiently, along with a specialized operator that joins them in parallel batches. SUREL+ is modularized to support multiple types of set samplers, structural features, and neural encoders, which compensate for the structural information lost in the reduction from walks to sets. Extensive experiments validate SUREL+ on the prediction of links, relation types, and higher-order patterns. SUREL+ achieves 3-11$\times$ speedups over SUREL while maintaining comparable or even better prediction performance; compared with other SGRL baselines, SUREL+ achieves $\sim$20$\times$ speedups and significantly higher prediction accuracy.
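To make the set-based idea concrete, the sketch below shows one plausible way to store irregular-size node sets in a flat, CSR-style layout (a values array plus an offsets array) and to join two stored sets for a query. This is a hypothetical illustration in NumPy, not SUREL+'s actual implementation; the class and method names (`SetStore`, `join`) are invented here, and the real system uses an optimized sparse structure with batched, parallel joins.

```python
import numpy as np

class SetStore:
    """CSR-style storage for irregular-size node sets.

    Hypothetical sketch: each set is kept sorted inside one flat
    `values` array; `offsets[i]:offsets[i+1]` delimits set i.
    """

    def __init__(self, sets):
        self.offsets = np.zeros(len(sets) + 1, dtype=np.int64)
        self.offsets[1:] = np.cumsum([len(s) for s in sets])
        self.values = np.concatenate(
            [np.asarray(sorted(s), dtype=np.int64) for s in sets]
        )

    def get(self, i):
        # O(1) slice into the flat array -- no per-set Python objects.
        return self.values[self.offsets[i]:self.offsets[i + 1]]

    def join(self, i, j):
        """Join the node sets of a query pair (i, j).

        Returns the union of the two sets plus per-node membership
        flags, a coarse stand-in for the structural features that
        compensate for information lost by discarding walk order.
        """
        a, b = self.get(i), self.get(j)
        union = np.union1d(a, b)
        # Column k tells whether each union node came from set i / set j.
        feat = np.stack([np.isin(union, a), np.isin(union, b)], axis=1)
        return union, feat

# Example: two sampled node sets sharing node 2.
store = SetStore([{0, 1, 2}, {2, 3}])
union, feat = store.join(0, 1)
# union -> [0, 1, 2, 3]; node 2's row is [True, True] (in both sets).
```

The flat layout is what makes batching cheap: a batch of queries can gather many `offsets` slices at once without materializing ragged Python lists, which is the kind of access pattern the specialized join operator exploits.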