Task-Oriented GNNs Training on Large Knowledge Graphs for Accurate and Efficient Modeling (2403.05752v2)
Abstract: A Knowledge Graph (KG) is a heterogeneous graph encompassing a diverse range of node and edge types. Heterogeneous Graph Neural Networks (HGNNs) are widely used for machine learning tasks, such as node classification and link prediction, on KGs. However, HGNN methods exhibit excessive complexity influenced by the KG's size, density, and number of node and edge types. To reduce this cost, AI practitioners handcraft a subgraph of a KG G relevant to a specific task. We refer to this subgraph as a task-oriented subgraph (TOSG); it contains a subset of the task-related node and edge types in G. Training on the TOSG instead of G alleviates the excessive computation required for a large KG. However, crafting the TOSG demands a deep understanding of the KG's structure and the task's objectives, making it challenging and time-consuming. This paper proposes KG-TOSA, an approach that automates TOSG extraction for task-oriented HGNN training on a large KG. In KG-TOSA, we define a generic graph pattern that captures the KG's local and global structure relevant to a specific task. We explore different techniques to extract subgraphs matching our graph pattern: (i) two techniques that sample around targeted nodes using biased random walks or influence scores, and (ii) a SPARQL-based extraction method that leverages RDF engines' built-in indices and thus incurs negligible preprocessing overhead compared to the sampling techniques. We develop a benchmark of large real KGs with various node classification and link prediction tasks. Our experiments show that KG-TOSA helps state-of-the-art HGNN methods reduce training time and memory usage by up to 70% while improving model performance, e.g., accuracy and inference time.
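To make the sampling idea concrete, the following is a minimal illustrative sketch (not KG-TOSA's actual implementation) of a biased random walk around a target node on a toy heterogeneous graph: edges whose type is task-relevant are weighted more heavily, so the walk tends to stay inside the task-oriented subgraph. The graph layout, edge types, and the `bias` parameter are all hypothetical.

```python
import random

# Toy heterogeneous graph: node -> list of (edge_type, neighbor) pairs.
# Node and edge types here are hypothetical examples.
graph = {
    "paper1": [("cites", "paper2"), ("written_by", "author1")],
    "paper2": [("cites", "paper1"), ("has_topic", "topic1")],
    "author1": [("writes", "paper1")],
    "topic1": [],
}

def biased_random_walk(graph, start, task_edge_types, walk_length, bias=4.0, seed=0):
    """Walk from `start`, preferring edges whose type is task-relevant.

    Returns the set of visited nodes, i.e., a sampled subgraph's node set.
    """
    rng = random.Random(seed)
    visited = {start}
    node = start
    for _ in range(walk_length):
        edges = graph.get(node, [])
        if not edges:
            break  # dead end: stop the walk
        # Task-relevant edge types are weighted `bias` times higher.
        weights = [bias if et in task_edge_types else 1.0 for et, _ in edges]
        _, node = rng.choices(edges, weights=weights, k=1)[0]
        visited.add(node)
    return visited

# Sample around "paper1" for a task where "cites" edges matter most.
subgraph_nodes = biased_random_walk(graph, "paper1", {"cites"}, walk_length=10)
print(sorted(subgraph_nodes))
```

In practice one would run many such walks from all task-targeted nodes and take the union of the visited nodes and edges as the TOSG candidate; the SPARQL-based alternative described above instead retrieves the matching subgraph directly via the RDF engine's indices.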
Authors: Hussein Abdallah, Waleed Afandi, Panos Kalnis, Essam Mansour