Task-Oriented GNNs Training on Large Knowledge Graphs for Accurate and Efficient Modeling (2403.05752v2)

Published 9 Mar 2024 in cs.LG and cs.AI

Abstract: A Knowledge Graph (KG) is a heterogeneous graph encompassing a diverse range of node and edge types. Heterogeneous Graph Neural Networks (HGNNs) are popular for training machine learning tasks like node classification and link prediction on KGs. However, HGNN methods exhibit excessive complexity influenced by the KG's size, density, and the number of node and edge types. In practice, AI practitioners handcraft a subgraph of a KG G that is relevant to a specific task. We refer to this subgraph as a task-oriented subgraph (TOSG), which contains a subset of task-related node and edge types in G. Training the task on the TOSG instead of G alleviates the excessive computation required for a large KG. However, crafting the TOSG demands a deep understanding of the KG's structure and the task's objectives, making it challenging and time-consuming. This paper proposes KG-TOSA, an approach to automate TOSG extraction for task-oriented HGNN training on a large KG. In KG-TOSA, we define a generic graph pattern that captures the KG's local and global structure relevant to a specific task. We explore different techniques to extract subgraphs matching our graph pattern, namely (i) two techniques that sample around targeted nodes using biased random walks or influence scores, and (ii) a SPARQL-based extraction method that leverages RDF engines' built-in indices and hence incurs negligible preprocessing overhead compared to the sampling techniques. We develop a benchmark of large real KGs with various node classification and link prediction tasks. Our experiments show that KG-TOSA helps state-of-the-art HGNN methods reduce training time and memory usage by up to 70% while improving model performance, e.g., accuracy and inference time.
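To make the sampling-based extraction concrete, here is a minimal Python sketch of a biased random walk that starts from the task's target nodes and favors task-related edge types. The toy triples, edge-type weights, and function names are illustrative assumptions, not the authors' KG-TOSA implementation.

import random
from collections import defaultdict

# Toy KG as (subject, predicate, object) triples. KG-TOSA targets large RDF
# KGs; this tiny graph only illustrates the mechanics.
TRIPLES = [
    ("paper1", "cites", "paper2"),
    ("paper1", "hasAuthor", "author1"),
    ("paper2", "hasAuthor", "author2"),
    ("paper2", "publishedIn", "venue1"),
    ("author1", "affiliatedWith", "org1"),
]

# Illustrative bias: edge types judged relevant to the task (say, venue
# classification of papers) receive higher walk probability. The paper's
# influence-score variant would derive such weights rather than fix them.
EDGE_WEIGHTS = {"cites": 1.0, "hasAuthor": 0.5, "publishedIn": 1.0,
                "affiliatedWith": 0.1}

def build_adjacency(triples):
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o))
        adj[o].append((p, s))  # walk the KG as if undirected
    return adj

def biased_walk_tosg(triples, targets, walk_len=4, walks_per_node=10, seed=0):
    """Return the set of edges touched by biased walks from the target nodes."""
    rng = random.Random(seed)
    adj = build_adjacency(triples)
    kept = set()
    for start in targets:
        for _ in range(walks_per_node):
            node = start
            for _ in range(walk_len):
                nbrs = adj[node]
                weights = [EDGE_WEIGHTS.get(p, 0.0) for p, _ in nbrs]
                if not nbrs or sum(weights) == 0:
                    break
                p, nxt = rng.choices(nbrs, weights=weights, k=1)[0]
                kept.add((node, p, nxt))  # stored in walk order for simplicity
                node = nxt
    return kept

if __name__ == "__main__":
    for edge in sorted(biased_walk_tosg(TRIPLES, targets=["paper1", "paper2"])):
        print(edge)

The SPARQL-based alternative expresses the same pattern declaratively, so an RDF engine can answer it from its built-in indices with little preprocessing. Below is a sketch of such a query executed with rdflib on an in-memory toy graph; the query shape and namespaces are assumptions for illustration, not the paper's actual extraction query.

from rdflib import Graph, Namespace, RDF

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.paper1, RDF.type, EX.Paper))
g.add((EX.paper2, RDF.type, EX.Paper))
g.add((EX.paper1, EX.cites, EX.paper2))
g.add((EX.paper1, EX.hasAuthor, EX.author1))
g.add((EX.paper2, EX.publishedIn, EX.venue1))
g.add((EX.author1, EX.affiliatedWith, EX.org1))  # task-irrelevant edge

# CONSTRUCT the TOSG: triples whose predicate is task-related and which
# touch a target (Paper) node. A production RDF engine would evaluate the
# same query directly over its triple indices.
QUERY = """
PREFIX ex: <http://example.org/>
CONSTRUCT { ?s ?p ?o }
WHERE {
  ?s ?p ?o .
  VALUES ?p { ex:cites ex:hasAuthor ex:publishedIn }
  { ?s a ex:Paper } UNION { ?o a ex:Paper }
}
"""

for s, p, o in g.query(QUERY):
    print(s, p, o)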

Authors (4)
  1. Hussein Abdallah (2 papers)
  2. Waleed Afandi (1 paper)
  3. Panos Kalnis (13 papers)
  4. Essam Mansour (10 papers)
Citations (1)
