SynHING: Synthetic Heterogeneous Information Network Generation for Graph Learning and Explanation (2401.04133v2)

Published 7 Jan 2024 in cs.LG, cs.AI, and cs.SI

Abstract: Graph Neural Networks (GNNs) excel in delineating graph structures in diverse domains, including community analysis and recommendation systems. As the interpretation of GNNs becomes increasingly important, the demand for robust baselines and expansive graph datasets is accentuated, particularly in the context of Heterogeneous Information Networks (HIN). Addressing this, we introduce SynHING, a novel framework for Synthetic Heterogeneous Information Network Generation aimed at enhancing graph learning and explanation. SynHING systematically identifies major motifs in a target HIN and employs a bottom-up generation process with intra-cluster and inter-cluster merge modules. This process, supplemented by post-pruning techniques, ensures the synthetic HIN closely mirrors the original graph's structural and statistical properties. Crucially, SynHING provides ground-truth motifs for evaluating GNN explainer models, setting a new standard for explainable, synthetic HIN generation and contributing to the advancement of interpretable machine learning in complex networks.



Summary

  • The paper presents an automated method that synthesizes heterogeneous networks using motif-based merging to replicate real-world graph structures.
  • It demonstrates how configurable merge thresholds and motif counts enable tailored, analytics-friendly datasets for robust GNN explanation studies.
  • The work provides benchmark datasets with built-in explanation ground truths, paving the way for fair evaluations of graph neural network interpretability.

Introduction to SynHING

Graph Neural Networks (GNNs) are powerful tools for machine learning on graph data, with critical applications ranging from social network analysis to e-commerce fraud detection. One significant challenge in this field is the shortage of public heterogeneous information network (HIN) datasets for testing and improving GNNs, particularly where explainability is vital. Addressing this need, this paper introduces SynHING, a framework for generating synthetic HINs that closely mirror the structural and statistical properties of real-world networks.

The Need for Synthetic Data

Real-world HIN datasets are rare and often unrepresentative, leading to overfitting and bias in GNN models. Research on GNN interpretability is further hindered by these limitations and by the absence of ground truths for explaining model decisions. Synthetic datasets with built-in explanations offer a promising solution: SynHING not only generates analytics-friendly synthetic data but also embeds explanation ground truths within the graph, supporting interpretability studies for GNNs.

SynHING's Approach and Contributions

SynHING identifies frequent motifs (recurring, significant subgraph patterns) in a target dataset and uses a novel merge strategy to build clusters around these explanatory motifs, then assembles the clusters into a full synthetic HIN. The process combines intra-cluster and inter-cluster merge modules, ensuring the synthetic network's structure and features closely match those of its real-world counterpart.
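
The bottom-up generation can be illustrated with a minimal, pure-Python sketch. The motif shape (an author-paper-venue triangle), the function names, and the random-attachment heuristics below are assumptions for illustration, not the paper's exact algorithm:

```python
import random

def motif_instance(idx):
    """One typed motif instance: an author-paper-venue triangle.
    (The motif shape and node types here are illustrative.)"""
    a, p, v = (f"A{idx}", "author"), (f"P{idx}", "paper"), (f"V{idx}", "venue")
    return {a, p, v}, {frozenset((a, p)), frozenset((p, v)), frozenset((a, v))}

def intra_cluster_merge(motifs, extra_edges=2, rng=random):
    """Union motif instances into one cluster, then add a few extra
    intra-cluster edges between type-compatible nodes (author-paper here)."""
    nodes, edges = set(), set()
    for n, e in motifs:
        nodes |= n
        edges |= e
    authors = sorted(n for n in nodes if n[1] == "author")
    papers = sorted(n for n in nodes if n[1] == "paper")
    for _ in range(extra_edges):
        edges.add(frozenset((rng.choice(authors), rng.choice(papers))))
    return nodes, edges

def inter_cluster_merge(clusters, rng=random):
    """Assemble clusters into one HIN with sparse inter-cluster bridges."""
    nodes, edges = set(), set()
    for n, e in clusters:
        nodes |= n
        edges |= e
    for (n1, _), (n2, _) in zip(clusters, clusters[1:]):
        edges.add(frozenset((rng.choice(sorted(n1)), rng.choice(sorted(n2)))))
    return nodes, edges

rng = random.Random(0)
clusters = [intra_cluster_merge([motif_instance(f"{c}_{i}") for i in range(3)],
                                rng=rng)
            for c in range(4)]
nodes, edges = inter_cluster_merge(clusters, rng=rng)
print(len(nodes))  # 4 clusters x 3 motifs x 3 typed nodes = 36
```

A post-pruning pass (omitted here) would then remove edges that distort the target degree distribution, as the paper describes.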

The paper's main contributions include:

  • A new automated methodology to create realistic synthetic HIN datasets, enabling more robust research and testing for explainable AI in the domain of HINs.
  • Constructing benchmark datasets with built-in explanation ground truths, propelling the field forward by providing a common platform for the development and assessment of new GNN explanation methods.
  • Introducing a modular framework for HIN synthesis, involving steps like motif extraction, subgraph building, merging, pruning, and node feature generation, which can be adapted for diverse dataset requirements.
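
Because the generator embeds ground-truth motifs by construction, an explainer's output can be scored directly against them. A minimal sketch of one such score, node-level F1 (the metric choice and node identifiers here are illustrative, not prescribed by the paper):

```python
def explanation_f1(predicted_nodes, ground_truth_nodes):
    """Node-level F1 between an explainer's selected subgraph and the
    ground-truth motif embedded by the generator."""
    pred, gt = set(predicted_nodes), set(ground_truth_nodes)
    tp = len(pred & gt)  # correctly recovered motif nodes
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gt)
    return 2 * precision * recall / (precision + recall)

# The ground-truth motif nodes are known by construction in the synthetic HIN.
gt = {"A3", "P3", "V3"}
pred = {"A3", "P3", "P7"}  # the explainer picked one spurious node
print(round(explanation_f1(pred, gt), 3))  # -> 0.667
```

The same scheme extends to edge-level scores by comparing selected edges against the motif's edges.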

Experimental Insights and Applications

Experiments conducted by the authors showed how adjusting SynHING's parameters, such as merge thresholds and motif counts, tunes the synthetic dataset for different research purposes. The findings support SynHING's utility in generating datasets that enable fair and insightful comparisons of GNN interpretation models. The research also demonstrates SynHING's flexibility in creating multi-label datasets with rich ground truths, paving the way for more transparent and explainable AI systems.
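
One way such knobs might be exposed is as a single configuration object; the parameter names and defaults below are assumptions for illustration, not SynHING's actual interface:

```python
from dataclasses import dataclass

@dataclass
class SynthesisConfig:
    """Illustrative tuning knobs for a motif-based HIN generator.
    (Names and defaults are assumptions, not the paper's interface.)"""
    motifs_per_cluster: int = 3       # motif instances seeding each cluster
    num_clusters: int = 4             # clusters assembled into the final HIN
    in_cluster_threshold: float = 0.2   # density of extra intra-cluster edges
    out_cluster_threshold: float = 0.05 # sparsity of inter-cluster bridges

# Tighter communities for one study, weaker coupling for another.
dense = SynthesisConfig(in_cluster_threshold=0.5)
sparse = SynthesisConfig(out_cluster_threshold=0.01)
print(dense.in_cluster_threshold, sparse.out_cluster_threshold)
```

Sweeping such a configuration over a grid of threshold values is how one would reproduce the kind of parameter-sensitivity study the authors describe.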

In conclusion, SynHING is a significant step forward for researchers in graph machine learning, particularly because its modularity allows customization to the specifics of various HIN structures. It improves the interpretability and generalization of AI models, equipping researchers with a reliable benchmarking tool and addressing one of GNN research's major barriers: the lack of comprehensive and varied datasets.
