
A Survey on Learning from Graphs with Heterophily: Recent Advances and Future Directions

Published 18 Jan 2024 in cs.SI, cs.AI, and cs.LG | (2401.09769v4)

Abstract: Graphs are structured data that model complex relations between real-world entities. Heterophilic graphs, where linked nodes tend to have different labels or dissimilar features, have recently attracted significant attention and found many real-world applications. Meanwhile, increasing efforts have been made to advance learning from graphs with heterophily. Various graph heterophily measures, benchmark datasets, and learning paradigms are emerging rapidly. In this survey, we comprehensively review existing works on learning from graphs with heterophily. First, we overview over 500 publications, of which more than 340 are directly related to heterophilic graphs. After that, we survey existing metrics of graph heterophily and list recent benchmark datasets. Further, we systematically categorize existing methods based on a hierarchical taxonomy including GNN models, learning paradigms and practical applications. In addition, broader topics related to graph heterophily are also included. Finally, we discuss the primary challenges of existing studies and highlight promising avenues for future research.

References (66)
  1. Mixhop: Higher-order graph convolutional architectures via sparsified neighborhood mixing. In ICML, pages 21–29. PMLR, 2019.
  2. A microscopic theory for antiphase boundary motion and its application to antiphase domain coarsening. Acta metallurgica, 27(6):1085–1095, 1979.
  3. Half-hop: A graph upsampling approach for slowing down message passing. In ICML, pages 1341–1360. PMLR, 2023.
  4. Beyond low-frequency information in graph convolutional networks. In AAAI, volume 35, pages 3950–3957, 2021.
  5. Simple and deep graph convolutional networks. In ICML, pages 1725–1735. PMLR, 2020.
  6. A drug combination prediction framework based on graph convolutional network and heterogeneous information. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2022.
  7. Signgt: Signed attention-based graph transformer for graph representation learning. arXiv preprint arXiv:2310.11025, 2023.
  8. Network-based prediction of drug combinations. Nature communications, 10(1):1197, 2019.
  9. Label propagation for graph label noise. arXiv preprint arXiv:2310.16560, 2023.
  10. Adaptive universal generalized pagerank graph neural network. ICLR, 2021.
  11. Enhancing graph neural networks with structure-based prompt. arXiv preprint arXiv:2310.17394, 2023.
  12. Prompt tuning for multi-view graph contrastive learning. arXiv preprint arXiv:2310.10362, 2023.
  13. Inductive representation learning on large graphs. NeurIPS, 30, 2017.
  14. Block modeling-guided graph convolutional neural networks. In AAAI, volume 36, pages 4022–4029, 2022.
  15. Contrastive learning meets homophily: two birds with one stone. In ICML, pages 12775–12789. PMLR, 2023.
  16. Universal graph convolutional networks. NeurIPS, 34:10654–10664, 2021.
  17. Dynamic relation-attentive graph neural networks for fraud detection. arXiv preprint arXiv:2310.04171, 2023.
  18. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
  19. Goat: A global transformer on large-scale graphs. 2023.
  20. Towards deep attention in graph neural networks: Problems and remedies. arXiv preprint arXiv:2306.02376, 2023.
  21. Evennet: Ignoring odd-hop neighbors improves robustness of graph neural networks. NeurIPS, 35:4694–4706, 2022.
  22. Finding global homophily in graph neural networks when meeting heterophily. In ICML, pages 13242–13256. PMLR, 2022.
  23. Restructuring graph for higher homophily via adaptive spectral clustering. In AAAI, volume 37, pages 8622–8630, 2023.
  24. LD2: Scalable heterophilous graph neural network with decoupled embedding. In NeurIPS, volume 36, Dec 2023.
  25. Large scale learning on non-homophilous graphs: New benchmarks and strong simple methods. NeurIPS, 34, 2021.
  26. New benchmarks for learning on non-homophilous graphs. arXiv preprint arXiv:2104.01404, 2021.
  27. Gps-net: Graph property sensing network for scene graph generation. In CVPR, pages 3746–3753, 2020.
  28. Hl-net: Heterophily learning network for scene graph generation. In CVPR, pages 19476–19485, 2022.
  29. Non-local graph neural networks. TPAMI, 44(12):10270–10276, 2021.
  30. Simga: A simple and effective heterophilous graph neural network with efficient global aggregation. arXiv preprint arXiv:2305.09958, 2023.
  31. Beyond smoothing: Unsupervised graph representation learning with edge heterophily discriminating. In AAAI, 2023.
  32. Revisiting heterophily for graph neural networks. NeurIPS, 35:1362–1375, 2022.
  33. Rethinking structural encodings: Adaptive graph transformer for node classification task. In WebConf, pages 533–544, 2023.
  34. Beyond homophily: Reconstructing structure for graph-agnostic clustering. arXiv preprint arXiv:2305.02931, 2023.
  35. Geom-gcn: Geometric graph convolutional networks. ICLR, 2020.
  36. Characterizing graph datasets for node classification: Homophily-heterophily dichotomy and beyond. In LoG, 2023.
  37. A critical look at the evaluation of gnns under heterophily: are we really making progress? arXiv preprint arXiv:2302.11640, 2023.
  38. Gradient gating for deep multi-rate learning on graphs. arXiv preprint arXiv:2210.00513, 2022.
  39. H2-fdetector: A gnn-based fraud detector with homophilic and heterophilic connections. In WebConf, 2022.
  40. Muti-scale graph neural network with signed-attention for social bot detection: A frequency perspective. arXiv preprint arXiv:2307.01968, 2023.
  41. Ordered gnn: Ordering message passing to deal with heterophily and over-smoothing. arXiv preprint arXiv:2302.01524, 2023.
  42. Breaking the limit of graph neural networks by improving the assortativity of graphs with local mixing patterns. In SIGKDD, pages 1541–1551, 2021.
  43. Graph auto-encoder via neighborhood wasserstein reconstruction. ICLR, 2022.
  44. Understanding over-squashing and bottlenecks on graphs via curvature. arXiv preprint arXiv:2111.14522, 2021.
  45. Attention is all you need. NeurIPS, 30, 2017.
  46. Graph attention networks. ICLR, 2018.
  47. Tree decomposed graph neural network. In CIKM, 2021.
  48. Powerful graph convolutional networks with adaptive propagation mechanism for homophily and heterophily. In AAAI, volume 36, pages 4210–4218, 2022.
  49. Acmp: Allen-cahn message passing with attractive and repulsive forces for graph neural networks. In ICLR, 2022.
  50. Label information enhanced fraud detection against low homophily in graphs. In WebConf, pages 406–416, 2023.
  51. Nodeformer: A scalable graph structure learning transformer for node classification. NeurIPS, 35:27387–27401, 2022.
  52. Heterophily-aware social bot detection with supervised contrastive learning. arXiv preprint arXiv:2306.07478, 2023.
  53. Decoupled self-supervised learning for non-homophilous graphs. In NeurIPS, 2022.
  54. Spatial heterophily aware graph neural networks. arXiv preprint arXiv:2306.12139, 2023.
  55. Simple and asymmetric graph contrastive learning without augmentations. arXiv preprint arXiv:2310.18884, 2023.
  56. Diverse message passing for attribute with heterophily. NeurIPS, 34:4751–4763, 2021.
  57. Graph pointer neural networks. In AAAI, volume 36, pages 8832–8839, 2022.
  58. Hofa: Twitter bot detection with homophily-oriented augmentation and frequency adaptive attention. arXiv preprint arXiv:2306.12870, 2023.
  59. Muse: Multi-view contrastive learning for heterophilic graphs via information reconstruction. In CIKM, 2023.
  60. Quadratic graph attention network (q-gat) for robust construction of gene regulatory networks. arXiv preprint arXiv:2303.14193, 2023.
  61. Graph neural networks for graphs with heterophily: A survey. arXiv preprint arXiv:2202.07082, 2022.
  62. Finding the missing-half: Graph complementary learning for homophily-prone and heterophily-prone graphs. arXiv preprint arXiv:2306.07608, 2023.
  63. Link prediction on heterophilic graphs via disentangled representation learning. arXiv preprint arXiv:2208.01820, 2022.
  64. Beyond homophily in graph neural networks: Current limitations and effective designs. NeurIPS, 33:7793–7804, 2020.
  65. How does heterophily impact the robustness of graph neural networks? theoretical connections and practical implications. In SIGKDD, pages 2637–2647, 2022.
  66. Heterophily and graph neural networks: Past, present and future. IEEE Data Engineering Bulletin, 2023.

Summary

  • The paper offers a comprehensive taxonomy categorizing over 180 studies, emphasizing both (semi-)supervised and self-supervised learning approaches.
  • It details diverse model architectures, including decoupled message-passing networks and global attention graph transformers, to tackle the challenges of heterophily.
  • The study highlights practical applications in social and biochemical fields while suggesting future benchmarks, robustness improvements, and innovative performance metrics.

Overview of "Towards Learning from Graphs with Heterophily: Progress and Future"

The paper "Towards Learning from Graphs with Heterophily: Progress and Future" offers a thorough investigation into the burgeoning field of graph learning with heterophily, addressing the unique challenges presented when nodes in a graph tend to connect with dissimilar nodes. Unlike the common assumption of homophily, where similar nodes are typically connected, heterophilous graphs are more representative of many real-world scenarios such as social networks with bots, urban computational networks, and gene regulation networks.
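The homophily/heterophily distinction described above is commonly quantified in this literature by the edge homophily ratio: the fraction of edges whose endpoints share a label, near 1 for homophilic graphs and near 0 for heterophilic ones. A minimal sketch (the function name and toy graph are illustrative, not from the paper):

```python
import numpy as np

def edge_homophily(edges, labels):
    """Fraction of edges whose endpoints share a label.

    Values near 1 indicate homophily; values near 0 indicate heterophily.
    `edges` is an (E, 2) array of node-index pairs, `labels` a length-N array.
    """
    edges = np.asarray(edges)
    labels = np.asarray(labels)
    same = labels[edges[:, 0]] == labels[edges[:, 1]]
    return float(same.mean())

# Toy graph: a 4-cycle with strictly alternating labels is
# perfectly heterophilic -- no edge connects same-label nodes.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
labels = [0, 1, 0, 1]
print(edge_homophily(edges, labels))  # → 0.0
```

The survey itself reviews a range of such metrics (node-, edge-, and class-level variants), each with known limitations; this edge-level ratio is only the simplest of them.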

Key Contributions

The authors Chenghua Gong et al. systematically analyze over 180 publications relating to heterophilous graph learning and present a hierarchical taxonomy to categorize this research landscape. The major components of this taxonomy include:

  1. Learning Strategies: This component is further divided into several methods, with significant focus on both (semi-)supervised and self-supervised learning paradigms:
    • (Semi-)Supervised Learning: Several advances have been made, such as adaptive spectral filters that capture high-frequency signals beyond the low-pass aggregation of traditional GNNs, non-local aggregation schemes that reach homophilous nodes outside the immediate neighborhood, and models such as GPR-GNN and MixHop.
    • Self-Supervised Learning: This includes contrastive and generative approaches that learn representations without labeled data; methods such as MUSE and DGCN show how contrastive and generative frameworks adapt to the presence of heterophily.
  2. Model Architectures: The paper delineates between message-passing neural networks (MPNNs) and graph transformers:
    • MPNNs: Researchers developed decoupled message-passing architectures, e.g., GCNII, to combat over-smoothing and better capture the richness of graph heterophily.
    • Graph Transformers: These models, like NodeFormer and SignGT, are effective due to their global attention mechanisms, albeit with high computational overheads.
  3. Practical Applications: The study highlights how heterophilous graph learning is applied in diverse fields:
    • Social Networks: Applications include fraud and automated account detection, where models like DRAG and BotSCL have been proposed to handle the graph heterophily inherent in these networks.
    • Biochemical Networks: In drug discovery and gene regulation, methods capitalize on the heterophily within networks to identify effective drug combinations and reconstruct regulatory networks.
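The adaptive-filter idea behind models such as GPR-GNN can be sketched as a polynomial of the normalized adjacency with signed per-hop weights: negative weights give the filter a high-pass component, which is what helps under heterophily. The snippet below is a simplified illustration with fixed weights (GPR-GNN learns them end-to-end), not the paper's or any model's actual implementation:

```python
import numpy as np

def gpr_propagate(adj, features, gammas):
    """Generalized PageRank-style propagation (in the spirit of GPR-GNN).

    Combines K powers of the symmetrically normalized adjacency with
    per-hop weights `gammas`. Positive weights smooth (low-pass);
    negative weights sharpen (high-pass), useful on heterophilic graphs.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    d_inv_sqrt[nz] = deg[nz] ** -0.5
    a_norm = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

    h = np.asarray(features, dtype=float)
    out = gammas[0] * h                 # 0-hop term: the node itself
    for gamma in gammas[1:]:
        h = a_norm @ h                  # one more hop of propagation
        out = out + gamma * h
    return out

# Two nodes joined by one edge; a sign-alternating filter emphasises
# the difference between a node and its neighbor.
adj = [[0, 1], [1, 0]]
x = [[1.0], [0.0]]
print(gpr_propagate(adj, x, [1.0, -0.5]))
```

With gammas = [1.0, -0.5], each node's output is its own feature minus half its degree-normalized neighbor aggregate, emphasising local differences rather than smoothing them away.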

Implications and Future Work

The analysis in the paper motivates several directions for future research:

  • Benchmarking and Datasets: The authors suggest the need for improved, large-scale, and diverse benchmark datasets that better capture the complexity of heterophilous graphs.
  • Robustness and Explainability: Future research should prioritize models that are robust to attacks and noise, and whose decisions are transparent to both domain experts and end-users.
  • Innovative Metrics: As current metrics for measuring graph heterophily have limitations, the development of advanced metrics is essential for accurately assessing model performance and guiding network designs.
  • Extended Applications: Further exploration of underexplored settings such as weakly-supervised and few-shot learning could uncover new opportunities for heterophilous graph frameworks.

Overall, this comprehensive review highlights the rich and rapidly expanding domain of learning from graphs with heterophily, underscoring its theoretical and practical significance. The paper calls for continued innovation in the methodologies, infrastructure, and application realms to fully exploit the potential heterophilous graphs hold for complex data representation.
