Graph Neural Networks with Heterophily (2009.13566v3)

Published 28 Sep 2020 in cs.LG, cs.SI, and stat.ML

Abstract: Graph Neural Networks (GNNs) have proven to be useful for many different practical applications. However, many existing GNN models have implicitly assumed homophily among the nodes connected in the graph, and therefore have largely overlooked the important setting of heterophily, where most connected nodes are from different classes. In this work, we propose a novel framework called CPGNN that generalizes GNNs for graphs with either homophily or heterophily. The proposed framework incorporates an interpretable compatibility matrix for modeling the heterophily or homophily level in the graph, which can be learned in an end-to-end fashion, enabling it to go beyond the assumption of strong homophily. Theoretically, we show that replacing the compatibility matrix in our framework with the identity (which represents pure homophily) reduces to GCN. Our extensive experiments demonstrate the effectiveness of our approach in more realistic and challenging experimental settings with significantly less training data compared to previous works: CPGNN variants achieve state-of-the-art results in heterophily settings with or without contextual node features, while maintaining comparable performance in homophily settings.

Authors (7)

Jiong Zhu
Ryan A. Rossi
Anup Rao
Tung Mai
Nedim Lipka
Nesreen K. Ahmed
Danai Koutra

Citations (274)

Summary

The paper presents the CPGNN framework, which incorporates a learnable compatibility matrix that models how likely nodes of each class are to connect to nodes of every other class.
It details a two-stage process where prior node beliefs are computed and then propagated across the graph using the learned matrix.
Experimental results demonstrate up to 30% improvement over traditional GNNs under strong heterophily conditions.

An Overview of Graph Neural Networks with Heterophily

Graph Neural Networks (GNNs) have emerged as a powerful tool for exploiting the relational structure inherent in many datasets. Common applications include recommendation systems, bioinformatics, and fraud detection. Traditional GNN architectures often assume homophily: they are designed to capture patterns where connected nodes tend to belong to the same class or share similar features. This assumption limits their applicability to graphs with heterophily, where most connected nodes belong to different classes. The paper "Graph Neural Networks with Heterophily" introduces a framework, CPGNN, that addresses this limitation by explicitly modeling heterophily.
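
One common way to quantify where a graph sits between homophily and heterophily is the edge homophily ratio: the fraction of edges whose endpoints share a class label. The sketch below is illustrative only; the function name `edge_homophily` and the toy graph are assumptions made for this example, not taken from the paper.

```python
import numpy as np

def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints carry the same class label.

    Values near 1 indicate strong homophily; values near 0 indicate
    strong heterophily. (Hypothetical helper for illustration.)
    """
    edges = np.asarray(edges)
    labels = np.asarray(labels)
    same_class = labels[edges[:, 0]] == labels[edges[:, 1]]
    return float(same_class.mean())

# Toy 4-node graph with alternating classes (assumed data).
labels = [0, 1, 0, 1]
path_edges = [(0, 1), (1, 2), (2, 3)]    # every edge crosses classes
mixed_edges = [(0, 2), (1, 3), (0, 1)]   # two same-class edges, one cross-class

h_path = edge_homophily(path_edges, labels)    # 0.0: purely heterophilous
h_mixed = edge_homophily(mixed_edges, labels)  # 2 of 3 edges are same-class
```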

Core Contribution

CPGNN Framework:

The core innovation of CPGNN is a compatibility matrix that learns how likely nodes of each class are to connect to nodes of every other class. Because the matrix is learned rather than fixed, the model can adapt to both homophilous and heterophilous graphs. The framework consists of two main stages:

Prior Belief Estimation: Prior beliefs about each node's class are computed by a model such as a multi-layer perceptron (MLP) or a GCN variant without homophily constraints. In principle, any neural network classifier that does not itself assume homophily can fill this role and supply the preliminary node classification.
Compatibility-Guided Propagation: The prior beliefs are propagated across the graph, guided by the compatibility matrix, which captures the graph's homophily or heterophily. The matrix is learned end-to-end and is interpretable: its entries indicate how likely nodes of one class are to connect to nodes of another.
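
The two stages can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes a linear update of the form B(k) = B(0) + A · B(k-1) · H on row-centered beliefs, an unnormalized adjacency matrix A, and a fixed, zero-centered compatibility matrix H (in CPGNN the matrix is learned end-to-end). The names `propagate_beliefs`, `H_het`, and `H_hom`, and the toy data, are hypothetical.

```python
import numpy as np

def softmax(x):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = x - x.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def propagate_beliefs(prior_logits, adj, compat, num_iters=2):
    """Propagate prior class beliefs over the graph, guided by `compat`.

    prior_logits: (n, c) output of a prior estimator such as an MLP (stage 1)
    adj:          (n, n) adjacency matrix without self-loops
    compat:       (c, c) compatibility matrix; entry (i, j) is large when
                  class-i nodes tend to link to class-j nodes
    Assumed update (stage 2): B(k) = B(0) + adj @ B(k-1) @ compat.
    """
    b0 = prior_logits - prior_logits.mean(axis=1, keepdims=True)  # center rows
    b = b0
    for _ in range(num_iters):
        b = b0 + adj @ b @ compat
    return softmax(b)

# Toy path graph 0-1-2-3 whose true classes alternate: 0, 1, 0, 1.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)

# Only node 0 has an informative prior (class 0); the rest are uniform.
prior_logits = np.array([[2.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]])

H_het = np.array([[-0.5, 0.5], [0.5, -0.5]])  # neighbors favor the other class
H_hom = -H_het                                # neighbors favor the same class

pred_het = propagate_beliefs(prior_logits, adj, H_het, num_iters=3).argmax(axis=1)
pred_hom = propagate_beliefs(prior_logits, adj, H_hom, num_iters=3).argmax(axis=1)
# pred_het -> [0, 1, 0, 1]; pred_hom -> [0, 0, 0, 0]
```

With the heterophilous matrix, the single informative prior at node 0 flips sign at each hop and recovers the alternating labels; with the homophilous matrix, the same prior is simply copied along the path, which is the behavior a homophily-assuming model would show on this graph.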

Theoretical Insights and Performance

The paper shows theoretically that under pure homophily, where the compatibility matrix is the identity, CPGNN reduces to a traditional GCN. This connection helps explain why standard GCN models perform suboptimally under heterophilous conditions, and it positions CPGNN as a generalization that adapts to either setting.
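
To make the reduction concrete, assume (as a simplification not spelled out in this summary) that propagation updates the centered beliefs as $\bar{B}^{(k)} = \bar{B}^{(0)} + A\,\bar{B}^{(k-1)}\,\bar{H}$, where $A$ is the adjacency matrix, $\bar{B}^{(0)}$ holds the prior beliefs, and $\bar{H}$ is the compatibility matrix. Setting $\bar{H} = I$ then gives:

$$
\begin{aligned}
\bar{B}^{(1)} &= \bar{B}^{(0)} + A\,\bar{B}^{(0)}\,I = (A + I)\,\bar{B}^{(0)}, \\
\bar{B}^{(k)} &= \bar{B}^{(0)} + A\,\bar{B}^{(k-1)} = \sum_{j=0}^{k} A^{j}\,\bar{B}^{(0)}.
\end{aligned}
$$

Under this assumed form, a single propagation step applies $A + I$, the adjacency matrix with self-loops, which up to degree normalization is the propagation operator of a GCN layer. Each node's belief becomes a sum of its own and its neighbors' beliefs, with no mechanism for a neighbor of one class to count as evidence for a different class.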

Experiments corroborate the effectiveness of CPGNN across graphs with varying degrees of heterophily. Notably, CPGNN variants outperform GCNs and other baselines on synthetic benchmarks covering both homophily and heterophily, with improvements of up to 30% under strong heterophily. CPGNN performs well when node features are available and remains robust when they are absent.

Practical and Theoretical Implications

The introduction of the compatibility matrix in GNNs represents a significant advancement for applications involving networks that do not conform to the homophily assumption. By allowing the model to learn relationship dynamics across classes, CPGNN enhances our ability to analyze and infer from complex networks prevalent in domains such as social networks and biological systems.

Looking forward, this framework provides an adaptable foundation for more sophisticated graph learning models. Future research could explore non-linear transformations and deeper architectures within the CPGNN framework, as well as extensions to unsupervised or self-supervised learning. Studying the compatibility matrix in broader settings, such as multilayer and temporal graphs, could also yield theoretical developments and practical applications.

In conclusion, "Graph Neural Networks with Heterophily" presents a versatile and effective framework overcoming the intrinsic limitations of conventional GNN models under heterophilous conditions. This contribution not only extends the applicability of GNNs across a wider range of datasets but also enriches our understanding of graph structures through interpretable and learned compatibility matrices.
