Heterophily-informed Message Passing (2504.19785v1)

Published 28 Apr 2025 in cs.LG

Abstract: Graph neural networks (GNNs) are known to be vulnerable to oversmoothing due to their implicit homophily assumption. We mitigate this problem with a novel scheme that regulates the aggregation of messages, modulating the type and extent of message passing locally thereby preserving both the low and high-frequency components of information. Our approach relies solely on learnt embeddings, obviating the need for auxiliary labels, thus extending the benefits of heterophily-aware embeddings to broader applications, e.g., generative modelling. Our experiments, conducted across various data sets and GNN architectures, demonstrate performance enhancements and reveal heterophily patterns across standard classification benchmarks. Furthermore, application to molecular generation showcases notable performance improvements on chemoinformatics benchmarks.

Summary

Heterophily-Informed Message Passing in Graph Neural Networks

The paper proposes a heterophily-informed message passing scheme for Graph Neural Networks (GNNs) to address the oversmoothing problem, which it attributes to an implicit homophily assumption. Traditional GNNs typically assume that neighboring nodes tend to have similar labels or features, an assumption that holds in homophilous settings such as citation networks and social graphs. In heterophilous graphs, where connected nodes tend to have differing labels, this assumption can degrade performance.
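To make the homophily/heterophily distinction concrete, the following minimal sketch computes the commonly used edge homophily ratio, the fraction of edges whose endpoints share a label. The function name `edge_homophily` and the toy graph are illustrative assumptions and are not taken from the paper, which may measure heterophily differently.

```python
def edge_homophily(edges, labels):
    """Return the fraction of edges whose two endpoints share a label.

    Values near 1 indicate a homophilous graph, values near 0 a
    heterophilous one. (Illustrative helper, not the paper's metric.)
    """
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Toy 4-node cycle: nodes 0 and 1 have label 0, nodes 2 and 3 have label 1.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
labels = [0, 0, 1, 1]
h = edge_homophily(edges, labels)
print(h)  # two of the four edges join same-label nodes
```

Note that this ratio needs ground-truth labels, whereas the abstract states that the proposed scheme relies solely on learnt embeddings.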

Main Contributions

Architecture-Independent Approach: The authors introduce a flexible modification for GNNs that allows encoding both homophily and heterophily, thereby improving model effectiveness in diverse graph types.
HetFlows for Molecular Generation: A flow-based model that employs a multi-channel message passing mechanism to better model the generation process and achieve notable improvements in molecule generation tasks.
Experimental Validation: Extensive experiments on node classification and molecular generation benchmarks demonstrate the effectiveness of the proposed heterophily-informed MP scheme across different domains.

Theoretical and Practical Implications

Node Classification: By employing heterophily-aware routes for message passing, the modified GNN structures can adaptively capture and utilize high-frequency information specific to nodes of differing labels. This results in notable improvements especially in heterophilous data settings, as demonstrated across 10 out of 15 tested benchmarks. Moreover, the MixMP variant, which combines the original, homophily, and heterophily-informed message passing pathways, consistently improves classification performance, suggesting enhanced generalization.
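The idea of combining original, homophily-informed, and heterophily-informed pathways can be sketched roughly as follows. This is not the paper's formulation: the cosine-similarity edge weights, the mean aggregation, the concatenation of channels, and the name `mixed_message_passing` are all assumptions chosen for illustration, since the summary does not specify how the channels are computed or merged.

```python
import numpy as np


def mixed_message_passing(H, A, eps=1e-12):
    """One illustrative aggregation step with three channels.

    H: (n, d) node embeddings; A: (n, n) binary adjacency without self-loops.
    Returns an (n, 3d) array: [original | homophily | heterophily] messages.
    The weighting scheme is an assumption, not the authors' method.
    """
    # Cosine similarity between every pair of embeddings, in [-1, 1].
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    Hn = H / np.maximum(norms, eps)
    S = Hn @ Hn.T

    # Assumed edge weights: similar neighbors dominate the homophily
    # channel, dissimilar neighbors dominate the heterophily channel.
    W_hom = A * (1.0 + S) / 2.0
    W_het = A * (1.0 - S) / 2.0

    def aggregate(W):
        # Weighted mean of neighbor embeddings; isolated rows stay zero.
        deg = W.sum(axis=1, keepdims=True)
        return (W @ H) / np.maximum(deg, eps)

    return np.concatenate([aggregate(A), aggregate(W_hom), aggregate(W_het)], axis=1)


# Toy example: nodes 0 and 1 share an embedding, node 2 is orthogonal.
H = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
A = np.array([[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]])
out = mixed_message_passing(H, A)
print(out.shape)
```

In this toy case the heterophily channel for node 0 receives only the message from the dissimilar node 2, while the homophily channel is weighted toward the similar node 1.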

Molecular Generation: By modifying the underlying GNN architecture of MoFlow to account for heterophily, the authors present HetFlows, which shows improved fidelity and diversity metrics for generated molecules. On metrics such as FCD and SNN, HetFlows produces molecules that are closer in feature space to reference datasets while maintaining high validity and novelty.
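As a rough illustration of what set-based generation metrics such as novelty measure, the sketch below computes uniqueness and novelty over lists of molecule strings. It assumes the strings are already valid, canonicalized SMILES, and it uses one common convention for the definitions. The function name and the toy data are hypothetical, and the paper's exact metric definitions may differ.

```python
def uniqueness_and_novelty(generated, training):
    """Return (uniqueness, novelty) for a list of generated molecules.

    uniqueness: distinct generated molecules / all generated molecules.
    novelty: distinct generated molecules absent from the training set
             / distinct generated molecules.
    Assumes inputs are valid, canonical SMILES strings (not checked here).
    """
    if not generated:
        raise ValueError("no generated molecules")
    unique = set(generated)
    uniqueness = len(unique) / len(generated)
    novelty = len(unique - set(training)) / len(unique)
    return uniqueness, novelty


# Hypothetical toy data, not results from the paper.
generated = ["CCO", "CCO", "CCN", "c1ccccc1"]
training = ["CCO", "CCC"]
uniq, nov = uniqueness_and_novelty(generated, training)
print(uniq, nov)
```

Checking validity itself requires a chemistry toolkit to parse each molecule, which is outside the scope of this sketch.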

Numerical Results and Observations

Across the evaluated datasets, MixMP yields up to a 3.84% improvement on node classification tasks compared to traditional GNNs. These results indicate the potential benefit of integrating heterophily-awareness into graph processing pipelines. Similarly, HetFlows achieves competitive performance on molecule generation metrics, especially when relational structures (adjacency matrices) are derived directly rather than sampled.

Future Directions

The heterophily-informed MP scheme shows promise for improving the expressiveness of GNNs in various application domains. Future work could focus on refining homophily and heterophily estimates within message passing processes, exploring deeper integration into other GNN architectures, and expanding heterophily utilization into more complex graph tasks beyond node classification and molecular generation.

Given the flexibility of this approach, it could be extended to recommendation systems, anomaly detection, and network embedding tasks where graph heterogeneity plays a critical role. Understanding its limitations on low-significance, small-scale datasets also remains crucial; future work could explore adaptive message modulation that accounts for dataset size or graph sparsity.

In summary, this research advances our understanding of GNN limitations in non-homophilous graphs and presents a practical way to adapt between homophily assumptions and real-world graph heterogeneity.