
Theoretical and Empirical Insights into the Origins of Degree Bias in Graph Neural Networks (2404.03139v2)

Published 4 Apr 2024 in cs.LG and cs.SI

Abstract: Graph Neural Networks (GNNs) often perform better for high-degree nodes than low-degree nodes on node classification tasks. This degree bias can reinforce social marginalization by, e.g., privileging celebrities and other high-degree actors in social networks during social and content recommendation. While researchers have proposed numerous hypotheses for why GNN degree bias occurs, we find via a survey of 38 degree bias papers that these hypotheses are often not rigorously validated, and can even be contradictory. Thus, we provide an analysis of the origins of degree bias in message-passing GNNs with different graph filters. We prove that high-degree test nodes tend to have a lower probability of misclassification regardless of how GNNs are trained. Moreover, we show that degree bias arises from a variety of factors that are associated with a node's degree (e.g., homophily of neighbors, diversity of neighbors). Furthermore, we show that during training, some GNNs may adjust their loss on low-degree nodes more slowly than on high-degree nodes; however, with sufficiently many epochs of training, message-passing GNNs can achieve their maximum possible training accuracy, which is not significantly limited by their expressive power. Throughout our analysis, we connect our findings to previously-proposed hypotheses for the origins of degree bias, supporting and unifying some while drawing doubt to others. We validate our theoretical findings on 8 common real-world networks, and based on our theoretical and empirical insights, describe a roadmap to alleviate degree bias.

Authors (3)
  1. Arjun Subramonian
  2. Jian Kang
  3. Yizhou Sun

Summary

Unveiling the Roots of Degree Bias in Graph Neural Networks

Introduction

Graph Neural Networks (GNNs) have emerged as the de facto standard for learning representations on graphs, with applications spanning node classification, link prediction, and graph classification. A recurring observation, however, is that they perform better on high-degree nodes than on their low-degree counterparts in node classification tasks. This bias not only skews performance metrics but also has broader implications for fairness, potentially marginalizing already under-represented communities in social or citation networks. In this paper, Arjun Subramonian, Jian Kang, and Yizhou Sun investigate the underlying causes of degree bias in GNNs through a methodical combination of theoretical analysis and empirical validation on real-world datasets.

Theoretical Insights into Degree Bias

The authors begin by establishing a theoretical framework for analyzing degree bias in message-passing GNNs, focusing on Graph Convolutional Networks (GCNs) with different normalization filters: RW (random walk-normalized), SYM (symmetric-normalized), and ATT (attention-based). They prove that high-degree test nodes tend to have a lower probability of misclassification across these architectures, regardless of how the GNNs are trained. They further identify factors associated with a node's degree, such as the homophily and diversity of its neighbors, that contribute to the emergence of degree bias. Crucially, they show that during training some GNNs may adjust their loss on low-degree nodes more slowly than on high-degree nodes. This finding implies that degree bias is not merely a by-product of model expressiveness but also a consequence of training dynamics.
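
To make the filter definitions concrete, the sketch below shows how the RW and SYM filters can be constructed from a dense adjacency matrix; the ATT filter is omitted because it depends on learned attention coefficients. The self-loop convention and the helper name graph_filters are illustrative assumptions, not code from the paper.

```python
import torch

def graph_filters(adj: torch.Tensor):
    """Build the RW (random walk-normalized) and SYM (symmetric-normalized)
    graph filters from a dense adjacency matrix, with self-loops added."""
    a_hat = adj + torch.eye(adj.size(0))       # add self-loops: A + I
    deg = a_hat.sum(dim=1)                     # node degrees
    rw = torch.diag(1.0 / deg) @ a_hat         # RW filter:  D^-1 (A + I)
    d_inv_sqrt = torch.diag(deg.pow(-0.5))
    sym = d_inv_sqrt @ a_hat @ d_inv_sqrt      # SYM filter: D^-1/2 (A + I) D^-1/2
    return rw, sym
```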

Empirical Investigations

The theoretical claims are substantiated through experiments on eight real-world networks, spanning citation, social, and product co-purchasing graphs. The empirical results align with the theoretical insights, showing a consistent pattern of degree bias across different GNN architectures and datasets. In particular, high-degree nodes generally incur a lower test loss than low-degree nodes. The authors also demonstrate that, given sufficiently many training epochs, message-passing GNNs with any of the studied normalization filters can achieve their maximum possible training accuracy, which is not significantly limited by their expressive power.
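
For readers who want to reproduce this kind of comparison, the sketch below contrasts the average test loss of low- and high-degree nodes; the degree cutoff, function name, and masking convention are illustrative assumptions rather than the paper's exact evaluation code.

```python
import torch
import torch.nn.functional as F

def loss_by_degree_group(logits, labels, degrees, test_mask, cutoff=5):
    """Average test loss of low- vs. high-degree nodes.
    The cutoff of 5 is an illustrative choice, not the paper's protocol."""
    per_node = F.cross_entropy(logits[test_mask], labels[test_mask],
                               reduction="none")    # loss for each test node
    deg = degrees[test_mask]
    low = per_node[deg <= cutoff].mean()            # low-degree test nodes
    high = per_node[deg > cutoff].mean()            # high-degree test nodes
    return low.item(), high.item()
```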

Implications and Future Directions

The investigation into the origins of degree bias by Subramonian et al. holds significant implications for the design of graph neural networks. By identifying the factors contributing to degree bias, the paper paves the way for more equitable and effective graph representation learning models. In particular, it highlights the need for mechanisms that enhance the representational quality of low-degree nodes, for example through neighborhood augmentation or graph-filter normalization that reduces distributional discrepancies across node degrees. Moreover, addressing training-speed discrepancies between nodes of different degrees is critical to ensuring that no subset of nodes is systematically disadvantaged.
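
One generic intervention in this spirit, shown purely as an illustrative sketch rather than the paper's prescribed roadmap, is to upweight low-degree training nodes in the loss so that their gradients carry more influence during optimization.

```python
import torch
import torch.nn.functional as F

def degree_weighted_loss(logits, labels, degrees, train_mask, alpha=0.5):
    """Cross-entropy loss that upweights low-degree training nodes.
    The degree^(-alpha) weighting and alpha=0.5 are illustrative choices."""
    per_node = F.cross_entropy(logits[train_mask], labels[train_mask],
                               reduction="none")
    weights = degrees[train_mask].float().clamp(min=1).pow(-alpha)
    weights = weights / weights.mean()    # keep the overall loss scale comparable
    return (weights * per_node).mean()
```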

Looking ahead, the findings encourage a broader consideration of fairness in graph neural networks and urge future research to devise strategies that mitigate degree bias. Such efforts could involve novel architectures, regularization techniques, or rethinking training algorithms to promote equitable learning outcomes across all nodes. Furthermore, as the paper primarily focuses on transductive learning settings, extending the analysis to inductive settings presents an interesting avenue for future research, potentially uncovering new dimensions of degree bias and opportunities for intervention.

Conclusion

The exploration into degree bias in GNNs by Subramonian and colleagues offers valuable insights into the mechanisms that give rise to this phenomenon. By marrying theoretical analysis with empirical validation, the paper enhances our understanding of why GNNs perform better on high-degree nodes and lays a foundation for future work aimed at developing fairer and more robust graph representation learning models. As the field of graph neural networks continues to evolve, addressing the challenges posed by degree bias will be paramount for ensuring that these powerful models can serve diverse and equitable applications across all domains.
