Learning Fair Graph Neural Networks with Limited Sensitive Attribute Information
Graph Neural Networks (GNNs) have emerged as a powerful tool for modeling data with intrinsic graph structure, excelling in domains ranging from knowledge graph construction to recommendation systems. However, like other machine learning models, GNNs can propagate and even amplify societal biases present in the training data, because they learn from historical patterns that may encode biases based on gender, age, or race. This bias propagation is especially problematic in sensitive applications such as crime prediction and recruitment decisions. The existing literature mainly addresses fairness for independently and identically distributed (i.i.d.) data, leaving the challenge of building fair machine learning models for non-i.i.d. data largely unexplored. These problems are compounded when sensitive attribute annotations are sparse.
This research paper tackles the challenge of constructing fair GNNs when only limited sensitive attribute information is available. The authors propose a novel framework, FairGNN, which leverages the graph structure and the small set of available sensitive attribute labels to debias GNNs while retaining high classification accuracy.
Summary of the Approach
FairGNN addresses fair node classification with two main components: a GNN-based sensitive attribute estimator and an adversarial debiasing mechanism. The estimator predicts sensitive attributes for nodes whose attributes are unknown, giving the adversarial network a (pseudo-)complete set of sensitive labels with which to remove bias from the node representations learned by the GNN classifier. The primary goal is to keep the GNN classifier's predictions independent of the sensitive attribute while maintaining high classification accuracy.
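For a binary sensitive attribute s and binary prediction ŷ, this independence goal corresponds to the standard statistical parity criterion, with equal opportunity as a common companion metric. The formulations below are the standard definitions, shown here for reference rather than quoted from the paper:

```latex
% Statistical parity: predictions independent of the sensitive attribute
P(\hat{y} = 1 \mid s = 0) = P(\hat{y} = 1 \mid s = 1)

% Equal opportunity: equal true-positive rates across groups
P(\hat{y} = 1 \mid y = 1, s = 0) = P(\hat{y} = 1 \mid y = 1, s = 1)
```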
Key Components
- Sensitive Attribute Estimator (fE): Uses a Graph Convolutional Network (GCN) to estimate the sensitive attributes of nodes whose annotations are missing. This component addresses the scarcity of sensitive attribute labels and makes adversarial debiasing more effective.
- Adversarial Network (fA): An adversary is trained to predict sensitive attributes from the learned node representations, while the GNN classifier (fG) is trained to produce representations from which the adversary cannot recover the sensitive attribute, promoting fairness in the representations.
- Covariance Constraint: Complements adversarial learning by penalizing the covariance between the classifier's predictions and the estimated sensitive attributes, which stabilizes training and further improves fairness (see the training sketch after this list).
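The following is a minimal PyTorch-style sketch of how these three components could interact in one alternating training step. It is an illustrative reconstruction rather than the authors' implementation: the dense one-layer GCN, the module names, the masks, and the loss weights alpha and beta are all assumptions.

```python
# Minimal PyTorch-style sketch of one FairGNN-like alternating training step.
# Illustrative reconstruction, not the authors' code: the dense GCN layer,
# module names, and the loss weights alpha / beta are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DenseGCNLayer(nn.Module):
    """One graph convolution over a row-normalized dense adjacency matrix."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj_norm):
        return adj_norm @ self.lin(x)


class FairGNNSketch(nn.Module):
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.estimator = DenseGCNLayer(in_dim, 1)    # f_E: estimates sensitive attribute
        self.gnn = DenseGCNLayer(in_dim, hid_dim)    # f_G: node representations
        self.classifier = nn.Linear(hid_dim, 1)      # label prediction head
        self.adversary = nn.Linear(hid_dim, 1)       # f_A: tries to recover s from h

    def forward(self, x, adj_norm):
        s_logit = self.estimator(x, adj_norm)        # estimated sensitive attribute (logit)
        h = F.relu(self.gnn(x, adj_norm))            # representation to be debiased
        return s_logit, self.classifier(h), self.adversary(h)


def training_step(model, opt_main, opt_adv, x, adj_norm, y, s_known, mask_s, mask_y,
                  alpha=1.0, beta=1.0):
    """One alternating update: the adversary learns to predict s from h, then the
    estimator/GNN/classifier are updated to fit labels while fooling the adversary
    and shrinking the covariance between predictions and estimated s."""
    # --- step 1: update the adversary f_A ---
    s_logit, y_logit, adv_logit = model(x, adj_norm)
    s_hat = torch.sigmoid(s_logit).detach()
    # use observed sensitive attributes where available, estimates elsewhere
    s_target = torch.where(mask_s, s_known, (s_hat > 0.5).float())
    loss_adv = F.binary_cross_entropy_with_logits(adv_logit, s_target)
    opt_adv.zero_grad()
    loss_adv.backward()
    opt_adv.step()

    # --- step 2: update estimator f_E, GNN f_G, and classifier ---
    s_logit, y_logit, adv_logit = model(x, adj_norm)
    s_hat = torch.sigmoid(s_logit)
    s_target = torch.where(mask_s, s_known, (s_hat.detach() > 0.5).float())
    loss_cls = F.binary_cross_entropy_with_logits(y_logit[mask_y], y[mask_y])
    loss_est = F.binary_cross_entropy_with_logits(s_logit[mask_s], s_known[mask_s])
    # covariance constraint: push |Cov(s_hat, y_prob)| toward zero
    y_prob = torch.sigmoid(y_logit)
    loss_cov = ((s_hat - s_hat.mean()) * (y_prob - y_prob.mean())).mean().abs()
    # adversarial term: the main model tries to *increase* the adversary's loss
    loss_fool = -F.binary_cross_entropy_with_logits(adv_logit, s_target)
    loss = loss_cls + loss_est + alpha * loss_cov + beta * loss_fool
    opt_main.zero_grad()
    loss.backward()
    opt_main.step()
    return loss_cls.item(), loss_cov.item()
```

In this sketch, opt_main would optimize the parameters of the estimator, GNN, and classifier, while opt_adv optimizes only the adversary; mask_s and mask_y indicate which nodes have observed sensitive attributes and class labels, respectively.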
Theoretical Insights
The authors provide a theoretical foundation ensuring that FairGNN achieves statistical parity under specific conditions. They demonstrate that even with estimated sensitive attributes containing some noise, the adversarial debiasing effectively diminishes biases under mild assumptions. These assumptions include the independence of noisy sensitive attributes from the node representations, given the true sensitive attributes. Moreover, the use of covariance constraints further reinforces fairness by directly regularizing the predictions.
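Schematically, the covariance constraint penalizes the absolute covariance between the estimated sensitive attribute and the classifier's output, and training can be read as a min-max problem. The rendering below is a simplified illustration; the paper's exact notation and weighting may differ, and the trade-off weights alpha and beta are assumptions:

```latex
\mathcal{L}_{\mathrm{cov}}
  = \bigl|\operatorname{Cov}(\hat{s}, \hat{y})\bigr|
  = \bigl|\,\mathbb{E}\bigl[(\hat{s}-\mathbb{E}[\hat{s}])(\hat{y}-\mathbb{E}[\hat{y}])\bigr]\bigr|

\min_{\theta_E,\, \theta_G}\; \max_{\theta_A}\;
  \mathcal{L}_{\mathrm{cls}} + \mathcal{L}_{\mathrm{est}}
  + \alpha\, \mathcal{L}_{\mathrm{cov}} - \beta\, \mathcal{L}_{\mathrm{adv}}
```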
Experimental Evaluation
FairGNN is evaluated against several baselines, including GCN, GAT, and fair classification methods that incorporate graph data. FairGNN consistently outperforms these baselines, significantly reducing bias while maintaining competitive accuracy on real-world datasets from social networks and other domains. The framework remains robust as the amounts of sensitive attribute labels and node labels vary, highlighting its practicality in real-world applications.
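Bias in this setting is typically quantified with the statistical parity difference (ΔSP) and the equal opportunity difference (ΔEO). The NumPy sketch below shows how these gaps can be computed from binary predictions; the function and variable names are illustrative and not taken from the paper's evaluation code.

```python
# Illustrative computation of the two group-fairness gaps commonly reported
# alongside accuracy; names are assumptions, not the paper's evaluation code.
import numpy as np


def fairness_gaps(y_true, y_pred, s):
    """y_true, y_pred, s: 1-D binary (0/1) arrays over the evaluation nodes."""
    g0, g1 = (s == 0), (s == 1)
    # Statistical parity difference: gap in positive prediction rates between groups.
    delta_sp = abs(y_pred[g0].mean() - y_pred[g1].mean())
    # Equal opportunity difference: gap in true-positive rates between groups.
    tpr0 = y_pred[g0 & (y_true == 1)].mean()
    tpr1 = y_pred[g1 & (y_true == 1)].mean()
    delta_eo = abs(tpr0 - tpr1)
    return delta_sp, delta_eo


# Example with random placeholder labels and predictions.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
y_pred = rng.integers(0, 2, size=1000)
s = rng.integers(0, 2, size=1000)
print(fairness_gaps(y_true, y_pred, s))
```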
Implications and Future Directions
This research opens up new avenues for embedding fairness constraints directly into graph-based learning frameworks, emphasizing the importance of fairness in sensitive domains where GNNs could be effectively deployed. Future research directions include extending FairGNN to multi-class and multi-sensitive attribute scenarios and addressing potential inaccuracies in sensitive attribute data through enhanced estimators. Another promising direction could involve exploring methods for dynamic graph updates to preemptively adjust for bias-inducing network modifications.
In summary, this paper presents a compelling approach to learning fair GNNs, pointing toward practical strategies for mitigating bias in graph-structured data, thereby broadening the scope of fair machine learning across varied knowledge domains.