Better Fair than Sorry: Adversarial Missing Data Imputation for Fair GNNs (2311.01591v2)
Abstract: This paper addresses the problem of learning fair Graph Neural Networks (GNNs) under missing protected attributes. GNNs have achieved state-of-the-art results in many relevant tasks where decisions might disproportionately impact specific communities. However, existing work on fair GNNs assumes either that protected attributes are fully observed or that the missing data imputation is fair. In practice, biases in the imputation propagate to the model outcomes, so fair GNNs trained on imputed attributes overestimate the fairness of their predictions. We address this challenge by proposing Better Fair than Sorry (BFtS), a fair missing data imputation model for protected attributes used by fair GNNs. The key design principle behind BFtS is that imputations should approximate the worst-case scenario for the fair GNN, i.e., the scenario in which optimizing fairness is the hardest. We implement this idea using a 3-player adversarial scheme where two adversaries collaborate against the fair GNN. Experiments using synthetic and real datasets show that BFtS often achieves a better fairness $\times$ accuracy trade-off than existing alternatives.
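The worst-case principle stated in the abstract can be illustrated on a toy example. The sketch below is not the BFtS model: it replaces the 3-player adversarial training with a brute-force search over all imputations, and it assumes statistical parity difference as the bias measure, which the abstract does not specify. The names `parity_gap` and `worst_case_imputation`, and the data, are hypothetical.

```python
from itertools import product

def parity_gap(preds, groups):
    """Absolute difference in positive-prediction rate between groups 0 and 1."""
    rates = []
    for g in (0, 1):
        members = [p for p, s in zip(preds, groups) if s == g]
        # Assumption for this toy case: an empty group gets a rate of 0.0.
        rates.append(sum(members) / len(members) if members else 0.0)
    return abs(rates[0] - rates[1])

def worst_case_imputation(preds, groups):
    """Brute-force the filling of missing (None) attributes that maximizes the gap.

    Stand-in for adversarial imputation; exponential in the number of
    missing values, so only usable on tiny examples.
    """
    missing = [i for i, s in enumerate(groups) if s is None]
    best_gap, best_groups = -1.0, None
    for values in product((0, 1), repeat=len(missing)):
        filled = list(groups)
        for i, v in zip(missing, values):
            filled[i] = v
        gap = parity_gap(preds, filled)
        if gap > best_gap:
            best_gap, best_groups = gap, filled
    return best_gap, best_groups

# Binary predictions for six nodes; the last two protected attributes are missing.
preds = [1, 1, 0, 0, 1, 0]
groups = [0, 0, 1, 1, None, None]

worst_gap, worst_groups = worst_case_imputation(preds, groups)
# An imputation that happens to balance the groups makes the model look fairer.
optimistic_gap = parity_gap(preds, [0, 0, 1, 1, 1, 0])

print(worst_gap, worst_groups)
print(optimistic_gap)
```

In this toy setting the optimistic imputation reports a much smaller gap than the worst-case one for the same predictions, which is the kind of overestimated fairness the abstract warns about.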