Fair Column Subset Selection (2306.04489v4)
Abstract: The problem of column subset selection asks for a subset of columns from an input matrix such that the matrix can be reconstructed as accurately as possible within the span of the selected columns. A natural extension is to consider a setting where the matrix rows are partitioned into two groups, and the goal is to choose a subset of columns that minimizes the maximum reconstruction error over the two groups, each measured relative to that group's best rank-k approximation. Extending the known results of column subset selection to this fair setting is not straightforward: in certain scenarios it is unavoidable to choose columns separately for each group, resulting in double the expected column count. We propose a deterministic leverage-score sampling strategy for the fair setting and show that sampling a column subset of minimum size becomes NP-hard in the presence of two groups. Despite these negative results, we give an approximation algorithm that guarantees a solution within 1.5 times the optimal solution size. We also present practical heuristic algorithms based on rank-revealing QR factorization. Finally, we validate our methods through an extensive set of experiments using real-world data.
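The objective described in the abstract can be made concrete with a small sketch. The code below is illustrative only and is not taken from the paper: the function names (`best_rank_k_error`, `reconstruction_error`, `fair_css_cost`) are hypothetical, the Frobenius norm is an assumed choice of error measure, and the two groups are assumed to be given as separate matrices `A` and `B` that share the same columns. It evaluates a candidate column subset; it does not search for one.

```python
import numpy as np

def best_rank_k_error(M, k):
    """Frobenius error of the best rank-k approximation of M (Eckart-Young)."""
    s = np.linalg.svd(M, compute_uv=False)
    return np.sqrt(np.sum(s[k:] ** 2))

def reconstruction_error(M, cols):
    """Frobenius error of reconstructing M within the span of its columns `cols`."""
    C = M[:, cols]
    # Least-squares coefficients X minimizing ||M - C X||_F.
    coeffs, *_ = np.linalg.lstsq(C, M, rcond=None)
    return np.linalg.norm(M - C @ coeffs)

def fair_css_cost(A, B, cols, k):
    """Maximum over the two groups of the relative reconstruction error.

    Assumes each group has rank greater than k, so the denominator is nonzero.
    """
    return max(
        reconstruction_error(G, cols) / best_rank_k_error(G, k)
        for G in (A, B)
    )
```

When exactly k columns are chosen, each group's reconstruction has rank at most k, so the cost under these assumptions cannot fall below 1; a value close to 1 means both groups are reconstructed nearly as well as their own best rank-k approximation allows.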