Matched Pair Calibration for Ranking Fairness (2306.03775v3)
Abstract: We propose a test of fairness in score-based ranking systems called matched pair calibration. Our approach constructs a set of matched item pairs with minimal confounding differences between subgroups, then computes an appropriate measure of ranking error over the set. The matching step ensures that we compare subgroup outcomes between identically scored items, so that measured performance differences directly imply unfairness in subgroup-level exposures. We show how our approach generalizes the fairness intuitions of calibration from a binary classification setting to ranking, and we connect it to other proposals for ranking fairness measures. Moreover, our strategy shows how the logic of marginal outcome tests extends to cases where the analyst has access to model scores. Lastly, we apply matched pair calibration to a real-world ranking data set to demonstrate its efficacy in detecting ranking bias.
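The matching idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on several assumptions that the abstract does not state: pairs are formed from items adjacent in a single query's ranking, "identically scored" is approximated by a score tolerance `tol`, and the ranking error measure is a simple mean difference in outcomes. The names `Item`, `matched_pairs`, and `mean_outcome_gap` are hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby
from statistics import mean


@dataclass(frozen=True)
class Item:
    query_id: str   # ranking (query or user session) the item appeared in
    score: float    # model score used to order the ranking
    group: str      # subgroup label of the item
    outcome: float  # observed outcome, e.g. 1.0 for a click, 0.0 otherwise


def matched_pairs(items, group_a, group_b, tol):
    """Collect cross-group pairs that sit next to each other in a ranking
    and whose scores differ by at most `tol` (an assumed matching rule)."""
    pairs = []
    by_query = sorted(items, key=lambda it: it.query_id)
    for _, same_query in groupby(by_query, key=lambda it: it.query_id):
        ranked = sorted(same_query, key=lambda it: it.score, reverse=True)
        for upper, lower in zip(ranked, ranked[1:]):
            if {upper.group, lower.group} != {group_a, group_b}:
                continue  # keep only pairs with one item from each subgroup
            if upper.score - lower.score > tol:
                continue  # scores too far apart to treat as matched
            # Store each pair as (group_a item, group_b item).
            a, b = (upper, lower) if upper.group == group_a else (lower, upper)
            pairs.append((a, b))
    return pairs


def mean_outcome_gap(pairs):
    """Mean of (group_a outcome - group_b outcome) over matched pairs.
    Returns None when there are no matched pairs."""
    if not pairs:
        return None
    return mean(a.outcome - b.outcome for a, b in pairs)
```

Under this simplified reading, a gap near zero is consistent with equally scored items from the two subgroups performing alike, while a persistently positive gap suggests that `group_a` items are under-scored relative to `group_b` items with the same score. An item can appear in two adjacent pairs, and rankings from the same query are not independent, so a real analysis would need standard errors that account for that dependence.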