
Matched Pair Calibration for Ranking Fairness (2306.03775v3)

Published 6 Jun 2023 in cs.LG

Abstract: We propose a test of fairness in score-based ranking systems called matched pair calibration. Our approach constructs a set of matched item pairs with minimal confounding differences between subgroups before computing an appropriate measure of ranking error over the set. The matching step ensures that we compare subgroup outcomes between identically scored items so that measured performance differences directly imply unfairness in subgroup-level exposures. We show how our approach generalizes the fairness intuitions of calibration from a binary classification setting to ranking and connect our approach to other proposals for ranking fairness measures. Moreover, our strategy shows how the logic of marginal outcome tests extends to cases where the analyst has access to model scores. Lastly, we provide an example of applying matched pair calibration to a real-world ranking data set to demonstrate its efficacy in detecting ranking bias.
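
The matching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each ranked item carries a model score, a subgroup label, and an observed outcome such as a click. It treats adjacent items from different subgroups whose scores differ by at most a tolerance as a matched pair, and it uses the mean outcome difference across pairs as the error measure. The names Item, matched_pairs, matched_pair_gap, and tol are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Item:
    score: float    # model score used to rank the item
    group: str      # subgroup label, e.g. "A" or "B"
    outcome: float  # observed outcome, e.g. 1.0 if the item was clicked


def matched_pairs(ranking, tol=0.01):
    """Return adjacent cross-group pairs whose scores differ by at most tol.

    Assumption: adjacency in score order plus a small score gap stands in
    for "minimal confounding differences" between the two items.
    """
    ordered = sorted(ranking, key=lambda it: it.score, reverse=True)
    pairs = []
    for upper, lower in zip(ordered, ordered[1:]):
        close = abs(upper.score - lower.score) <= tol
        if upper.group != lower.group and close:
            pairs.append((upper, lower))
    return pairs


def matched_pair_gap(rankings, group_a="A", group_b="B", tol=0.01):
    """Mean of (group_a outcome - group_b outcome) over all matched pairs.

    Returns None when no matched pair is found.
    """
    diffs = []
    for ranking in rankings:
        for x, y in matched_pairs(ranking, tol):
            # Orient each pair so that `a` is the group_a item.
            a, b = (x, y) if x.group == group_a else (y, x)
            if a.group == group_a and b.group == group_b:
                diffs.append(a.outcome - b.outcome)
    return mean(diffs) if diffs else None


# Toy data (invented for illustration): two ranked lists.
rankings = [
    [Item(0.90, "A", 1.0), Item(0.895, "B", 0.0), Item(0.50, "A", 0.0)],
    [Item(0.70, "B", 0.0), Item(0.70, "A", 1.0)],
]
gap = matched_pair_gap(rankings)
```

Under these assumptions, a gap near zero is what calibration across subgroups would predict for nearly identically scored items. A gap far from zero could indicate that the scores systematically under- or over-value one subgroup, although a real analysis would also need a variance estimate before drawing that conclusion.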

