Fairness Rising from the Ranks: HITS and PageRank on Homophilic Networks (2402.13787v3)
Abstract: In this paper, we investigate the conditions under which link analysis algorithms prevent minority groups from reaching high ranking slots. We find that the most common link-based algorithms using centrality metrics, such as PageRank and HITS, can reproduce and even amplify bias against minority groups in networks. Yet their behavior differs. On one hand, we empirically show that PageRank mirrors the degree distribution for most of the ranking positions and can equalize the representation of minorities among the top-ranked nodes. On the other hand, we show through a novel theoretical analysis, supported by empirical results, that HITS amplifies pre-existing bias in homophilic networks. We identify the root cause of bias amplification in HITS as the level of homophily present in the network, which we model through an evolving network model with two communities. We illustrate our theoretical analysis on both synthetic and real datasets and present directions for future work.
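The comparison described in the abstract can be explored with a minimal sketch such as the one below. It is not the paper's model or code: `toy_homophilic_graph` is a hypothetical two-community growth process (preferential attachment in which cross-group links are accepted only with probability `cross_accept`), and the parameters `minority_frac`, `links`, and `seed` are illustrative assumptions. PageRank and HITS authority scores are computed by plain power iteration, and `minority_share_top_k` reports the fraction of minority nodes among the top `k` ranked nodes.

```python
import numpy as np

def pagerank(adj, damping=0.85, iters=200):
    """PageRank by power iteration; adj[i, j] = 1 means an edge i -> j."""
    n = adj.shape[0]
    out_deg = adj.sum(axis=1)
    # Row-normalize; nodes without out-links jump uniformly at random.
    trans = np.where(out_deg[:, None] > 0,
                     adj / np.maximum(out_deg, 1)[:, None],
                     1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(iters):
        rank = (1 - damping) / n + damping * trans.T @ rank
    return rank

def hits_authority(adj, iters=200):
    """HITS authority scores by power iteration (assumes at least one edge)."""
    auth = np.ones(adj.shape[0])
    for _ in range(iters):
        hub = adj @ auth       # hub score: total authority of the nodes it links to
        auth = adj.T @ hub     # authority: total hub score of the nodes linking to it
        auth /= auth.sum()     # L1-normalize so the scores stay bounded
    return auth

def toy_homophilic_graph(n=300, minority_frac=0.3, cross_accept=0.2,
                         links=3, seed=0):
    """Hypothetical two-community growing graph (an assumption, not the paper's model)."""
    rng = np.random.default_rng(seed)
    labels = (rng.random(n) < minority_frac).astype(int)  # 1 marks the minority group
    adj = np.zeros((n, n))
    for new in range(1, n):
        # Propose targets proportionally to (in-degree + 1) among existing nodes.
        weights = adj[:new, :new].sum(axis=0) + 1.0
        probs = weights / weights.sum()
        targets = rng.choice(new, size=min(links, new), replace=False, p=probs)
        for target in targets:
            same_group = labels[target] == labels[new]
            # Homophily: same-group links always form, cross-group links only sometimes.
            if same_group or rng.random() < cross_accept:
                adj[new, target] = 1.0
    return adj, labels

def minority_share_top_k(scores, labels, k):
    """Fraction of minority nodes among the k highest-scoring nodes."""
    top = np.argsort(-scores)[:k]
    return float(labels[top].mean())

if __name__ == "__main__":
    adj, labels = toy_homophilic_graph()
    print("minority fraction overall:", labels.mean())
    print("PageRank top-30 share:", minority_share_top_k(pagerank(adj), labels, 30))
    print("HITS top-30 share:", minority_share_top_k(hits_authority(adj), labels, 30))
```

Varying `cross_accept` between 0 and 1 changes the level of homophily in the toy graph, which makes it possible to see how the minority share in the top ranks responds under each algorithm. The outcome on any single random graph is only indicative and should not be read as a reproduction of the paper's results.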