Language Fairness in Multilingual Information Retrieval (2405.00978v1)
Abstract: Multilingual information retrieval (MLIR) considers the problem of ranking documents in several languages for a query expressed in a language that may differ from any of those languages. Recent work has observed that approaches such as combining ranked lists that each represent a single document language, or using multilingual pretrained language models, demonstrate a preference for one language over others. This results in systematic unfair treatment of documents in different languages. This work proposes a language fairness metric to evaluate whether documents across different languages are fairly ranked, through statistical equivalence testing using the Kruskal-Wallis test. In contrast to most prior work in group fairness, we do not consider any language to be an unprotected group. Thus our proposed measure, PEER (Probability of Equal Expected Rank), is the first fairness metric specifically designed to capture the language fairness of MLIR systems. We demonstrate the behavior of PEER on artificial ranked lists. We also evaluate real MLIR systems on two publicly available benchmarks and show that the PEER scores align with prior analytical findings on MLIR fairness. Our implementation is compatible with ir-measures and is available at http://github.com/hltcoe/peer_measure.
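To illustrate the core idea, the sketch below groups the ranks of relevant documents by their language and applies the Kruskal-Wallis test; a high p-value is consistent with all languages receiving the same expected rank, while a low p-value signals a systematic preference for some language. This is only a minimal illustration under assumptions: the function name `peer_sketch`, the data layout, and the use of `scipy.stats.kruskal` are not taken from the official hltcoe/peer_measure implementation, which defines PEER over expected ranks and integrates with ir-measures.

```python
# Minimal sketch (not the official PEER implementation): approximate the
# language-fairness idea by running a Kruskal-Wallis test over the ranks
# of relevant documents, grouped by document language.
from collections import defaultdict
from scipy.stats import kruskal

def peer_sketch(ranked_docs, doc_language, relevant_docs):
    """ranked_docs: list of doc ids in ranked order.
    doc_language: dict mapping doc id -> language code.
    relevant_docs: set of doc ids judged relevant for the query."""
    ranks_by_lang = defaultdict(list)
    for rank, doc_id in enumerate(ranked_docs, start=1):
        if doc_id in relevant_docs:
            ranks_by_lang[doc_language[doc_id]].append(rank)
    groups = [ranks for ranks in ranks_by_lang.values() if ranks]
    if len(groups) < 2:
        return 1.0  # only one language represented; nothing to compare
    # High p-value: no evidence that relevant documents from different
    # languages receive different expected ranks in this ranking.
    _, p_value = kruskal(*groups)
    return p_value

# Hypothetical example: relevant German documents are ranked consistently
# below relevant Persian documents, so the score should be low.
ranking = [f"d{i}" for i in range(1, 11)]
langs = {f"d{i}": ("fa" if i <= 5 else "de") for i in range(1, 11)}
qrels = {"d1", "d2", "d3", "d8", "d9", "d10"}
print(peer_sketch(ranking, langs, qrels))
```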
Authors: Eugene Yang, Thomas Jänich, James Mayfield, Dawn Lawrie