Fairness Certification for Natural Language Processing and Large Language Models
Abstract: Natural language processing (NLP) plays an important role in our daily lives, particularly due to the enormous progress of large language models (LLMs). However, NLP has many fairness-critical use cases, e.g., as an expert system in recruitment or as an LLM-based tutor in education. Since NLP is based on human language, potentially harmful biases can diffuse into NLP systems and produce unfair results, discriminate against minorities, or create legal issues. Hence, it is important to develop a fairness certification for NLP approaches. We follow a qualitative research approach towards such a certification. In particular, we have reviewed a large body of literature on algorithmic fairness and conducted semi-structured interviews with a wide range of experts in that area. On this basis, we have systematically devised six fairness criteria for NLP, which can be further refined into 18 sub-categories. Our criteria offer a foundation for operationalizing and testing processes to certify fairness, both from the perspective of the auditor and the audited organization.