Generalizing Fairness to Generative Language Models via Reformulation of Non-discrimination Criteria (2403.08564v3)
Abstract: Generative AI, such as large language models (LLMs), has developed rapidly in recent years. As these models become increasingly available to the public, concerns arise that they may perpetuate and amplify harmful biases in downstream applications. Gender stereotypes, whether they take the form of misrepresentation or discrimination, can be harmful and limiting for the individuals they target. Recognizing gender bias as a pervasive societal construct, this paper studies how to uncover and quantify the presence of gender biases in generative LLMs. In particular, we derive generative AI analogues of three well-known non-discrimination criteria from classification, namely independence, separation and sufficiency. To demonstrate these criteria in action, we design prompts for each criterion with a focus on occupational gender stereotypes, specifically utilizing a medical test setting to introduce ground truth in the generative AI context. Our results reveal the presence of occupational gender bias within such conversational LLMs.
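For context, the three criteria named in the abstract have standard formulations in the classification setting from which the paper's generative analogues are derived. Below is a minimal sketch of those classical definitions, writing $A$ for the sensitive attribute (e.g., gender), $Y$ for the ground-truth label, and $R$ for the classifier's prediction; the paper's own reformulations for generative models are not reproduced here.

```latex
% Classical non-discrimination criteria for a classifier
% (standard textbook formulations, stated as conditional-independence
%  requirements; the paper's generative reformulations are not shown).
% A: sensitive attribute (e.g., gender), Y: ground-truth label, R: prediction.
\begin{align*}
  \text{Independence:} \quad & R \perp A
    && \Leftrightarrow \Pr(R = r \mid A = a) = \Pr(R = r \mid A = a') \\
  \text{Separation:}   \quad & R \perp A \mid Y
    && \Leftrightarrow \Pr(R = r \mid Y = y, A = a) = \Pr(R = r \mid Y = y, A = a') \\
  \text{Sufficiency:}  \quad & Y \perp A \mid R
    && \Leftrightarrow \Pr(Y = y \mid R = r, A = a) = \Pr(Y = y \mid R = r, A = a')
\end{align*}
```

Intuitively, independence requires predictions to be identically distributed across groups, separation allows group differences only insofar as they are explained by the true label, and sufficiency requires the prediction to carry the same evidential value about the label for every group.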
- Sara Sterlie (1 paper)
- Nina Weng (10 papers)
- Aasa Feragen (46 papers)