AI Gender Bias, Disparities, and Fairness: Does Training Data Matter? (2312.10833v2)
Abstract: This study examines gender bias in AI, specifically in automatic scoring systems for student-written responses. The primary objective is to investigate gender bias, disparity, and fairness in the scoring outcomes of AI models trained on general, mixed-gender samples. Using fine-tuned versions of BERT and GPT-3.5, this research analyzes more than 1000 human-graded student responses from male and female participants across six assessment items. The study employs three distinct techniques for bias analysis: scoring accuracy difference to evaluate bias, mean score gaps by gender (MSG) to evaluate disparity, and Equalized Odds (EO) to evaluate fairness. The results indicate that the scoring accuracy of mixed-trained models does not differ significantly from that of either male- or female-trained models, suggesting no significant scoring bias. Consistently across both BERT and GPT-3.5, mixed-trained models produced smaller MSG than human scoring and non-disparate predictions. In contrast, gender-specifically trained models yielded larger MSG than human scoring, indicating that unbalanced training data may lead algorithmic models to enlarge gender disparities. The EO analysis suggests that mixed-trained models produced fairer outcomes than gender-specifically trained models. Collectively, the findings suggest that gender-unbalanced data do not necessarily generate scoring bias but can enlarge gender disparities and reduce scoring fairness.
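The three measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes binary scores (0/1), treats the human score as ground truth, and reports Equalized Odds as the absolute gaps in true-positive and false-positive rates between the two groups. The function names and the toy data are hypothetical, and the paper's significance testing is not reproduced here.

```python
from statistics import mean

def accuracy(human, machine):
    """Share of responses where the machine score equals the human score."""
    return mean(1.0 if h == m else 0.0 for h, m in zip(human, machine))

def mean_score_gap(scores, genders, a="M", b="F"):
    """Mean score of group a minus mean score of group b (MSG)."""
    mean_a = mean(s for s, g in zip(scores, genders) if g == a)
    mean_b = mean(s for s, g in zip(scores, genders) if g == b)
    return mean_a - mean_b

def positive_rate(human, machine, genders, group, label):
    """P(machine == 1 | human == label, gender == group)."""
    preds = [m for h, m, g in zip(human, machine, genders)
             if g == group and h == label]
    return mean(preds)

def equalized_odds_gaps(human, machine, genders, a="M", b="F"):
    """Absolute TPR and FPR gaps between groups; (0, 0) means EO holds."""
    tpr_gap = abs(positive_rate(human, machine, genders, a, 1)
                  - positive_rate(human, machine, genders, b, 1))
    fpr_gap = abs(positive_rate(human, machine, genders, a, 0)
                  - positive_rate(human, machine, genders, b, 0))
    return tpr_gap, fpr_gap

# Toy data (hypothetical, not from the study).
genders = ["M", "M", "M", "M", "F", "F", "F", "F"]
human   = [1, 0, 1, 0, 1, 1, 0, 0]
machine = [1, 0, 1, 1, 1, 0, 0, 0]

overall_accuracy = accuracy(human, machine)
human_msg = mean_score_gap(human, genders)
machine_msg = mean_score_gap(machine, genders)
eo_gaps = equalized_odds_gaps(human, machine, genders)
print(overall_accuracy, human_msg, machine_msg, eo_gaps)
```

In this toy case the human scores show no gender gap while the machine scores do, which is the kind of enlarged disparity the abstract attributes to gender-specifically trained models.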