Racial/Ethnic Categories in AI and Algorithmic Fairness: Why They Matter and What They Represent (2404.06717v1)
Abstract: Racial diversity is increasingly discussed within the AI and algorithmic fairness literature, yet little attention is paid to justifying the choice of racial categories or to understanding how people are racialized into them. Even less attention is given to how racial categories shift and how the racialization process changes depending on the context of a dataset or model. An unclear understanding of who comprises the chosen racial categories and how people are racialized into them can lead to varying interpretations of these categories. These varying interpretations can lead to harm when the understanding of the racial categories and the racialization process is misaligned with the racialization process and racial categories actually used. Harm can also arise if the racialization process and racial categories used are irrelevant or do not exist in the context in which they are applied. In this paper, we make two contributions. First, we demonstrate how racial categories with unclear assumptions and little justification can lead to varying datasets that poorly represent groups obfuscated or unrepresented by the given racial categories, and to models that perform poorly on these groups. Second, we develop a framework, CIRCSheets, for documenting the choices and assumptions made in choosing racial categories and the process of racialization into these categories, to facilitate transparency about the processes and assumptions of dataset or model developers when selecting or using these racial categories.
Author: Jennifer Mickel