A Toolbox for Surfacing Health Equity Harms and Biases in Large Language Models (arXiv:2403.12025v2)
Abstract: LLMs hold promise for serving complex health information needs, but they also have the potential to introduce harm and exacerbate health disparities. Reliably evaluating equity-related model failures is a critical step toward developing systems that promote health equity. We present resources and methodologies for surfacing biases with the potential to precipitate equity-related harms in long-form, LLM-generated answers to medical questions, and we conduct a large-scale empirical case study with the Med-PaLM 2 LLM. Our contributions include a multifactorial framework for human assessment of LLM-generated answers for biases, and EquityMedQA, a collection of seven datasets enriched for adversarial queries. Both our human assessment framework and our dataset design process are grounded in an iterative participatory approach and review of Med-PaLM 2 answers. Through our empirical study, we find that our approach surfaces biases that may be missed by narrower evaluation approaches. Our experience underscores the importance of using diverse assessment methodologies and involving raters of varying backgrounds and expertise. While our approach is not sufficient to holistically assess whether the deployment of an AI system promotes equitable health outcomes, we hope it can be leveraged and built upon toward a shared goal of LLMs that promote accessible and equitable healthcare.
Authors: Stephen R. Pfohl, Heather Cole-Lewis, Rory Sayres, Darlene Neal, Mercy Asiedu, Awa Dieng, Nenad Tomasev, Qazi Mamunur Rashid, Shekoofeh Azizi, Negar Rostamzadeh, Liam G. McCoy, Leo Anthony Celi, Yun Liu, Mike Schaekermann, Alanna Walton, Alicia Parrish, Chirag Nagpal, Preeti Singh, Akeiylah Dewitt, Philip Mansfield