Introducing a Framework and Datasets for Evaluating Health Equity Harms in LLMs
Overview of Proposed Framework and Datasets
The use of LLMs in healthcare shows considerable promise for expanding access to medical information and improving patient care. Alongside these opportunities, however, are significant challenges, particularly the risk of perpetuating biases and exacerbating health disparities. Addressing these challenges requires a systematic approach to evaluating and identifying biases embedded in LLM-generated content. To that end, the paper presents a comprehensive framework and a collection of newly released datasets for surfacing health-equity-related biases in the outputs of medical LLMs. This effort, grounded in an iterative and participatory approach, encompasses multifactorial assessment rubrics for bias evaluation and an empirical case study with Med-PaLM 2, contributing valuable insights into the identification and mitigation of equity-related harms in LLMs.
Multifactorial Assessment Rubrics
The assessment rubrics detailed in this paper were designed to evaluate bias in LLM-generated answers to medical questions. Developed in collaboration with health equity experts, they reflect a nuanced understanding of bias that goes beyond conventional metrics. Three types of rubrics are introduced (a minimal schema sketch follows the list):
- Independent Assessment: Evaluates bias in a single answer to a question, allowing raters to identify various forms of bias including inaccuracies across identity axes, lack of inclusivity, and stereotyping.
- Pairwise Assessment: Compares the presence or degree of bias between two answers to a single question, providing a relative measure of bias between model outputs.
- Counterfactual Assessment: Focuses on answers to pairs of questions that differ only by identifiers of demographics or other context, helping identify biases introduced by changes in the specified identities or contexts.
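To make the three rubric types concrete, the following is a minimal sketch, in Python, of how ratings under each rubric might be recorded. The class names, fields, and bias dimensions are illustrative assumptions for exposition, not the paper's exact rubric schema.

```python
from dataclasses import dataclass, field
from enum import Enum

# Illustrative bias dimensions; the paper's rubrics define their own set.
class BiasDimension(Enum):
    INACCURACY_FOR_IDENTITY = "inaccuracy across axes of identity"
    LACK_OF_INCLUSIVITY = "lack of inclusivity"
    STEREOTYPING = "stereotypical characterization"

@dataclass
class IndependentRating:
    """One rater's judgment of bias in a single answer to a question."""
    question: str
    answer: str
    bias_present: bool
    dimensions: list[BiasDimension] = field(default_factory=list)

@dataclass
class PairwiseRating:
    """Relative judgment of bias between two answers to the same question."""
    question: str
    answer_a: str
    answer_b: str
    less_biased: str  # "A", "B", or "tie"

@dataclass
class CounterfactualRating:
    """Judgment over answers to two questions differing only in identifiers."""
    question_a: str  # identical to question_b except for an identity/context marker
    question_b: str
    answer_a: str
    answer_b: str
    unjustified_difference: bool  # does the identity change alone drive a difference?
```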
EquityMedQA Datasets
EquityMedQA comprises seven datasets designed to support adversarial testing of medical LLMs for health equity issues. The datasets span a range of medical information queries, from explicitly adversarial questions to queries enriched for content related to known health disparities. Their varied collection methods, including human curation, LLM-based generation, and a focus on global health topics, allow them to target different forms of potential bias. The seven datasets are:
- OMAQ: Features human-curated, explicitly adversarial queries across multiple health topics.
- EHAI: Targets implicitly adversarial queries related to health disparities in the United States.
- FBRT-Manual and FBRT-LLM: Contain questions derived through failure-based red teaming of Med-PaLM 2, curated manually and generated by an LLM, respectively.
- TRINDS: Centers on tropical and infectious diseases, emphasizing the global context.
- CC-Manual and CC-LLM: Include counterfactual query pairs that differ only in identity or other context, aiding a deeper understanding of how bias arises; a sketch of how such pairs can be constructed follows this list.
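The counterfactual datasets pair queries that are identical except for an identity or context marker. Below is a minimal sketch of one way such pairs could be constructed by template substitution; the template, identifiers, and function name are hypothetical and not drawn from the CC-Manual or CC-LLM construction procedures.

```python
from itertools import combinations

def counterfactual_pairs(template: str, identifiers: list[str]) -> list[tuple[str, str]]:
    """Fill a query template with each identifier, then pair up all variants.

    Each returned pair differs only in the substituted identifier, so any
    systematic difference in the model's answers can be attributed to the
    identity change rather than to the medical content of the query.
    """
    variants = [template.format(identity=i) for i in identifiers]
    return list(combinations(variants, 2))

# Illustrative template and identifiers only.
template = "What should a {identity} patient know about managing hypertension?"
for q_a, q_b in counterfactual_pairs(template, ["Black", "white", "Hispanic"]):
    print(q_a, "<->", q_b)
```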
Empirical Results and Implications
Through an extensive empirical study using the developed rubrics and datasets, several key findings emerged:
- Bias in LLM Outputs: The study surfaced biases in Med-PaLM 2 outputs across multiple dimensions, underscoring the need for diverse methodologies in bias evaluation.
- Role of Rater Groups: Physician, health equity expert, and consumer rater groups reported bias at different rates, highlighting the importance of including diverse perspectives in bias evaluation efforts (a simple aggregation sketch follows this list).
- Utility of Counterfactual Analysis: The counterfactual assessment rubric surfaced biases tied to changes in demographic identifiers or context, offering insight into subtle forms of bias.
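One simple way to quantify the rater-group variation described above is to compare per-group rates of bias reports. The sketch below assumes a hypothetical flat record format with a rater_group label and a boolean bias_present flag; it is not the paper's analysis code.

```python
from collections import defaultdict

def bias_report_rate_by_group(ratings: list[dict]) -> dict[str, float]:
    """Fraction of ratings flagging bias, broken out by rater group.

    Each rating is assumed to be a dict with keys "rater_group"
    (e.g., "physician", "equity_expert", "consumer") and "bias_present" (bool).
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [flagged, total]
    for r in ratings:
        counts[r["rater_group"]][0] += int(r["bias_present"])
        counts[r["rater_group"]][1] += 1
    return {group: flagged / total for group, (flagged, total) in counts.items()}

# Illustrative data only.
ratings = [
    {"rater_group": "physician", "bias_present": False},
    {"rater_group": "equity_expert", "bias_present": True},
    {"rater_group": "consumer", "bias_present": False},
]
print(bias_report_rate_by_group(ratings))
```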
Concluding Remarks
The proposed framework and datasets mark a significant advance in ongoing efforts to mitigate health equity harms in medical LLMs. The results underscore the multifaceted nature of bias in LLM outputs and the critical need for diverse evaluative approaches and stakeholder engagement. Future directions include refining the evaluation rubrics, extending the datasets to broader global contexts, and developing methods to effectively mitigate the biases identified.