Evaluating Biases in Context-Dependent Health Questions (2403.04858v1)

Published 7 Mar 2024 in cs.CL

Abstract: Chat-based LLMs have the opportunity to empower individuals lacking high-quality healthcare access to receive personalized information across a variety of topics. However, users may ask underspecified questions that require additional context for a model to correctly answer. We study how LLM biases are exhibited through these contextual questions in the healthcare domain. To accomplish this, we curate a dataset of sexual and reproductive healthcare questions that are dependent on age, sex, and location attributes. We compare models' outputs with and without demographic context to determine group alignment among our contextual questions. Our experiments reveal biases in each of these attributes, where young adult female users are favored.


Summary

  • The paper finds that chat-based LLMs exhibit demographic biases, with responses most aligned to young adult and female users.
  • The paper employs a curated dataset from Planned Parenthood and Go Ask Alice, analyzing responses using cosine similarity and % Win metrics across two LLMs.
  • The paper emphasizes the need for refining LLM training to ensure unbiased and equitable health information delivery across diverse demographics.

Unveiling Bias in Chat-based LLMs' Responses to Contextual Health Questions

Introduction to the Study

Recent advances have positioned chat-based LLMs at the forefront of accessible, personalized information delivery, with significant implications for resource-constrained sectors, notably healthcare. Given this growing reliance on LLMs for health-related inquiries, it is paramount to understand the biases in their responses, particularly when they answer contextual health questions that omit demographic information. In this paper, researchers from Johns Hopkins University and Northeastern Illinois University examine the issue by evaluating biases in chat-based LLMs, focusing on sexual and reproductive healthcare questions that require additional context related to age, sex, and location.

Methodology and Data

The research began with the curation of a dataset of contextual questions about sexual and reproductive health whose correct answers depend on personal attributes. Sourced from prominent health advisory platforms, Planned Parenthood and Go Ask Alice, the dataset was carefully filtered to retain questions contingent on age, location, or sex. The resulting questions reflect the diverse inquiries individuals might have and served as the basis for probing two leading LLMs, gpt-3.5-turbo and llama-2-70b-chat, for biases in their response patterns; a sketch of how such prompt pairs might be constructed follows.
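
The core probing setup pairs each question with a context-free prompt and with prompts that add one demographic attribute at a time. The snippet below is a minimal sketch of that pairing; the attribute values and prompt wording are illustrative assumptions, not the authors' exact templates.

```python
# Illustrative sketch (not the authors' exact prompts): pair each health
# question with a context-free prompt and with context-conditioned prompts,
# one per demographic attribute value. Values and wording are assumptions.
question = "Can I get the HPV vaccine?"

attributes = {
    "age": ["18-30", "31-45", "46-60"],
    "sex": ["female", "male"],
    "location": ["Massachusetts", "Texas"],
}

contextless_prompt = question
contextual_prompts = {
    (attribute, value): f"My {attribute} is {value}. {question}"
    for attribute, values in attributes.items()
    for value in values
}

# Each prompt would then be sent to a chat model (e.g. gpt-3.5-turbo or
# llama-2-70b-chat), and the answers compared as described below.
```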

For a comparative analysis, each model was queried both with questions augmented with demographic context and with the standalone inquiries. The analysis then measured how closely the context-free responses aligned with the context-conditioned responses for different demographic groups, using metrics such as average cosine similarity and the percentage of questions for which a group's answer was the closest match ('% Win'), as sketched below.
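
A minimal sketch of how these alignment metrics might be computed is given below, assuming Sentence-BERT embeddings via the sentence-transformers library. The data layout, encoder choice, and aggregation details are illustrative assumptions rather than the authors' released code.

```python
# Illustrative sketch (not the authors' code): measure how closely a model's
# context-free answer aligns with its context-conditioned answers, then
# aggregate average cosine similarity and "% Win" (how often a group's
# answer is the closest match) per demographic group.
from collections import Counter

from sentence_transformers import SentenceTransformer, util

# Hypothetical data layout: per question, the answer without added context
# and the answers produced when the prompt names each demographic group.
questions = [
    {
        "contextless_answer": "Emergency contraception is available at most pharmacies...",
        "group_answers": {
            "18-30": "At your age, emergency contraception is available over the counter...",
            "46-60": "In your age range, it is best to review options with your clinician...",
        },
    },
    # ... more questions
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works

similarity_totals = Counter()  # running sum of cosine similarity per group
win_counts = Counter()         # how often each group's answer is the closest

for q in questions:
    base = encoder.encode(q["contextless_answer"], convert_to_tensor=True)
    scores = {}
    for group, answer in q["group_answers"].items():
        emb = encoder.encode(answer, convert_to_tensor=True)
        scores[group] = util.cos_sim(base, emb).item()
        similarity_totals[group] += scores[group]
    win_counts[max(scores, key=scores.get)] += 1  # this group "wins" the question

num_questions = len(questions)
for group in similarity_totals:
    avg_sim = similarity_totals[group] / num_questions
    pct_win = 100 * win_counts[group] / num_questions
    print(f"{group}: avg cosine similarity = {avg_sim:.3f}, % Win = {pct_win:.1f}%")
```

Under this reading, the group with consistently higher average similarity or % Win is the one the context-free model output is implicitly tailored to.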

Key Findings and Observations

The results were revealing across several dimensions:

  • Age Bias: Both LLMs exhibited a discernible alignment towards the 18-30 age demographic, suggesting that the health inquiries of older individuals may be overlooked.
  • Sex Bias: Responses skewed towards female demographics, indicating an inherent bias in addressing sexual and reproductive health questions, which may marginalize male-related health queries.
  • Location-Based Bias: Minor fluctuations were noted in responses related to users' locations, with a slight tendency towards assuming the user resides in Massachusetts. This might reflect broader societal biases or model training data biases.

A human evaluation corroborated these findings, showing substantial agreement between the quantitative bias measures and human judgments, especially for the age and sex attributes.

Implications and Future Directions

The research contributes to the discourse on AI ethics, emphasizing the urgency of mitigating biases in health information dissemination. The implications span practical access to healthcare information and broader questions about privacy, since users outside the favored groups may need to disclose sensitive demographic details to obtain accurate advice. For future AI research and LLM development, building models that offer comprehensive, unbiased responses across all demographics could help protect privacy and improve the quality and reliability of AI-driven health advisories.

Limitations and Ethical Considerations

The paper meticulously outlines its limitations, including its focus on an American-centric dataset, reliance on binary sex categories, and the dynamic nature of healthcare laws affecting location-dependent questions. Additionally, the ethical considerations surrounding the collection and use of healthcare-related inquiries underscore the sensitivity and responsibility demanded in conducting such research.

Concluding Remarks

The paper offers a rigorous examination of biases inherent in LLMs when tasked with health-related contextual inquiries. By shedding light on the predispositions favoring certain demographics over others, the research invites a critical reassessment of how LLMs are trained, evaluated, and deployed, advocating for a future where AI can serve diverse global populations equitably and sensitively in health matters and beyond.
