
From Biased Selective Labels to Pseudo-Labels: An Expectation-Maximization Framework for Learning from Biased Decisions (2406.18865v1)

Published 27 Jun 2024 in cs.LG and stat.ML

Abstract: Selective labels occur when label observations are subject to a decision-making process; e.g., diagnoses that depend on the administration of laboratory tests. We study a clinically-inspired selective label problem called disparate censorship, where labeling biases vary across subgroups and unlabeled individuals are imputed as "negative" (i.e., no diagnostic test = no illness). Machine learning models naively trained on such labels could amplify labeling bias. Inspired by causal models of selective labels, we propose Disparate Censorship Expectation-Maximization (DCEM), an algorithm for learning in the presence of disparate censorship. We theoretically analyze how DCEM mitigates the effects of disparate censorship on model performance. We validate DCEM on synthetic data, showing that it improves bias mitigation (area between ROC curves) without sacrificing discriminative performance (AUC) compared to baselines. We achieve similar results in a sepsis classification task using clinical data.

From Biased Selective Labels to Pseudo-Labels: An Overview

The paper "From Biased Selective Labels to Pseudo-Labels: An Expectation-Maximization Framework for Learning from Biased Decisions" by Trenton Chang and Jenna Wiens addresses a critical issue in machine learning: selective labels, particularly as they arise in healthcare. The work introduces Disparate Censorship Expectation-Maximization (DCEM), an algorithm for handling label bias introduced by decision-making processes, such as the testing decisions common in clinical settings.

The Problem Domain

Selective labeling problems arise when the availability of label observations is determined by a preceding decision-making step. A prime example is healthcare, where a diagnosis can be recorded only if a laboratory test is ordered. The paper investigates a form of selective labeling called disparate censorship, in which labeling rates vary across subgroups and unlabeled instances default to negative. This setting is especially pertinent to conditions like sepsis, where patients who never receive a test are assumed healthy by default, introducing potentially significant bias into the training data.
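To make the setting concrete, the following is a minimal simulation of disparate censorship. It is a sketch only: the subgroup-dependent testing rates (0.8 vs. 0.4) and the label model are hypothetical choices, not parameters from the paper.

```python
# Minimal simulation of disparate censorship (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
a = rng.integers(0, 2, n)                          # subgroup indicator
x = rng.normal(0.5 * a, 1.0, n)                    # feature correlated with subgroup
y_true = rng.random(n) < 1.0 / (1.0 + np.exp(-x))  # latent true label

# Disparate censorship: subgroup a=1 is tested half as often as a=0
# (hypothetical rates, not from the paper).
p_test = np.where(a == 0, 0.8, 0.4)
tested = rng.random(n) < p_test

# Observed label: the true label if tested, otherwise imputed as negative.
y_obs = np.where(tested, y_true, 0).astype(int)

# Untested positives are silently mislabeled negative, more often for a=1.
mislabeled = (~tested) & y_true
print(f"mislabeled rate, a=0: {mislabeled[a == 0].mean():.3f}, "
      f"a=1: {mislabeled[a == 1].mean():.3f}")
```

A classifier trained naively on `y_obs` would learn the censorship pattern along with the disease pattern, which is exactly the failure mode the paper targets.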

Motivation and Challenges

Machine learning (ML) models trained naively on such selectively labeled data risk perpetuating and amplifying the biases embedded in it. For instance, if women are tested less frequently for cardiovascular disease, an ML model trained on the resulting data may reinforce this bias by predicting lower risk for women. Traditional mitigations involve training only on labeled (tested) individuals or employing semi-supervised methods, but these often disregard the causal structure of the labeling process.

The Proposed Solution: DCEM

Inspired by causal models, DCEM seeks to mitigate bias when training models under conditions of disparate censorship. The authors provide a theoretical framework demonstrating how DCEM regularizes model estimates to counterbalance the effects of biased labeling. Fundamentally, DCEM leverages expectation-maximization (EM) to iteratively impute labels and optimize model parameters, thereby adjusting for estimated biases.

Theoretical Insights

The paper rigorously formulates the problem, defining the data generation process under disparate censorship and detailing how the EM algorithm is applied to it. The EM procedure in DCEM alternates between two steps (a code sketch follows the list):

  1. E-step: Imputation of soft pseudo-labels for unlabeled data using the current model's predictions.
  2. M-step: Optimization of the model parameters, incorporating a regularization term that adjusts based on the probability of the label being censored.
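As a concrete illustration, here is a minimal sketch of such an EM loop. It is not the authors' implementation: the linear-logistic model, the gradient-descent fitting, and the down-weighting of untested rows by `lam * (1 - q)` are all assumed simplifications standing in for the paper's censorship-based regularization.

```python
# Sketch of a DCEM-style EM loop (illustrative; not the authors' code).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dcem_sketch(X, y_obs, tested, q, n_em=20, n_gd=200, lr=0.1, lam=1.0):
    """X: (n, d) features; y_obs: (n,) labels (0 where untested);
    tested: (n,) bool mask; q: (n,) estimated P(tested | x)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_em):
        # E-step: impute soft pseudo-labels for untested individuals from
        # the current model; tested individuals keep their observed labels.
        targets = np.where(tested, y_obs, sigmoid(X @ w))
        # M-step: refit to the targets; untested rows are weighted by a
        # censorship-dependent factor (assumed form of the regularizer).
        weights = np.where(tested, 1.0, lam * (1.0 - q))
        for _ in range(n_gd):
            p = sigmoid(X @ w)
            grad = X.T @ (weights * (p - targets)) / n
            w -= lr * grad
    return w
```

The key structural point the sketch does capture is the alternation: pseudo-labels are frozen during each M-step, so the observed labels of tested individuals pull the model while the censorship-weighted pseudo-labels act as the regularizer described above.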

The authors argue that this approach effectively mitigates the bias introduced by selective labeling without sacrificing discriminative performance.

Empirical Validation

DCEM was validated on both synthetic data and a real-world sepsis classification task. The synthetic datasets allowed controlled testing of the algorithm's robustness to various data generation processes. Empirically, DCEM consistently achieved better bias mitigation (as measured by the area between subgroup ROC curves) while maintaining competitive or superior AUC compared to several baseline models.

In the clinical sepsis use case, DCEM mitigated labeling bias without undermining predictive accuracy. Key numerical results include the following (a sketch of the ROC-gap metric follows the list):

  • Synthetic data: DCEM achieved a median ROC gap of 0.030 versus 0.034 for the second-best baseline (SELF), with an AUC of 0.787 compared to 0.815 for the tested-only approach.
  • Sepsis task: DCEM outperformed most baselines in terms of AUC (0.620 vs. 0.593 for DragonNet) while maintaining a competitive ROC gap.
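For reference, here is one plausible way to compute the area between subgroup ROC curves, assuming scikit-learn's `roc_curve`; the paper's exact metric definition may differ from this sketch.

```python
# Area between the ROC curves of two subgroups (illustrative sketch).
import numpy as np
from sklearn.metrics import roc_curve

def roc_gap(y_true, y_score, group):
    """group: boolean array splitting the population into two subgroups."""
    fpr_a, tpr_a, _ = roc_curve(y_true[group], y_score[group])
    fpr_b, tpr_b, _ = roc_curve(y_true[~group], y_score[~group])
    grid = np.linspace(0.0, 1.0, 1001)         # common FPR grid
    tpr_a_i = np.interp(grid, fpr_a, tpr_a)    # interpolate each curve
    tpr_b_i = np.interp(grid, fpr_b, tpr_b)
    return np.trapz(np.abs(tpr_a_i - tpr_b_i), grid)
```

A gap near zero indicates the model discriminates similarly well across subgroups, which is why the metric complements overall AUC.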

Implications and Future Directions

The implications of DCEM are multifaceted:

  1. Practical Implications: In healthcare, implementing DCEM can lead to more equitable decision-making systems that do not perpetuate existing biases against underrepresented groups.
  2. Theoretical Implications: The advancement in addressing causally influenced labeling biases opens new avenues for refining model training protocols under biased data regimes.

The authors suggest further enhancements to DCEM, such as improving its robustness to low overlap between the tested and untested populations, and extending it to domains beyond healthcare. The paper also emphasizes the importance of prospective model evaluation in practical deployments, to identify and mitigate any unforeseen negative impacts of ML systems trained with DCEM.

Conclusion

In summary, Chang and Wiens' work on DCEM represents a significant stride in addressing the pervasive issue of label biases arising from selective labeling processes. Through both theoretical development and empirical validation, the paper establishes DCEM as a valuable tool for building fairer and more accurate ML models in high-stakes environments like healthcare, where the consequences of biased decisions can be profound. As the field moves forward, the principles and methodologies introduced in this work will likely serve as cornerstones for future research aimed at mitigating biases in machine learning systems.

Authors (2)
  1. Trenton Chang
  2. Jenna Wiens