- The paper presents a novel integration of bias-corrected calibration with maximum likelihood to address label shift without retraining models.
- It demonstrates superior performance over methods like Black Box Shift Learning and Regularized Learning on datasets such as MNIST, CIFAR, and diabetic retinopathy detection.
- The study rigorously proves global convergence via a concave, bounded likelihood objective, and introduces a prediction-based source-domain prior estimate that improves robustness to miscalibration.
Label Shift Adaptation Through Maximum Likelihood and Calibration
The paper "Maximum Likelihood with Bias-Corrected Calibration is Hard-To-Beat at Label Shift Adaptation" presents a comprehensive paper on adapting machine learning models to scenarios where class distributions shift from training to deployment, a common occurrence known as label shift. This phenomenon is particularly prevalent in applications like medical diagnostics, where a model trained on historical data needs to predict outcomes under changed class prevalence in practice.
Key Contributions
- Combining Maximum Likelihood with Calibration: The authors propose integrating a bias-corrected calibration technique with a maximum likelihood framework to address label shift without retraining the model. They argue that modern neural networks often produce poorly calibrated probabilities, which undermines the classical Expectation Maximization (EM) maximum likelihood approach of Saerens et al., since that approach assumes well-calibrated outputs.
- Evaluation Against State-of-the-Art Methods: The authors benchmark their method against recent approaches such as Black Box Shift Learning (BBSL) and Regularized Learning under Label Shifts (RLLS). The results indicate that their method consistently outperforms these alternatives across various datasets, including MNIST, CIFAR10/100, and diabetic retinopathy detection, with detailed evaluations provided.
- Robustness and Global Convergence: The paper rigorously establishes the concavity and boundedness of the maximum likelihood objective, ensuring the convergence of the Expectation Maximization algorithm to a global optimum. They also introduce strategies to define source-domain priors that enhance robustness against calibration inaccuracies.
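The EM procedure at the heart of the approach can be sketched concisely. Below is a minimal NumPy sketch of Saerens-style EM for re-estimating target-domain class priors from calibrated posteriors; the function and variable names are my own, and the paper's implementation may differ in details such as stopping criteria:

```python
import numpy as np

def em_label_shift(source_probs, source_priors, n_iter=100, tol=1e-8):
    """Saerens-style EM for label shift: re-estimates target class priors
    from calibrated source-model posteriors evaluated on target inputs.

    source_probs: (n_samples, n_classes) calibrated p_s(y|x) on target data
    source_priors: (n_classes,) source-domain class priors q_s(y)
    Returns the estimated target-domain priors q_t(y)."""
    q_t = source_priors.copy()
    for _ in range(n_iter):
        # E-step: reweight each posterior by the current prior ratio
        weights = q_t / source_priors
        posteriors = source_probs * weights
        posteriors /= posteriors.sum(axis=1, keepdims=True)
        # M-step: new target priors = average adapted posterior
        q_new = posteriors.mean(axis=0)
        if np.abs(q_new - q_t).max() < tol:
            return q_new
        q_t = q_new
    return q_t
```

Because the likelihood objective is concave and bounded, as the paper establishes, this iteration converges to a global optimum regardless of initialization.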
Technical Analysis
The paper emphasizes the effectiveness of class-specific bias parameters in calibration methods. Standard Temperature Scaling (TS), despite its popularity, does not fully correct for systematic per-class biases, whereas bias-corrected methods like Bias-Corrected Temperature Scaling (BCTS) and Vector Scaling (VS) do. The latter methods substantially improve adaptation performance and achieve lower negative log-likelihood on validation sets than TS, with validation NLL tracking downstream adaptation accuracy.
Moreover, to guard against residual systematic bias in calibration, the authors recommend computing source-domain priors from the model's average predictions rather than from the true label frequencies, which mitigates the detrimental effects of miscalibration on adaptation performance.
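The prediction-based prior estimate is simple to compute; a one-line sketch (naming is my own):

```python
import numpy as np

def source_priors_from_predictions(calibrated_probs):
    """Estimate source-domain class priors as the mean calibrated predicted
    probability over held-out source data, instead of the empirical label
    frequencies. If the calibrated model is still systematically biased
    toward some class, the prior ratio used during EM adaptation absorbs
    that same bias, which is the robustness argument made in the paper."""
    return calibrated_probs.mean(axis=0)
```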
Experimental Results
The empirical findings are robust: maximum likelihood with bias-corrected calibration outperforms both BBSL and RLLS in terms of mean squared error of the estimated class weights. The authors also note that BBSL and RLLS incur greater computational overhead due to retraining and hyperparameter tuning, whereas the proposed method offers a simpler, scalable alternative that is particularly advantageous in large-scale settings.
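The evaluation metric above can be written out explicitly. A small sketch of the MSE between estimated and true importance weights w(y) = q_t(y)/q_s(y), with my own naming, assuming weights are compared elementwise across classes:

```python
import numpy as np

def weight_mse(est_target_priors, true_target_priors, source_priors):
    """MSE between estimated and true importance weights w(y) = q_t(y)/q_s(y),
    the quantity used to compare prior-estimation methods under label shift."""
    w_est = est_target_priors / source_priors
    w_true = true_target_priors / source_priors
    return np.mean((w_est - w_true) ** 2)
```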
Implications and Future Work
This work underscores the critical role of calibration in enhancing model adaptation capabilities under label shift. The insights gained here have substantial implications for deploying machine learning models in real-world scenarios where class distributions are prone to change. Future research could explore more granular evaluations across additional domains and investigate the interplay between various calibration methods and different forms of dataset shifts beyond label shift.
Conclusion
This paper presents a well-founded advancement in label shift adaptation by leveraging the strengths of maximum likelihood within a properly calibrated probabilistic framework. The methodological rigor and thorough experimental validations contribute to its standing as a robust baseline technique, offering a computationally efficient and theoretically sound alternative to existing state-of-the-art methods.