- The paper presents a novel integration of bias-corrected calibration with maximum likelihood to address label shift without retraining models.
- It demonstrates superior performance over methods like Black Box Shift Learning and Regularized Learning on datasets such as MNIST, CIFAR, and diabetic retinopathy detection.
- The study rigorously proves global convergence via a concave, bounded likelihood objective, and introduces a prediction-based source-domain prior estimate that improves robustness to miscalibration.
Label Shift Adaptation Through Maximum Likelihood and Calibration
The paper "Maximum Likelihood with Bias-Corrected Calibration is Hard-To-Beat at Label Shift Adaptation" presents a comprehensive paper on adapting machine learning models to scenarios where class distributions shift from training to deployment, a common occurrence known as label shift. This phenomenon is particularly prevalent in applications like medical diagnostics, where a model trained on historical data needs to predict outcomes under changed class prevalence in practice.
Key Contributions
- Combining Maximum Likelihood with Calibration: The authors propose integrating a bias-corrected calibration technique with a maximum likelihood framework to address label shift without retraining the model. They argue that modern neural networks often produce poorly calibrated probabilities, which undermines the classical Expectation Maximization (EM) maximum likelihood approach of Saerens et al., since that approach assumes well-calibrated outputs.
- Evaluation Against State-of-the-Art Methods: The authors benchmark their method against recent approaches such as Black Box Shift Learning (BBSL) and Regularized Learning under Label Shifts (RLLS). The results indicate that their method consistently outperforms these alternatives across various datasets, including MNIST, CIFAR10/100, and diabetic retinopathy detection, with detailed evaluations provided.
- Robustness and Global Convergence: The paper rigorously establishes the concavity and boundedness of the maximum likelihood objective, ensuring the convergence of the Expectation Maximization algorithm to a global optimum. They also introduce strategies to define source-domain priors that enhance robustness against calibration inaccuracies.
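The EM procedure at the heart of the approach can be sketched concisely. Below is a minimal NumPy sketch of Saerens-style EM for re-estimating target-domain class priors from calibrated posteriors; the function and variable names are my own, and the paper's implementation may differ in details such as stopping criteria:

```python
import numpy as np

def em_label_shift(source_probs, source_priors, n_iter=100, tol=1e-8):
    """Saerens-style EM for label shift: re-estimates target class priors
    from calibrated source-model posteriors evaluated on target inputs.

    source_probs: (n_samples, n_classes) calibrated p_s(y|x) on target data
    source_priors: (n_classes,) source-domain class priors q_s(y)
    Returns the estimated target-domain priors q_t(y)."""
    q_t = source_priors.copy()
    for _ in range(n_iter):
        # E-step: reweight each posterior by the current prior ratio
        weights = q_t / source_priors
        posteriors = source_probs * weights
        posteriors /= posteriors.sum(axis=1, keepdims=True)
        # M-step: new target priors = average adapted posterior
        q_new = posteriors.mean(axis=0)
        if np.abs(q_new - q_t).max() < tol:
            return q_new
        q_t = q_new
    return q_t
```

Because the likelihood objective is concave and bounded, as the paper establishes, this iteration converges to a global optimum regardless of initialization.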
Technical Analysis
The paper emphasizes the effectiveness of class-specific bias parameters in calibration methods. Standard Temperature Scaling (TS), despite its popularity, does not fully correct for systematic per-class biases, whereas bias-corrected methods like Bias-Corrected Temperature Scaling (BCTS) and Vector Scaling (VS) do. The latter methods substantially improve adaptation performance and achieve lower negative log-likelihood on validation sets than TS, with validation NLL tracking downstream adaptation accuracy.
Moreover, to guard against residual systematic bias in calibration, the authors recommend computing source-domain priors from the model's average predictions rather than from the true label frequencies, which mitigates the detrimental effects of miscalibration on adaptation performance.
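The prediction-based prior estimate is simple to compute; a one-line sketch (naming is my own):

```python
import numpy as np

def source_priors_from_predictions(calibrated_probs):
    """Estimate source-domain class priors as the mean calibrated predicted
    probability over held-out source data, instead of the empirical label
    frequencies. If the calibrated model is still systematically biased
    toward some class, the prior ratio used during EM adaptation absorbs
    that same bias, which is the robustness argument made in the paper."""
    return calibrated_probs.mean(axis=0)
```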
Experimental Results
The empirical findings are robust: maximum likelihood with bias-corrected calibration outperforms both BBSL and RLLS in terms of mean squared error of the estimated class weights. The authors also note that BBSL and RLLS incur greater computational overhead due to retraining and hyperparameter tuning, whereas the proposed method offers a simpler, scalable alternative that is particularly advantageous in large-scale settings.
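The evaluation metric above can be written out explicitly. A small sketch of the MSE between estimated and true importance weights w(y) = q_t(y)/q_s(y), with my own naming, assuming weights are compared elementwise across classes:

```python
import numpy as np

def weight_mse(est_target_priors, true_target_priors, source_priors):
    """MSE between estimated and true importance weights w(y) = q_t(y)/q_s(y),
    the quantity used to compare prior-estimation methods under label shift."""
    w_est = est_target_priors / source_priors
    w_true = true_target_priors / source_priors
    return np.mean((w_est - w_true) ** 2)
```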
Implications and Future Work
This work underscores the critical role of calibration in enhancing model adaptation capabilities under label shift. The insights gained here have substantial implications for deploying machine learning models in real-world scenarios where class distributions are prone to change. Future research could explore more granular evaluations across additional domains and investigate the interplay between various calibration methods and different forms of dataset shifts beyond label shift.
Conclusion
This paper presents a well-founded advancement in label shift adaptation by leveraging the strengths of maximum likelihood within a properly calibrated probabilistic framework. The methodological rigor and thorough experimental validations contribute to its standing as a robust baseline technique, offering a computationally efficient and theoretically sound alternative to existing state-of-the-art methods.