Towards Unbiased Calibration using Meta-Regularization (2303.15057v3)
Abstract: Model miscalibration has been frequently identified in modern deep neural networks. Recent work aims to improve model calibration directly through a differentiable calibration proxy. However, the calibration produced is often biased due to the binning mechanism. In this work, we propose to learn better-calibrated models via meta-regularization, which has two components: (1) gamma network (gamma-net), a meta-learner that outputs sample-wise gamma values (continuous variables) for the focal loss regularizing the backbone network; (2) smooth expected calibration error (SECE), a Gaussian-kernel-based, unbiased, and differentiable surrogate to ECE that enables the smooth optimization of gamma-net. We evaluate the effectiveness of the proposed approach in regularizing neural networks towards improved and unbiased calibration on three computer vision datasets. We empirically demonstrate that: (a) learning sample-wise gamma as continuous variables can effectively improve calibration; (b) SECE smoothly optimizes gamma-net towards unbiased and robust calibration with respect to the binning schemes; and (c) the combination of gamma-net and SECE achieves the best calibration performance across various calibration metrics while retaining very competitive predictive performance compared to multiple recently proposed methods.
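The two components named in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names, array shapes, anchor grid, and kernel bandwidth are our own assumptions. The first function is a focal loss whose focusing parameter gamma varies per sample (as a gamma-net would produce); the second replaces ECE's hard bins with Gaussian kernel weights around anchor confidences, so the resulting calibration measure is smooth in the predicted confidences.

```python
import numpy as np

def focal_loss_samplewise(probs, labels, gammas):
    """Focal loss with a per-sample focusing parameter.

    probs:  (N, C) predicted class probabilities
    labels: (N,)   integer class labels
    gammas: (N,)   per-sample gamma values (e.g. output of a gamma-net)
    """
    p_true = probs[np.arange(len(labels)), labels]  # probability of the true class
    return np.mean(-((1.0 - p_true) ** gammas) * np.log(p_true))

def smooth_ece(confidences, accuracies, bandwidth=0.05, n_anchors=50):
    """Gaussian-kernel surrogate to ECE (illustrative sketch).

    Hard bins are replaced by kernel weights around evenly spaced anchor
    confidences, so the measure is differentiable in the confidences.
    """
    anchors = np.linspace(0.0, 1.0, n_anchors)
    # kernel weight of every sample (columns) at every anchor (rows)
    w = np.exp(-0.5 * ((confidences[None, :] - anchors[:, None]) / bandwidth) ** 2)
    w_sum = w.sum(axis=1) + 1e-12
    mean_conf = (w * confidences[None, :]).sum(axis=1) / w_sum  # kernel-weighted confidence
    mean_acc = (w * accuracies[None, :]).sum(axis=1) / w_sum    # kernel-weighted accuracy
    density = w_sum / w_sum.sum()                               # probability mass near each anchor
    return float(np.sum(density * np.abs(mean_conf - mean_acc)))
```

For example, a model that predicts every sample with confidence 0.9 but is never correct yields a smooth ECE of about 0.9, while a hard-binned ECE of the same predictions would depend on the chosen bin edges; this bin-independence is the robustness property the abstract refers to.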