Towards Unbiased Calibration using Meta-Regularization (2303.15057v3)

Published 27 Mar 2023 in cs.LG

Abstract: Model miscalibration has been frequently identified in modern deep neural networks. Recent work aims to improve model calibration directly through a differentiable calibration proxy. However, the calibration produced is often biased due to the binning mechanism. In this work, we propose to learn better-calibrated models via meta-regularization, which has two components: (1) gamma network (gamma-net), a meta learner that outputs sample-wise gamma values (continuous variables) for the Focal loss that regularizes the backbone network; (2) smooth expected calibration error (SECE), a Gaussian kernel-based, unbiased, and differentiable surrogate to ECE that enables the smooth optimization of gamma-net. We evaluate the effectiveness of the proposed approach in regularizing neural networks towards improved and unbiased calibration on three computer vision datasets. We empirically demonstrate that: (a) learning sample-wise gamma as continuous variables can effectively improve calibration; (b) SECE smoothly optimizes gamma-net towards unbiased and robust calibration with respect to the binning schemes; and (c) the combination of gamma-net and SECE achieves the best calibration performance across various calibration metrics while retaining very competitive predictive performance as compared to multiple recently proposed methods.
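
The two components described in the abstract can be illustrated with a minimal PyTorch sketch: a Focal loss that accepts a per-sample gamma (as produced by a meta learner such as gamma-net), and a Gaussian kernel-smoothed ECE surrogate that avoids hard binning. This is an illustrative reconstruction, not the paper's exact formulation; the function names, the `bandwidth` parameter, and the `backbone`/`gamma_net` modules in the usage snippet are assumptions for the sake of the example.

```python
import torch
import torch.nn.functional as F

def samplewise_focal_loss(logits, targets, gammas):
    """Focal loss with a per-sample focusing parameter gamma.

    logits:  (N, C) raw class scores from the backbone network
    targets: (N,)   integer class labels
    gammas:  (N,)   per-sample gamma values, e.g. output by a meta learner
    """
    log_probs = F.log_softmax(logits, dim=-1)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # log p_t
    pt = log_pt.exp()
    # Standard focal-loss form, except gamma varies per sample
    loss = -((1.0 - pt) ** gammas) * log_pt
    return loss.mean()

def smooth_ece(confidences, accuracies, bandwidth=0.05):
    """A kernel-smoothed ECE surrogate (illustrative, not the paper's exact SECE).

    Hard binning is replaced by Gaussian kernel weights centred at each
    confidence value, so the gap between local accuracy and local confidence
    is differentiable with respect to the confidences.
    """
    diffs = confidences.unsqueeze(0) - confidences.unsqueeze(1)   # (N, N) pairwise gaps
    weights = torch.exp(-0.5 * (diffs / bandwidth) ** 2)          # Gaussian kernel weights
    weights = weights / weights.sum(dim=1, keepdim=True)
    local_acc = weights @ accuracies     # kernel-weighted accuracy around each sample
    local_conf = weights @ confidences   # kernel-weighted confidence around each sample
    return (local_acc - local_conf).abs().mean()

# Usage sketch (backbone and gamma_net are hypothetical modules):
# logits = backbone(x)
# gammas = gamma_net(features).squeeze(-1)          # sample-wise gamma values
# loss = samplewise_focal_loss(logits, y, gammas)
# probs = logits.softmax(-1)
# conf, pred = probs.max(-1)
# sece = smooth_ece(conf, (pred == y).float())      # differentiable calibration proxy
```

Because the kernel weights are smooth in the confidences, the surrogate provides gradients through the calibration objective regardless of any binning scheme, which is what allows the meta learner to be optimized directly.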
