Revisiting Confidence Estimation: Towards Reliable Failure Prediction (2403.02886v1)

Published 5 Mar 2024 in cs.CV and cs.LG

Abstract: Reliable confidence estimation is a challenging yet fundamental requirement in many risk-sensitive applications. However, modern deep neural networks are often overconfident in their incorrect predictions, i.e., misclassified samples from known classes and out-of-distribution (OOD) samples from unknown classes. In recent years, many confidence calibration and OOD detection methods have been developed. In this paper, we identify a general, widespread, yet largely neglected phenomenon: most confidence estimation methods are harmful for detecting misclassification errors. We investigate this problem and reveal that popular calibration and OOD detection methods often lead to worse confidence separation between correctly classified and misclassified examples, making it difficult to decide whether to trust a prediction. Finally, we propose to enlarge the confidence gap by finding flat minima, which yields state-of-the-art failure prediction performance under various settings, including balanced, long-tailed, and covariate-shift classification scenarios. Our study not only provides a strong baseline for reliable confidence estimation but also acts as a bridge between understanding calibration, OOD detection, and failure prediction. The code is available at https://github.com/Impression2805/FMFP.
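The abstract frames failure prediction as a confidence-separation problem: a model's confidence should rank correctly classified samples above misclassified ones. As a minimal sketch (not the authors' released FMFP code), the Python snippet below scores a trained classifier's maximum softmax probability against prediction correctness with AUROC; model, loader, and device are hypothetical placeholders for a trained network and an evaluation data loader.

import torch
import torch.nn.functional as F
from sklearn.metrics import roc_auc_score

@torch.no_grad()
def failure_prediction_auroc(model, loader, device="cpu"):
    # Collect the maximum softmax probability (confidence) and whether
    # each prediction was correct, over the evaluation set.
    model.eval()
    confidences, correctness = [], []
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        probs = F.softmax(model(images), dim=1)
        conf, pred = probs.max(dim=1)
        confidences.append(conf.cpu())
        correctness.append((pred == labels).cpu())
    conf = torch.cat(confidences).numpy()
    correct = torch.cat(correctness).numpy().astype(int)
    # AUROC of confidence as a score for "the prediction is correct":
    # higher means better separation between correct and misclassified
    # samples, which is the quantity the paper argues many calibration
    # and OOD detection methods unintentionally degrade.
    return roc_auc_score(correct, conf)

Under this ranking-based view, a method can improve calibration (e.g., expected calibration error) while leaving the separation between correct and misclassified samples unchanged or worse, which is the gap the paper's flat-minima approach targets.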

Authors (4)
  1. Fei Zhu (49 papers)
  2. Xu-Yao Zhang (44 papers)
  3. Zhen Cheng (27 papers)
  4. Cheng-Lin Liu (71 papers)
Citations (6)
