Calibrating Bayesian Learning via Regularization, Confidence Minimization, and Selective Inference (2404.11350v1)
Abstract: The application of AI models in fields such as engineering is limited by the known difficulty of quantifying the reliability of an AI's decisions. A well-calibrated AI model must correctly report its accuracy on in-distribution (ID) inputs, while also enabling the detection of out-of-distribution (OOD) inputs. A conventional approach to improving calibration is the application of Bayesian ensembling. However, owing to computational limitations and model misspecification, practical ensembling strategies do not necessarily enhance calibration. This paper proposes an extension of variational inference (VI)-based Bayesian learning that integrates calibration regularization for improved ID performance, confidence minimization for OOD detection, and selective calibration to ensure that calibration regularization and confidence minimization are used synergistically. The scheme is constructed in stages: first introducing calibration-regularized Bayesian learning (CBNN), then incorporating out-of-distribution confidence minimization (OCM) to yield CBNN-OCM, and finally adding selective calibration to produce selective CBNN-OCM (SCBNN-OCM). Selective calibration rejects inputs for which the calibration performance is expected to be insufficient. Numerical results illustrate the trade-offs among ID accuracy, ID calibration, and OOD calibration attained by both frequentist and Bayesian learning methods. Among the main conclusions, SCBNN-OCM is seen to achieve the best ID and OOD performance compared to existing state-of-the-art approaches, at the cost of rejecting a sufficiently large number of inputs.
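For intuition, here is a minimal PyTorch-style sketch of how the three ingredients described in the abstract might be combined. Everything below is an illustrative assumption rather than the paper's actual algorithm: the names (`VariationalLinear`, `calibration_penalty`, `ocm_penalty`, `selective_predict`), the quadratic confidence-versus-correctness calibration proxy, the entropy-threshold selector, and all hyperparameter values are placeholders standing in for the paper's CBNN regularizer, OCM term, and trained selector.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationalLinear(nn.Module):
    """Mean-field Gaussian layer for variational inference (Bayes-by-Backprop style)."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(d_out, d_in))
        self.rho = nn.Parameter(torch.full((d_out, d_in), -3.0))  # softplus(rho) = std

    def kl(self):
        # KL divergence from N(mu, sigma^2) to the standard normal prior N(0, 1)
        sigma = F.softplus(self.rho)
        return 0.5 * (sigma.pow(2) + self.mu.pow(2) - 1.0 - 2.0 * sigma.log()).sum()

    def forward(self, x):
        sigma = F.softplus(self.rho)
        w = self.mu + sigma * torch.randn_like(sigma)  # reparameterization trick
        return x @ w.t()

def ensemble_probs(model, x, num_samples=8):
    # Predictive distribution averaged over weight samples from the variational posterior
    return torch.stack([F.softmax(model(x), dim=-1) for _ in range(num_samples)]).mean(0)

def calibration_penalty(probs, labels):
    # Crude differentiable calibration proxy: mean squared gap between the model's
    # confidence and its correctness (a stand-in for trainable calibration measures)
    conf, pred = probs.max(dim=-1)
    correct = (pred == labels).float()  # gradients flow through conf only
    return ((conf - correct) ** 2).mean()

def ocm_penalty(probs_ood):
    # Confidence minimization on auxiliary OOD inputs: KL from the uniform
    # distribution to the predictive distribution (outlier-exposure style)
    k = probs_ood.shape[-1]
    return -probs_ood.clamp_min(1e-12).log().mean(dim=-1).mean() - math.log(k)

def training_loss(model, x_id, y_id, x_ood, kl_weight=1e-3, lam_cal=1.0, lam_ocm=0.5):
    # VI free energy (NLL + weighted KL) plus the two calibration-oriented penalties
    probs_id = ensemble_probs(model, x_id)
    nll = F.nll_loss(probs_id.clamp_min(1e-12).log(), y_id)
    kl = sum(m.kl() for m in model.modules() if isinstance(m, VariationalLinear))
    return (nll + kl_weight * kl
            + lam_cal * calibration_penalty(probs_id, y_id)
            + lam_ocm * ocm_penalty(ensemble_probs(model, x_ood)))

@torch.no_grad()
def selective_predict(model, x, entropy_threshold=1.5):
    # Simplified selective step: reject inputs whose predictive entropy is too high
    # (the paper instead trains a selector targeting expected calibration quality)
    probs = ensemble_probs(model, x)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    return probs, entropy < entropy_threshold
```

A possible usage, again under the assumption of a toy one-layer model and random data in place of real ID and auxiliary OOD batches:

```python
model = nn.Sequential(nn.Flatten(), VariationalLinear(28 * 28, 10))
x_id, y_id = torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))
x_ood = torch.randn(32, 1, 28, 28)  # auxiliary OOD batch, e.g., from another dataset
training_loss(model, x_id, y_id, x_ood).backward()
probs, accepted = selective_predict(model, x_id)
```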
Authors:
- Jiayi Huang
- Sangwoo Park
- Osvaldo Simeone