Beyond In-Domain Scenarios: Robust Density-Aware Calibration (2302.05118v2)
Abstract: Calibrating deep learning models to yield uncertainty-aware predictions is crucial as deep neural networks are increasingly deployed in safety-critical applications. While existing post-hoc calibration methods achieve impressive results on in-domain test datasets, they cannot yield reliable uncertainty estimates under domain shift and in out-of-domain (OOD) scenarios. We aim to bridge this gap by proposing DAC, an accuracy-preserving, Density-Aware Calibration method based on k-nearest neighbors (KNN). In contrast to existing post-hoc methods, we utilize the hidden layers of classifiers as a source of uncertainty-related information and study their importance. We show that DAC is a generic method that can readily be combined with state-of-the-art post-hoc methods. DAC boosts the robustness of calibration under domain shift and OOD while maintaining excellent in-domain predictive uncertainty estimates. We demonstrate that DAC leads to consistently better calibration across a large number of model architectures, datasets, and metrics. Additionally, we show that DAC substantially improves the calibration of recent large-scale neural networks pre-trained on vast amounts of data.
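To make the idea concrete, below is a minimal sketch of density-aware temperature scaling under stated assumptions: per-sample KNN distances are computed in the feature spaces of several hidden layers and combined into a sample-wise temperature, so inputs far from the training density receive a higher temperature and thus softer, less confident probabilities. The function names (`knn_distance`, `density_aware_softmax`) are hypothetical, not the paper's released implementation, and the layer weights `w` and bias `b`, which DAC would fit on a held-out calibration set, appear here as fixed toy values.

```python
# Minimal sketch of density-aware temperature scaling with KNN distances.
# Assumption: hidden-layer features for the calibration (training) set and
# for the queries have already been extracted from the classifier.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_distance(train_feats, query_feats, k=10):
    """Mean distance from each query to its k nearest training features."""
    nn = NearestNeighbors(n_neighbors=k).fit(train_feats)
    dists, _ = nn.kneighbors(query_feats)
    return dists.mean(axis=1)  # shape: (n_queries,)

def density_aware_softmax(logits, layer_feats_train, layer_feats_query, w, b):
    """Scale logits by a per-sample temperature built from KNN distances.

    logits: (n, c) raw classifier outputs for the queries
    layer_feats_*: lists of (n_i, d_l) feature arrays, one per hidden layer
    w, b: per-layer weights and bias (in DAC, fit on held-out data)
    """
    # One KNN distance per query per layer; samples far from the training
    # density get larger distances and hence a higher (softer) temperature.
    d = np.stack([knn_distance(tr, q)
                  for tr, q in zip(layer_feats_train, layer_feats_query)],
                 axis=1)                        # (n, n_layers)
    temperature = np.maximum(d @ w + b, 1e-3)   # (n,), kept positive
    z = logits / temperature[:, None]
    z -= z.max(axis=1, keepdims=True)           # numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

if __name__ == "__main__":
    # Toy demo with random features from two hypothetical hidden layers.
    rng = np.random.default_rng(0)
    train = [rng.normal(size=(200, 8)), rng.normal(size=(200, 16))]
    query = [rng.normal(size=(5, 8)), rng.normal(size=(5, 16))]
    logits = rng.normal(size=(5, 3))
    w, b = np.array([0.5, 0.5]), 1.0  # toy values; DAC fits these on held-out data
    print(density_aware_softmax(logits, train, query, w, b))
```

Because the per-sample temperature is positive and shared across classes, rescaling the logits never changes the argmax, which is why this style of calibration is accuracy-preserving; in-distribution queries with small KNN distances fall back to a near-constant temperature, while distant ones are pushed toward a more uniform predictive distribution.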