Preventing Arbitrarily High Confidence on Far-Away Data in Point-Estimated Discriminative Neural Networks (2311.03683v2)
Abstract: Discriminatively trained, deterministic neural networks are the de facto choice for classification problems. However, even though they achieve state-of-the-art results on in-domain test sets, they tend to be overconfident on out-of-distribution (OOD) data. For instance, ReLU networks, a popular class of neural network architectures, have been shown to almost always yield high-confidence predictions when the test data are far away from the training set, even when they are trained with OOD data. We overcome this problem by adding a term to the output of the neural network that corresponds to the logit of an extra class, which we design to dominate the logits of the original classes as we move away from the training data. This technique provably prevents arbitrarily high confidence on far-away test data while maintaining simple, discriminative point-estimate training. Evaluation on various benchmarks demonstrates strong performance against competitive baselines on both far-away and realistic OOD data.
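The abstract does not specify the functional form of the extra-class logit, only that it must dominate the original logits far from the training data. Below is a minimal PyTorch sketch of the idea under one illustrative assumption: the extra logit grows quadratically with the squared distance to the training-data mean. The class name `ExtraClassWrapper` and the parameters `train_mean` and `alpha` are hypothetical names chosen for this sketch, not identifiers from the paper.

```python
import torch
import torch.nn as nn

class ExtraClassWrapper(nn.Module):
    """Appends a (K+1)-th logit to a K-class classifier.

    Illustrative sketch only: the extra logit here is alpha * ||x - mu||^2
    for the training-data mean mu; the paper's actual construction of the
    dominating term may differ.
    """

    def __init__(self, base_net: nn.Module, train_mean: torch.Tensor, alpha: float = 1.0):
        super().__init__()
        self.base_net = base_net
        self.alpha = alpha
        self.register_buffer("train_mean", train_mean.flatten())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.base_net(x)  # shape (B, K): the original class logits
        # Squared Euclidean distance of each (flattened) input to the
        # training-data mean; grows quadratically as x moves far away.
        dist2 = ((x.flatten(1) - self.train_mean) ** 2).sum(dim=1, keepdim=True)
        extra = self.alpha * dist2  # shape (B, 1): the extra-class logit
        # Far from the training data, `extra` outgrows the original logits,
        # so softmax mass shifts to the extra class and confidence on the
        # K real classes cannot stay arbitrarily high.
        return torch.cat([logits, extra], dim=1)  # shape (B, K + 1)
```

The super-linear growth is the point of this particular assumption: a ReLU network is piecewise affine, so its logits grow at most linearly in the input norm, and any quadratically growing extra logit therefore dominates the softmax asymptotically, driving the probability of every original class toward zero far from the training data.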