The Devil is in the Margin: Margin-based Label Smoothing for Network Calibration (2111.15430v4)
Abstract: In spite of the dominant performances of deep neural networks, recent works have shown that they are poorly calibrated, resulting in over-confident predictions. Miscalibration can be exacerbated by overfitting due to the minimization of the cross-entropy during training, as it promotes the predicted softmax probabilities to match the one-hot label assignments. This yields a pre-softmax activation of the correct class that is significantly larger than the remaining activations. Recent evidence from the literature suggests that loss functions that embed implicit or explicit maximization of the entropy of predictions yield state-of-the-art calibration performances. We provide a unifying constrained-optimization perspective of current state-of-the-art calibration losses. Specifically, these losses could be viewed as approximations of a linear penalty (or a Lagrangian) imposing equality constraints on logit distances. This points to an important limitation of such underlying equality constraints, whose ensuing gradients constantly push towards a non-informative solution, which might prevent the model from reaching the best compromise between discriminative performance and calibration during gradient-based optimization. Following our observations, we propose a simple and flexible generalization based on inequality constraints, which imposes a controllable margin on logit distances. Comprehensive experiments on a variety of image classification, semantic segmentation and NLP benchmarks demonstrate that our method sets novel state-of-the-art results on these tasks in terms of network calibration, without affecting the discriminative performance. The code is available at https://github.com/by-liu/MbLS .
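The margin-based penalty described in the abstract can be sketched as a hinge on the logit distances d_j = max_i(l_i) - l_j, added to the standard cross-entropy. Below is a minimal PyTorch sketch based only on the abstract, not the authors' reference implementation (see the linked repository for that); the function names and the default `margin` and `lam` values are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def margin_penalty(logits: torch.Tensor, margin: float = 10.0) -> torch.Tensor:
    """Hinge penalty on logit distances exceeding a margin.

    For each sample, the distance d_j = max_i(l_i) - l_j is computed for
    every class j; only the part of d_j above `margin` is penalized, so
    distances within the margin are left untouched (the inequality
    constraint d_j <= margin sketched in the abstract).
    """
    max_logit, _ = logits.max(dim=1, keepdim=True)   # shape (N, 1)
    distances = max_logit - logits                   # shape (N, C), all >= 0
    return F.relu(distances - margin).sum(dim=1).mean()


def margin_label_smoothing_loss(logits: torch.Tensor, targets: torch.Tensor,
                                margin: float = 10.0, lam: float = 0.1) -> torch.Tensor:
    """Cross-entropy plus the margin penalty, weighted by `lam` (illustrative defaults)."""
    return F.cross_entropy(logits, targets) + lam * margin_penalty(logits, margin)
```

Setting `margin = 0` recovers an equality-style constraint that pushes all logits towards the same value (the non-informative solution discussed above), whereas a positive margin only penalizes logit distances beyond the chosen threshold.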
Authors: Bingyuan Liu, Ismail Ben Ayed, Adrian Galdran, Jose Dolz