The Devil is in the Margin: Margin-based Label Smoothing for Network Calibration (2111.15430v4)

Published 30 Nov 2021 in cs.CV and cs.LG

Abstract: In spite of the dominant performances of deep neural networks, recent works have shown that they are poorly calibrated, resulting in over-confident predictions. Miscalibration can be exacerbated by overfitting due to the minimization of the cross-entropy during training, as it promotes the predicted softmax probabilities to match the one-hot label assignments. This yields a pre-softmax activation of the correct class that is significantly larger than the remaining activations. Recent evidence from the literature suggests that loss functions that embed implicit or explicit maximization of the entropy of predictions yield state-of-the-art calibration performances. We provide a unifying constrained-optimization perspective of current state-of-the-art calibration losses. Specifically, these losses could be viewed as approximations of a linear penalty (or a Lagrangian) imposing equality constraints on logit distances. This points to an important limitation of such underlying equality constraints, whose ensuing gradients constantly push towards a non-informative solution, which might prevent from reaching the best compromise between the discriminative performance and calibration of the model during gradient-based optimization. Following our observations, we propose a simple and flexible generalization based on inequality constraints, which imposes a controllable margin on logit distances. Comprehensive experiments on a variety of image classification, semantic segmentation and NLP benchmarks demonstrate that our method sets novel state-of-the-art results on these tasks in terms of network calibration, without affecting the discriminative performance. The code is available at https://github.com/by-liu/MbLS .

Authors (4)
  1. Bingyuan Liu (28 papers)
  2. Ismail Ben Ayed (133 papers)
  3. Adrian Galdran (36 papers)
  4. Jose Dolz (97 papers)
Citations (56)

Summary

Margin-based Label Smoothing for Network Calibration

The paper "The Devil is in the Margin: Margin-based Label Smoothing for Network Calibration" addresses the critical issue of calibration in deep neural networks (DNNs), which has implications for domains reliant on deep learning such as image classification and semantic segmentation. Despite the advancements in DNNs that have led to unprecedented levels of accuracy in multiple tasks, one concern remains prevalent: the issue of poorly calibrated models. Models that are overly confident in their predictions pose significant challenges, especially in applications where reliable uncertainty estimates are crucial.

Summary of Contributions

The authors provide a novel perspective on calibration losses used in deep learning. They posit that many state-of-the-art methods, such as Label Smoothing (LS), Focal Loss (FL), and Explicit Confidence Penalty (ECP), can be viewed through the lens of constrained optimization, specifically as approximations imposing equality constraints on the distances between logits. These constraints often push predictive distributions towards a non-informative solution, potentially hampering a model's discriminative performance.
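
To make this unified view concrete, the sketch below (PyTorch-style, not the authors' exact formulation; the penalty weight lam is a placeholder) shows cross-entropy augmented with the linear penalty on logit distances that, under the paper's perspective, LS, FL, and ECP approximate. The penalty term pushes every logit towards the maximum, i.e. towards a non-informative equal-logit solution.

```python
import torch
import torch.nn.functional as F

def equality_penalty_loss(logits: torch.Tensor, targets: torch.Tensor, lam: float = 0.1) -> torch.Tensor:
    """Cross-entropy plus a linear penalty on logit distances.

    The penalty sum_j (max_k l_k - l_j) approximates the equality constraint
    l_j = max_k l_k for every class j, so its gradient keeps pushing all
    logits towards the maximum, i.e. towards a non-informative solution.
    """
    ce = F.cross_entropy(logits, targets)
    max_logit = logits.max(dim=1, keepdim=True).values
    distances = max_logit - logits            # d_j >= 0 for every class j
    penalty = distances.sum(dim=1).mean()     # L1 norm of the distance vector, batch-averaged
    return ce + lam * penalty
```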

To counteract this limitation, the paper proposes a flexible generalization using inequality constraints, which introduces a controllable margin on logit distances. This method, termed Margin-based Label Smoothing (MbLS), aims to strike a balance between maintaining the discriminative ability of the model while ensuring better calibrated outputs.
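
A minimal sketch of the margin-based variant is given below, with the same caveats: the default margin and penalty weight are placeholders rather than the paper's tuned settings, and the reference implementation in the linked repository may differ in detail.

```python
import torch
import torch.nn.functional as F

def margin_based_ls_loss(logits: torch.Tensor, targets: torch.Tensor,
                         margin: float = 10.0, lam: float = 0.1) -> torch.Tensor:
    """Margin-based Label Smoothing (MbLS), sketched from the paper's description.

    The equality constraint is relaxed to the inequality
    max_k l_k - l_j <= margin: distances already below the margin receive no
    penalty (and no gradient), preserving enough logit separation for
    discrimination while still discouraging over-confident predictions.
    """
    ce = F.cross_entropy(logits, targets)
    max_logit = logits.max(dim=1, keepdim=True).values
    distances = max_logit - logits
    margin_penalty = F.relu(distances - margin).sum(dim=1).mean()
    return ce + lam * margin_penalty
```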

Experimental Insights

The experimental evaluation spans various benchmarks such as CIFAR-10, Tiny-ImageNet, CUB-200-2011, PASCAL VOC 2012, and the 20 Newsgroups dataset, involving diverse network architectures. The superior performance of MbLS compared to existing calibration techniques is supported by strong numerical evidence. For instance, on CIFAR-10 with ResNet-50, the proposed method achieves an Expected Calibration Error (ECE) of 1.16, significantly outperforming traditional LS and FL approaches. This demonstrates not only an improvement in calibration but also competitive accuracy compared to state-of-the-art techniques.
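
For readers unfamiliar with the metric, ECE partitions test predictions into confidence bins and averages the absolute gap between per-bin accuracy and per-bin confidence, weighted by bin size. A minimal sketch follows; the 15 equal-width bins are a common choice, not necessarily the paper's evaluation setting.

```python
import torch

def expected_calibration_error(probs: torch.Tensor, labels: torch.Tensor, n_bins: int = 15) -> float:
    """Equal-width-binning ECE for a batch of softmax outputs.

    probs:  (N, C) predicted probabilities
    labels: (N,)   integer ground-truth classes
    """
    confidences, predictions = probs.max(dim=1)
    accuracies = predictions.eq(labels).float()
    edges = torch.linspace(0.0, 1.0, n_bins + 1)
    ece = torch.zeros(())
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        prop = in_bin.float().mean()              # fraction of samples falling in this bin
        if prop.item() > 0:
            gap = (accuracies[in_bin].mean() - confidences[in_bin].mean()).abs()
            ece += gap * prop
    return ece.item()
```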

Theoretical Implications and Future Directions

The proposed integration of inequality constraints offers a new design paradigm for loss functions in deep learning. The insights drawn from viewing calibration loss functions as constrained optimization problems could inspire novel strategies for addressing miscalibration, potentially influencing both theoretical explorations into the nature of loss functions and practical methods in training regimes.

Future research directions could explore more complex margin-based constraints or adaptive schemes that dynamically adjust the margin m based on data properties or during different phases of training. Moreover, since the method builds on cross-entropy-based training, investigating its behavior on non-traditional or non-i.i.d. (non-independent and identically distributed) datasets could further improve reliability assessments and trust in deep models. The discussion also hints at exploring alternative ensemble methods or Bayesian inference approaches, given their capacity to handle predictive uncertainty effectively.

In summary, the work put forth in this paper provides an analytical reassessment of how modern calibration losses can be improved through margin-based techniques. It stands as a significant contribution to the field, offering insights that blend theoretical underpinnings with practical enhancements, evidenced by robust empirical results.
