Average Calibration Error: A Differentiable Loss for Improved Reliability in Image Segmentation (2403.06759v1)

Published 11 Mar 2024 in cs.CV and cs.LG

Abstract: Deep neural networks for medical image segmentation often produce overconfident results misaligned with empirical observations. Such miscalibration challenges their clinical translation. We propose to use marginal L1 average calibration error (mL1-ACE) as a novel auxiliary loss function to improve pixel-wise calibration without compromising segmentation quality. We show that this loss, despite using hard binning, is directly differentiable, bypassing the need for approximate but differentiable surrogate or soft binning approaches. Our work also introduces the concept of dataset reliability histograms, which generalises standard reliability diagrams for refined visual assessment of calibration in semantic segmentation aggregated at the dataset level. Using mL1-ACE, we reduce average and maximum calibration error by 45% and 55% respectively, while maintaining a Dice score of 87% on the BraTS 2021 dataset. We share our code here: https://github.com/cai4cai/ACE-DLIRIS
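The abstract's central technical claim is that mL1-ACE stays directly differentiable despite hard binning: the bin assignment is piecewise constant in the predicted probabilities, so it blocks no gradient, while the per-bin mean-confidence term remains differentiable. The sketch below illustrates how such a marginal L1 average calibration error could be computed over flattened pixel predictions. It is an illustrative reconstruction under assumed conventions (per-class binning, averaging over non-empty bins, then over classes), not the authors' reference implementation; see the linked repository for the actual code.

```python
import torch

def ml1_ace(probs, targets, num_bins=20):
    """Illustrative sketch of a marginal L1 average calibration error.

    probs:   (N, C) tensor of softmax probabilities, one row per pixel.
    targets: (N,) tensor of integer class labels per pixel.

    For each class, pixels are hard-binned by predicted probability;
    each non-empty bin contributes |mean confidence - empirical frequency|,
    and the result is averaged over bins, then over classes.
    """
    n, c = probs.shape
    one_hot = torch.nn.functional.one_hot(targets, c).float()
    edges = torch.linspace(0.0, 1.0, num_bins + 1, device=probs.device)
    ace_per_class = []
    for k in range(c):
        conf = probs[:, k]
        freq = one_hot[:, k]
        # Hard bin assignment: non-differentiable but piecewise constant,
        # so it does not block gradients flowing through `conf` below.
        bins = torch.bucketize(conf.detach(), edges[1:-1])
        bin_errors = []
        for b in range(num_bins):
            mask = bins == b
            if mask.any():
                gap = conf[mask].mean() - freq[mask].mean()
                bin_errors.append(gap.abs())
        ace_per_class.append(torch.stack(bin_errors).mean())
    return torch.stack(ace_per_class).mean()
```

Used as an auxiliary term, this would be added to a standard segmentation loss (e.g. Dice plus cross-entropy) with some weighting; the weighting scheme here is an assumption, not taken from the paper.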
