Confidence Calibration and Predictive Uncertainty Estimation for Deep Medical Image Segmentation (1911.13273v2)

Published 29 Nov 2019 in eess.IV and cs.CV

Abstract: Fully convolutional neural networks (FCNs), and in particular U-Nets, have achieved state-of-the-art results in semantic segmentation for numerous medical imaging applications. Moreover, batch normalization and Dice loss have been used successfully to stabilize and accelerate training. However, these networks are poorly calibrated, i.e., they tend to produce overconfident predictions both in correct and erroneous classifications, making them unreliable and hard to interpret. In this paper, we study predictive uncertainty estimation in FCNs for medical image segmentation. We make the following contributions: 1) We systematically compare cross entropy loss with Dice loss in terms of segmentation quality and uncertainty estimation of FCNs; 2) We propose model ensembling for confidence calibration of the FCNs trained with batch normalization and Dice loss; 3) We assess the ability of calibrated FCNs to predict segmentation quality of structures and detect out-of-distribution test examples. We conduct extensive experiments across three medical image segmentation applications of the brain, the heart, and the prostate to evaluate our contributions. The results of this study offer considerable insight into the predictive uncertainty estimation and out-of-distribution detection in medical image segmentation and provide practical recipes for confidence calibration. Moreover, we consistently demonstrate that model ensembling improves confidence calibration.

Summary

The paper demonstrates that model ensembling with Dice loss enhances both segmentation quality and confidence calibration compared to single models.
It compares cross-entropy and Dice loss, revealing that cross-entropy yields better-calibrated uncertainty estimates while Dice loss improves segmentation performance.
It introduces a segment-level uncertainty metric that identifies out-of-distribution samples, supporting more reliable clinical decision-making.

This paper addresses a crucial issue in medical image segmentation with fully convolutional networks (FCNs), and U-Nets in particular: confidence calibration and predictive uncertainty estimation. The authors focus on the tendency of FCNs to produce overconfident predictions, both when they are right and when they are wrong. This tendency undermines their reliability in medical applications, where errors can directly affect patient care.
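
Overconfidence of this kind is usually quantified with a calibration metric. The following is a minimal sketch of the expected calibration error (ECE), a standard metric that compares average confidence with accuracy inside confidence bins. The 10 equal-width bins and the function name are assumptions for illustration. They are not necessarily the paper's exact evaluation protocol. For segmentation, each voxel contributes one (confidence, correct) pair, where confidence is the probability assigned to the predicted class.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected calibration error (ECE) over equal-width confidence bins.

    confidences: probability of the predicted class for each voxel, shape (N,)
    correct:     1 if that voxel's prediction was right, 0 otherwise, shape (N,)
    n_bins:      number of equal-width bins (an illustrative choice)
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Bins are (lo, hi]; the first bin also includes confidence == 0.
        in_bin = (confidences > lo) & (confidences <= hi)
        if lo == 0.0:
            in_bin |= confidences == 0.0
        if not in_bin.any():
            continue
        # |accuracy - mean confidence| in this bin, weighted by the bin's share of voxels.
        gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
        ece += in_bin.mean() * gap
    return ece
```

Suppose a model predicts every voxel with confidence 0.99 but is right on only 80% of them. It gets an ECE of about 0.19, which is exactly the gap between its stated confidence and its actual accuracy. A perfectly calibrated model would score 0.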

Key Contributions

The paper makes several significant contributions to the understanding of confidence calibration and uncertainty estimation in FCNs:

Loss Function Comparison: The authors conduct a systematic comparison between cross-entropy loss and Dice loss concerning segmentation quality and uncertainty estimation. They conclude that while FCNs trained with Dice loss tend to achieve better segmentation quality, cross-entropy provides better-calibrated predictions.
Model Ensembling for Confidence Calibration: The authors propose model ensembling to address the poor calibration of FCNs trained with batch normalization and Dice loss. Across their experiments, ensembles consistently improve the calibration of predictive uncertainty estimates. They also improve segmentation quality compared to single models trained with either loss function.
Segment-Level Uncertainty Estimation: The paper introduces a segment-level metric, the average entropy over the segmented object. It is used to predict the segmentation quality of individual structures and to detect out-of-distribution test examples. A high value signals that the output for that structure may be unreliable, so the model can flag inputs that are challenging or unlike its training data.
Applications and Evaluation: The authors evaluate their methods on three medical image segmentation applications: the brain, the heart, and the prostate, all on MR images. The experiments offer considerable insight into predictive uncertainty estimation and highlight the practical value of model ensembling.
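
The two losses compared in the paper can be sketched for the binary (foreground vs. background) case. These are common textbook forms. The binary setting, the smoothing constant `eps`, and the function names are assumptions; the paper's exact multi-class variants are not given in this summary. One standard explanation for the calibration gap is that cross-entropy is a strictly proper scoring rule, which rewards honest probabilities, whereas Dice loss is not. This explanation is general background and is not taken from the paper itself.

```python
import numpy as np

def soft_dice_loss(prob, target, eps=1e-7):
    """1 - soft Dice coefficient between foreground probabilities and a binary mask.

    Depends only on global overlap, so it is not dominated by abundant background voxels.
    """
    prob = np.asarray(prob, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    intersection = np.sum(prob * target)
    return 1.0 - (2.0 * intersection + eps) / (np.sum(prob) + np.sum(target) + eps)

def binary_cross_entropy(prob, target, eps=1e-7):
    """Mean voxel-wise binary cross-entropy (negative log-likelihood)."""
    # Clip to avoid log(0) for hard 0/1 predictions.
    prob = np.clip(np.asarray(prob, dtype=float).ravel(), eps, 1.0 - eps)
    target = np.asarray(target, dtype=float).ravel()
    return -np.mean(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob))
```

Consider a toy case with 2 foreground voxels out of 100 and a prediction of 0.01 everywhere. Cross-entropy is only about 0.10, because the 98 easy background voxels dominate the average. The Dice loss is about 0.99, because the structure is almost entirely missed. This illustrates why Dice loss is attractive for segmenting small structures.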
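
Model ensembling at inference time amounts to averaging the per-voxel class probabilities of several independently trained networks. The sketch below assumes the members share one architecture and differ only in their training runs (for example, random initialization). It also assumes their softmax outputs are already available as arrays. The number of members and how they are trained are not specified here, and `ensemble_predict` is a hypothetical helper name.

```python
import numpy as np

def ensemble_predict(member_probs):
    """Average per-voxel class probabilities from M independently trained models.

    member_probs: array-like of shape (M, C, ...) holding each member's softmax output.
    Returns the mean probability map of shape (C, ...) and the hard label map (...).
    """
    member_probs = np.asarray(member_probs, dtype=float)
    mean_probs = member_probs.mean(axis=0)   # average over the M members
    labels = mean_probs.argmax(axis=0)       # most probable class per voxel
    return mean_probs, labels

# Two members, two classes (rows), three voxels (columns).
m1 = [[0.9, 0.95, 0.1], [0.1, 0.05, 0.9]]
m2 = [[0.9, 0.15, 0.2], [0.1, 0.85, 0.8]]
mean_probs, labels = ensemble_predict([m1, m2])
```

On the middle voxel, one member is 95% sure of background and the other is 85% sure of foreground. The ensemble reports 0.55, a far less confident prediction on exactly the voxel where the members disagree. This is the mechanism by which averaging tempers overconfidence.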

Implications and Future Directions

The implications of this research are multifaceted:

Practical Application: Accurate confidence calibration has immediate practical benefits, especially in clinical settings, where an overconfident but wrong model can contribute to misdiagnosis or inappropriate treatment plans. The proposed ensembling method offers a practical way to improve the reliability of deep learning models in medical imaging.
Loss Function Insights: The empirical comparison of loss functions sheds light on which losses are more conducive to reliable predictive uncertainty estimates, which can inform how future models are designed or selected.
Generalizability: Although this paper focuses on MRI of three anatomical regions, the methods are not tied to that modality. They could plausibly be extended to other modalities, such as CT, and to other anatomical structures. Testing this is a natural direction for further research.
Capturing Out-of-Distribution Samples: The segment-level uncertainty results suggest that a network can alert clinicians when its predictions may be unreliable, so that a human can review those cases.
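
The segment-level metric described under Key Contributions (average entropy over the segmented object) can be sketched together with a simple review rule. The sketch rests on several assumptions:

- Probabilities arrive as a (C, ...) array of softmax outputs, such as an averaged ensemble output.
- The segmented object is the set of voxels whose argmax equals the structure's label.
- Entropy uses the natural log.
- The thresholding rule `flag_for_review` is hypothetical. In practice, a threshold would have to be chosen on held-out data.

```python
import numpy as np

def voxel_entropy(probs, eps=1e-12):
    """Per-voxel entropy of the predictive distribution; probs has shape (C, ...)."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)  # avoid log(0)
    return -np.sum(probs * np.log(probs), axis=0)

def segment_uncertainty(probs, label):
    """Average voxel entropy over the voxels predicted as `label`."""
    probs = np.asarray(probs, dtype=float)
    mask = probs.argmax(axis=0) == label
    if not mask.any():
        return float("nan")  # the structure was not segmented at all
    return float(voxel_entropy(probs)[mask].mean())

def flag_for_review(probs, label, threshold):
    """Hypothetical rule: flag a case if its segment-level uncertainty exceeds `threshold`.

    An empty segmentation (NaN uncertainty) is also flagged.
    """
    u = segment_uncertainty(probs, label)
    return bool(np.isnan(u) or u > threshold)

# Two classes (rows) over four voxels; voxels 2 and 3 are predicted as foreground (label 1).
confident = np.array([[0.99, 0.98, 0.02, 0.03], [0.01, 0.02, 0.98, 0.97]])
uncertain = np.array([[0.99, 0.98, 0.45, 0.40], [0.01, 0.02, 0.55, 0.60]])
```

With two classes, entropy ranges from 0 to ln 2 ≈ 0.69. The confident case averages about 0.12 over its foreground voxels, and the uncertain case averages about 0.68. An illustrative threshold of 0.3 therefore flags only the second case.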

The paper’s methods pave the way for future work to refine calibration techniques. That work could also investigate alternative ensembling strategies that do not require training every member from scratch, which would reduce computational cost. Examining the interplay between model calibration and sources of uncertainty in medical data, such as inter-rater variability, could also make segmentation models more robust. In sum, this research is a foundational step toward more reliable and trustworthy FCN-based medical image segmentation tools.