Optimization for Medical Image Segmentation: Theory and Practice when evaluating with Dice Score or Jaccard Index (2010.13499v1)

Published 26 Oct 2020 in eess.IV, cs.CV, and cs.LG

Abstract: In many medical imaging and classical computer vision tasks, the Dice score and Jaccard index are used to evaluate the segmentation performance. Despite the existence and great empirical success of metric-sensitive losses, i.e. relaxations of these metrics such as soft Dice, soft Jaccard and Lovasz-Softmax, many researchers still use per-pixel losses, such as (weighted) cross-entropy, to train CNNs for segmentation. Therefore, the target metric is in many cases not directly optimized. We investigate, from a theoretical perspective, the relation within the group of metric-sensitive loss functions and question the existence of an optimal weighting scheme for weighted cross-entropy to optimize the Dice score and Jaccard index at test time. We find that the Dice score and Jaccard index approximate each other relatively and absolutely, but we find no such approximation for a weighted Hamming similarity. For the Tversky loss, the approximation gets monotonically worse when deviating from the trivial weight setting where soft Tversky equals soft Dice. We verify these results empirically in an extensive validation on six medical segmentation tasks and can confirm that metric-sensitive losses are superior to cross-entropy based loss functions when evaluating with the Dice score or Jaccard index. This further holds in a multi-class setting, and across different object sizes and foreground/background ratios. These results encourage a wider adoption of metric-sensitive loss functions for medical segmentation tasks where the performance measure of interest is the Dice score or Jaccard index.

Summary

The paper argues that metric-sensitive loss functions, such as soft Dice and soft Jaccard, outperform (weighted) cross-entropy when segmentation is evaluated with the Dice score or Jaccard index.
It provides a theoretical analysis showing that the Dice score and Jaccard index approximate each other both relatively and absolutely in a risk-minimization setting, whereas a weighted Hamming similarity, the metric the paper associates with weighted per-pixel losses, approximates neither.
Empirical validation on six medical segmentation tasks confirms the advantage of metric-sensitive losses, including in multi-class settings and across different object sizes and foreground/background ratios.

Overview of Medical Image Segmentation Optimization Paper

The paper "Optimization for Medical Image Segmentation: Theory and Practice when evaluating with Dice Score or Jaccard Index" studies, theoretically and empirically, how to train medical image segmentation models when the evaluation metric is the Dice score or Jaccard index, with a focus on metric-sensitive loss functions. It addresses a common mismatch: learning-based segmentation methods are often trained with per-pixel losses such as cross-entropy, even though they are evaluated with the Dice score or Jaccard index, so the target metric is not optimized directly.
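For reference, both evaluation metrics can be written in terms of the true positives (TP), false positives (FP) and false negatives (FN) between a binary prediction and the ground truth. The minimal NumPy sketch below computes them on hard masks; the function name `dice_and_jaccard` and the choice to score two empty masks as 1.0 are assumptions made for illustration, not conventions taken from the paper.

```python
import numpy as np


def dice_and_jaccard(pred: np.ndarray, target: np.ndarray) -> tuple[float, float]:
    """Hard Dice score and Jaccard index for two binary masks of equal shape.

    Assumption (hypothetical convention): if both masks are empty,
    both scores are defined as 1.0. Conventions for this case vary.
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    tp = np.logical_and(pred, target).sum()   # predicted and true foreground
    fp = np.logical_and(pred, ~target).sum()  # predicted foreground, true background
    fn = np.logical_and(~pred, target).sum()  # missed foreground
    if tp + fp + fn == 0:
        return 1.0, 1.0
    dice = 2 * tp / (2 * tp + fp + fn)
    jaccard = tp / (tp + fp + fn)
    return float(dice), float(jaccard)


# Toy 2x3 masks (illustrative values only).
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
d, j = dice_and_jaccard(pred, target)
print(f"Dice={d:.3f}, Jaccard={j:.3f}")
```

For these toy masks TP = 2, FP = 1 and FN = 1, so the Dice score is 4/6 and the Jaccard index is 2/4, consistent with the identity J = D/(2 - D).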

Theoretical Analysis

Theoretically, the paper investigates the relationship between the Dice score and Jaccard index and shows that they approximate each other both relatively and absolutely, so that optimizing one is a sound proxy for the other under risk minimization. In contrast, it finds no such approximation for a weighted Hamming similarity, the metric the paper associates with (weighted) per-pixel losses such as cross-entropy. This calls into question whether any weighting scheme can turn weighted cross-entropy into a faithful surrogate for the Dice score or Jaccard index at test time. The authors also analyze the Tversky index: its approximation of the Dice score gets monotonically worse as the weights deviate from the trivial setting in which soft Tversky equals soft Dice. Together, these results favor metric-sensitive losses such as soft Dice and soft Jaccard when the goal is to optimize these evaluation metrics.
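The Dice-Jaccard relation and the Tversky special cases can be checked directly from the standard count-based definitions. The derivation below is a short sketch from those definitions; the bounds follow from elementary algebra and are not necessarily stated in this exact form in the paper.

```latex
% Count-based definitions for a binary prediction vs. ground truth
D = \frac{2\,TP}{2\,TP + FP + FN}, \qquad
J = \frac{TP}{TP + FP + FN}, \qquad
T_{\alpha,\beta} = \frac{TP}{TP + \alpha\,FP + \beta\,FN}

% Exact conversion between Dice and Jaccard
D = \frac{2J}{1 + J}, \qquad J = \frac{D}{2 - D}

% Relative approximation: 1 \le 2/(1+J) \le 2 for J \in [0,1]
J \le D \le 2J

% Absolute approximation: the gap is maximised at J = \sqrt{2} - 1
0 \le D - J = \frac{J(1 - J)}{1 + J} \le 3 - 2\sqrt{2} \approx 0.172

% Tversky special cases (here alpha weights FP, beta weights FN)
T_{1/2,\,1/2} = D, \qquad T_{1,1} = J
```

These identities hold equally for the soft counts used by soft Dice and soft Jaccard losses, since the algebra does not require TP, FP and FN to be integers.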

Empirical Validation

The authors validate these results empirically on six medical segmentation tasks that cover several imaging modalities and applications. Metric-sensitive losses consistently outperform cross-entropy-based losses when evaluated with the Dice score or Jaccard index, across different foreground/background ratios and object sizes. The experiments also extend to multi-class segmentation, where metric-sensitive losses retain their advantage for the individual sub-regions of a larger structure, such as the different glioma sub-regions in brain MRI datasets.
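In a multi-class evaluation, the Dice score is commonly computed for each class separately by treating that class's label as a binary mask (one-vs-rest). A minimal NumPy sketch, assuming integer label maps; the helper name `per_class_dice`, the toy label IDs and the NaN convention for classes absent from both maps are hypothetical choices for illustration, and specific benchmarks may define sub-regions and empty cases differently.

```python
import numpy as np


def per_class_dice(pred_labels, target_labels, class_ids):
    """Dice score for each listed class in two integer label maps.

    Each class is scored one-vs-rest as a binary mask. Classes absent from
    both maps get NaN (a hypothetical convention; benchmarks differ).
    """
    scores = {}
    for c in class_ids:
        p = pred_labels == c
        t = target_labels == c
        denom = p.sum() + t.sum()  # |P| + |T| = 2TP + FP + FN
        if denom > 0:
            scores[c] = float(2 * np.logical_and(p, t).sum() / denom)
        else:
            scores[c] = float("nan")
    return scores


# Hypothetical 3x3 label maps: 0 = background, 1 and 2 = foreground classes.
pred = np.array([[0, 1, 1], [2, 2, 0], [0, 0, 0]])
target = np.array([[0, 1, 0], [2, 2, 2], [0, 0, 0]])
print(per_class_dice(pred, target, class_ids=[1, 2]))
```

Here class 1 has one overlapping pixel out of 2 + 1 labelled pixels (Dice 2/3), and class 2 has two out of 2 + 3 (Dice 4/5), so the two classes are scored independently even though they share one image.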

Implications and Future Directions

The paper recommends adopting metric-sensitive loss functions for medical image segmentation tasks in which the Dice score or Jaccard index is the primary evaluation metric, regardless of dataset characteristics such as the foreground/background ratio or object size. Because soft Dice (sDice) and soft Jaccard (sJaccard) perform comparably, the choice between these metric-sensitive losses has little impact on performance, which simplifies the decision for researchers and practitioners.
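To make the compared losses concrete, the sketch below writes soft Dice, soft Jaccard and soft Tversky losses using soft counts (predicted foreground probabilities in place of hard 0/1 values), alongside a per-pixel weighted binary cross-entropy. It is a minimal NumPy illustration under stated assumptions: the binary (single foreground class) setting, the smoothing constant `EPS`, the global rather than per-image aggregation and the `pos_weight` parameter are choices made here, not the paper's exact implementation. In training, these expressions would be written in an autodiff framework.

```python
import numpy as np

EPS = 1e-7  # assumed smoothing constant to avoid division by zero


def soft_counts(probs, target):
    """Soft TP/FP/FN: foreground probabilities replace hard 0/1 predictions."""
    tp = np.sum(probs * target)
    fp = np.sum(probs * (1 - target))
    fn = np.sum((1 - probs) * target)
    return tp, fp, fn


def soft_tversky_loss(probs, target, alpha=0.5, beta=0.5):
    """1 - soft Tversky index; alpha weights false positives, beta false negatives."""
    tp, fp, fn = soft_counts(probs, target)
    return float(1 - (tp + EPS) / (tp + alpha * fp + beta * fn + EPS))


def soft_dice_loss(probs, target):
    # Tversky with alpha = beta = 1/2 reduces to (smoothed) soft Dice.
    return soft_tversky_loss(probs, target, alpha=0.5, beta=0.5)


def soft_jaccard_loss(probs, target):
    # Tversky with alpha = beta = 1 reduces to (smoothed) soft Jaccard.
    return soft_tversky_loss(probs, target, alpha=1.0, beta=1.0)


def weighted_bce_loss(probs, target, pos_weight=1.0):
    """Per-pixel binary cross-entropy with foreground pixels weighted by pos_weight."""
    p = np.clip(probs, EPS, 1 - EPS)  # keep log() finite
    per_pixel = -(pos_weight * target * np.log(p) + (1 - target) * np.log(1 - p))
    return float(per_pixel.mean())


# Toy example: five pixels, predicted foreground probabilities vs. binary ground truth.
probs = np.array([0.9, 0.8, 0.3, 0.1, 0.6])
target = np.array([1.0, 1.0, 1.0, 0.0, 0.0])
for name, loss_fn in [("soft Dice", soft_dice_loss),
                      ("soft Jaccard", soft_jaccard_loss),
                      ("weighted BCE", weighted_bce_loss)]:
    print(f"{name}: {loss_fn(probs, target):.3f}")
```

Expressing soft Dice and soft Jaccard as the Tversky loss with alpha = beta = 1/2 and alpha = beta = 1 mirrors the special cases of the Tversky index; up to the smoothing constant, these match the usual linear soft Dice and soft Jaccard. Unlike the overlap-based losses, the weighted cross-entropy is a per-pixel average whose balance depends on the hand-picked `pos_weight`.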

Conclusion

In conclusion, the paper provides a rigorous theoretical foundation combined with substantive empirical evidence supporting the use of metric-sensitive loss functions over traditional cross-entropy for medical image segmentation tasks. This work makes a compelling case for this optimization approach, presenting both methodological insights and practical guidelines for improving segmentation outcomes in healthcare applications. Future research could explore the integration of these findings with emerging deep learning architectures and investigate the benefits across other challenging image modalities.