Optimizing the Dice Score and Jaccard Index for Medical Image Segmentation: Theory & Practice (1911.01685v1)

Published 5 Nov 2019 in cs.CV, cs.LG, and eess.IV

Abstract: The Dice score and Jaccard index are commonly used metrics for the evaluation of segmentation tasks in medical imaging. Convolutional neural networks trained for image segmentation tasks are usually optimized for (weighted) cross-entropy. This introduces an adverse discrepancy between the learning optimization objective (the loss) and the end target metric. Recent works in computer vision have proposed soft surrogates to alleviate this discrepancy and directly optimize the desired metric, either through relaxations (soft-Dice, soft-Jaccard) or submodular optimization (Lovász-softmax). The aim of this study is two-fold. First, we investigate the theoretical differences in a risk minimization framework and question the existence of a weighted cross-entropy loss with weights theoretically optimized to surrogate Dice or Jaccard. Second, we empirically investigate the behavior of the aforementioned loss functions w.r.t. evaluation with Dice score and Jaccard index on five medical segmentation tasks. Through the application of relative approximation bounds, we show that all surrogates are equivalent up to a multiplicative factor, and that no optimal weighting of cross-entropy exists to approximate Dice or Jaccard measures. We validate these findings empirically and show that, while it is important to opt for one of the target metric surrogates rather than a cross-entropy-based loss, the choice of the surrogate does not make a statistical difference on a wide range of medical segmentation tasks.

Summary

The paper shows theoretically that no weighting of cross-entropy can serve as a good approximation of the Dice score or the Jaccard index.
It validates empirically that metric-sensitive losses (soft Dice, soft Jaccard, and Lovász-softmax) outperform cross-entropy-based losses on five medical segmentation tasks.
The results imply that metric-sensitive loss functions align the training objective with the evaluation metric, a direction that merits further research.

Optimizing the Dice Score and Jaccard Index for Medical Image Segmentation

The paper "Optimizing the Dice Score and Jaccard Index for Medical Image Segmentation: Theory & Practice" investigates how to optimize two common evaluation metrics, the Dice score and the Jaccard index, when training convolutional neural networks (CNNs) for medical image segmentation. Segmentation networks are traditionally trained with cross-entropy (CE) losses, which creates a discrepancy between the training objective and the metric used for evaluation at test time.

Theoretical Insights

The paper begins by comparing the loss functions theoretically within a risk minimization framework. It questions whether any weighted cross-entropy loss exists whose weights make it a good surrogate for the Dice score or the Jaccard index. Using relative approximation bounds, the analysis establishes that Dice and Jaccard approximate each other with a relative error of one, whereas no weighting of cross-entropy can approximate either measure well.
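
The close relation between the two metrics can be checked numerically. The following is a minimal sketch, assuming the standard set definitions Dice = 2|A∩B| / (|A| + |B|) and Jaccard = |A∩B| / |A∪B|, from which the identity Dice = 2·Jaccard / (1 + Jaccard) follows. The function names and the toy masks are illustrative and not taken from the paper; the notion of relative error used in the paper may be defined differently from the simple ratio shown here.

```python
import math

def dice(a: set, b: set) -> float:
    """Dice score of two masks given as sets of foreground pixel indices."""
    if not a and not b:
        return 1.0  # assumed convention for two empty masks
    return 2 * len(a & b) / (len(a) + len(b))

def jaccard(a: set, b: set) -> float:
    """Jaccard index (intersection over union) of two masks."""
    if not a and not b:
        return 1.0  # assumed convention for two empty masks
    return len(a & b) / len(a | b)

def dice_from_jaccard(j: float) -> float:
    """Convert a Jaccard index to the equivalent Dice score."""
    return 2 * j / (1 + j)

# Toy masks: intersection has 3 pixels, union has 8.
pred = {1, 2, 3, 4, 5, 6}
truth = {4, 5, 6, 7, 8}
d, j = dice(pred, truth), jaccard(pred, truth)  # 6/11 and 3/8
print(d, j, dice_from_jaccard(j))

# Dice never falls below Jaccard and never exceeds twice Jaccard,
# so (Dice - Jaccard) / Jaccard is at most one.
grid = [k / 10_000 for k in range(10_001)]
assert all(x <= dice_from_jaccard(x) <= 2 * x for x in grid)

# The largest absolute gap Dice - Jaccard on the grid.
j_star = max(grid, key=lambda x: dice_from_jaccard(x) - x)
gap = dice_from_jaccard(j_star) - j_star
print(j_star, gap, 3 - 2 * math.sqrt(2))
```

Under these definitions the absolute gap peaks near Jaccard = √2 − 1, where it equals 3 − 2√2 (about 0.17). Because one metric is a monotone function of the other, ranking segmentations by Dice or by Jaccard gives the same order for a single image.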

Empirical Evaluation

The authors validate these findings empirically on five medical segmentation tasks using five loss functions: cross-entropy (CE), weighted cross-entropy (wCE), soft Dice (sDice), soft Jaccard (sJaccard), and Lovász-softmax. The metric-sensitive losses (sDice, sJaccard, and Lovász-softmax) significantly outperform CE and wCE on both the Dice score and the Jaccard index across the datasets, indicating that directly optimizing a surrogate of the target metric yields better segmentation performance.
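
To make the compared losses concrete, here is a minimal sketch of three of them for a binary segmentation flattened to a list of pixels. It assumes one common relaxation of soft Dice and soft Jaccard, in which set sizes are replaced by sums of predicted probabilities; the exact formulations, smoothing terms, and class weights used in the paper may differ, and Lovász-softmax is omitted. All function and parameter names (`soft_dice_loss`, `w_fg`, `w_bg`, `eps`) are illustrative.

```python
import math

def soft_dice_loss(probs, targets, eps=1e-7):
    """1 - soft Dice; probs are foreground probabilities, targets are 0/1."""
    inter = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

def soft_jaccard_loss(probs, targets, eps=1e-7):
    """1 - soft Jaccard, using the same relaxation as soft Dice."""
    inter = sum(p * t for p, t in zip(probs, targets))
    union = sum(probs) + sum(targets) - inter
    return 1.0 - (inter + eps) / (union + eps)

def weighted_cross_entropy(probs, targets, w_fg=1.0, w_bg=1.0, eps=1e-7):
    """Mean per-pixel binary cross-entropy with class weights.

    With w_fg == w_bg == 1 this is plain cross-entropy.
    """
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= w_fg * t * math.log(p) + w_bg * (1 - t) * math.log(1.0 - p)
    return total / len(probs)

# Toy example: a small foreground object in a mostly background image.
targets = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
probs = [0.6, 0.4, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]

print("soft Dice loss:   ", soft_dice_loss(probs, targets))
print("soft Jaccard loss:", soft_jaccard_loss(probs, targets))
print("CE:               ", weighted_cross_entropy(probs, targets))
print("wCE (w_fg=4):     ", weighted_cross_entropy(probs, targets, w_fg=4.0))
```

The cross-entropy variants average a per-pixel term, so background pixels dominate when the object is small unless the weights compensate. The soft Dice and soft Jaccard losses are computed from image-level overlap sums, which is what ties them to the evaluation metrics.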

Implications and Future Directions

The empirical results support the theoretical findings and show no statistically significant difference among the metric-sensitive losses themselves across the tasks. This suggests that, while choosing a metric-sensitive loss over CE-based optimization improves performance significantly, the choice among the surrogates does not need to be tuned to the dataset or task at hand.

The analysis also points to a possible reason for the underperformance of wCE: the appropriate weighting is task-specific, and no single weighting scheme holds across datasets. Given the complexity of medical image segmentation, the paper encourages broader adoption of, and further research into, metric-sensitive losses such as soft Dice and soft Jaccard in place of traditional per-pixel losses, thereby aligning the optimization objective more closely with the evaluation metrics.

As the field progresses, the insights gleaned from this paper could inspire novel loss functions that better capture the nuanced requirements of medical image analysis, potentially incorporating adaptive approaches that cater to class imbalance and specific dataset characteristics dynamically during training.
