Generalised Dice Loss for Segmentation
- Generalised Dice Loss (GDL) is a loss function that extends the traditional Dice coefficient with dynamic, per-class weighting to address severe class imbalance in image segmentation.
- It uses an inverse squared-volume weighting scheme to amplify errors for rare classes, ensuring balanced gradient contributions during optimization.
- Empirical evaluations demonstrate that GDL improves segmentation accuracy in challenging medical imaging tasks, outperforming standard loss functions on small and infrequent structures.
Generalised Dice Loss (GDL) is a deep learning loss function designed to address severe class imbalance in multi-class image segmentation, particularly in medical images where rare structures are often encountered. GDL extends the classical Dice coefficient by introducing dynamic, per-class rebalancing through inverse squared-volume weighting, thereby ensuring robust optimization and balanced gradient contributions across classes, regardless of their frequency or size (Sudre et al., 2017).
1. Mathematical Formulation
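The Generalised Dice Loss is defined for an input patch or image with $L$ classes (including background) and $N$ voxels (or pixels). For each class $l$ and voxel $n$:
- $r_{ln}$ is the one-hot ground-truth label for voxel $n$ in class $l$
- $p_{ln}$ is the predicted probability for voxel $n$ and class $l$
- $w_l$ is the class-specific weight
The GDL is given by:
$$\mathrm{GDL} = 1 - 2\,\frac{\sum_{l=1}^{L} w_l \sum_{n=1}^{N} r_{ln}\,p_{ln}}{\sum_{l=1}^{L} w_l \sum_{n=1}^{N} \left(r_{ln} + p_{ln}\right)}$$
In compact notation:
$$\mathrm{GDL} = 1 - 2\,\frac{\sum_{l} w_l\, I_l}{\sum_{l} w_l\, U_l}$$
where $I_l = \sum_{n} r_{ln}\,p_{ln}$ and $U_l = \sum_{n} \left(r_{ln} + p_{ln}\right)$.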
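As a worked illustration of the formula, the following minimal sketch evaluates GDL on a toy 10-voxel, two-class example, using plain Python loops that mirror the sums over classes and voxels. The function name `generalised_dice_loss`, the nested-list layout `r[l][n]` / `p[l][n]`, and the default `eps` value are assumptions made for this example; the weight is taken as the inverse squared ground-truth volume of each class.

```python
def generalised_dice_loss(r, p, eps=1e-6):
    """GDL for nested lists r[l][n] (one-hot labels) and p[l][n] (probabilities).

    eps is an assumed small constant that guards against empty classes.
    """
    numerator = 0.0
    denominator = 0.0
    for r_l, p_l in zip(r, p):
        volume = sum(r_l)                    # ground-truth volume of class l
        w_l = 1.0 / (volume ** 2 + eps)      # inverse squared-volume weight
        intersection = sum(r_ln * p_ln for r_ln, p_ln in zip(r_l, p_l))
        union = sum(r_ln + p_ln for r_ln, p_ln in zip(r_l, p_l))
        numerator += w_l * intersection
        denominator += w_l * union
    return 1.0 - 2.0 * numerator / denominator


# Toy case: 8 background voxels (class 0) and 2 foreground voxels (class 1).
r = [[1] * 8 + [0] * 2,
     [0] * 8 + [1] * 2]

perfect = generalised_dice_loss(r, r)    # prediction equals the ground truth
all_background = generalised_dice_loss(r, [[1.0] * 10, [0.0] * 10])

print(f"perfect prediction: {perfect:.3f}")         # 0.000
print(f"foreground missed:  {all_background:.3f}")  # 0.680
```

In this toy case, predicting background everywhere gives a loss of about 0.68, even though the background class on its own has a Dice overlap of 16/18 ≈ 0.89; the missed rare class dominates the weighted denominator.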
2. Class-Rebalancing and Weighting Scheme
GDL introduces a class rebalancing mechanism by assigning each class a weight inversely proportional to the squared volume of the ground-truth for that class:
$$w_l = \frac{1}{\left(\sum_{n=1}^{N} r_{ln}\right)^2 + \epsilon}$$
where $\sum_{n} r_{ln}$ is the ground-truth volume of class $l$ and $\epsilon$ is a small positive constant used to prevent division by zero. This “inverse-volume²” approach ensures that small classes receive larger weights, so their segmentation error is amplified to match the influence of larger, more prevalent classes. As a result, the bias of standard Dice toward large regions is mitigated and per-class gradient magnitudes are balanced. This weighting is dynamic and recalculated per batch during training.
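To make the effect of the weighting concrete, the sketch below computes inverse squared-volume weights for a hypothetical 10,000-voxel patch containing a dominant class, a rare class, and an absent class. The helper name `class_weights`, the example volumes, and the `eps` value are illustrative assumptions, not values taken from the source.

```python
def class_weights(volumes, eps=1e-6):
    """Inverse squared-volume weights, 1 / (V_l**2 + eps), one per class."""
    return [1.0 / (v ** 2 + eps) for v in volumes]


# Hypothetical patch: 9990 voxels of a dominant class, 10 of a rare class,
# and a third class that does not appear at all.
volumes = [9990, 10, 0]
w_large, w_rare, w_absent = class_weights(volumes)

print(f"w_large  = {w_large:.3e}")
print(f"w_rare   = {w_rare:.3e}")
print(f"w_absent = {w_absent:.3e}")
print(f"rare / large weight ratio = {w_rare / w_large:.0f}")
```

The rare class receives a weight roughly (9990/10)² ≈ 10⁶ times that of the dominant class. The absent class does not trigger a division by zero, but its weight becomes 1/eps, which is very large; this is the blow-up that per-batch class presence is meant to avoid.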
3. Theoretical Properties and Comparative Motivation
GDL is characterized by several theoretical and practical advantages over alternative loss functions such as weighted cross-entropy and standard Dice loss:
- Class-balanced overlap: GDL automatically ensures every class—including rare ones—contributes equally to the loss within each batch, adapting class weights according to their occurrence.
- Robustness to imbalance: Weighted cross-entropy may excessively upweight small classes or produce vanishing gradients. Sensitivity–specificity losses require manual trade-off parameter tuning. GDL, by contrast, achieves balanced optimization without manual class-specific tuning.
- Scale invariance: Region sizes can vary greatly across samples; because the weights are recomputed from the observed class volumes, the contribution of each class to the loss remains approximately constant.
- Gradient stability: Empirical analysis shows GDL gradients remain well-conditioned even for regions comprising less than 0.1% of voxels, a regime where other losses often become unstable or lead to slow convergence.
4. Empirical Evaluation
Experiments were conducted on both 2D and 3D segmentation tasks involving extreme class imbalance:
- 2D Segmentation: The BRATS brain tumor segmentation dataset was used, where tumor pixels could constitute as little as 0.5% of a patch. Architectures tested included UNet and TwoPathCNN. GDL consistently yielded the highest (or near-highest) Dice Similarity Coefficient (DSC) across learning rates and patch sizes. For example, in one small-patch configuration, UNet achieved the following DSCs: Dice loss (DL₂) 0.84, sensitivity–specificity (SS) 0.82, weighted cross-entropy (WCE) 0.83, and GDL 0.85.
- 3D Segmentation: An in-house white-matter hyperintensity (WMH) dataset with 524 subjects was evaluated using DeepMedic and HighResNet. Lesion volumes were often less than 0.02% of the patch. GDL performed best and was uniquely robust to learning-rate selection and class imbalance (HighResNet, large patch: GDL=0.65 vs. DL₂=0.62 and SS=0.58, with WCE failing to converge at the reported learning rate).
Test-set evaluation with HighResNet on the 3D WMH task yielded median DSCs of 0.66 for GDL (best), compared to 0.63 (DL₂), 0.60 (SS), with WCE failing to converge under these conditions.
5. Practical Implementation
GDL can be implemented as a batch-wise loss computed over predicted probability maps (typically softmax outputs) and one-hot ground-truth labels. An efficient vectorized computation is important for integration into training workflows.
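One way such a vectorized, batch-wise computation might look is sketched below with NumPy. The function name `gdl_batch`, the `(batch, classes, *spatial)` array layout, and the `eps` default are assumptions of this example rather than details given in the source; the same reductions would carry over to an automatic-differentiation framework for actual training.

```python
import numpy as np


def gdl_batch(probs, onehot, eps=1e-6):
    """Batch-wise GDL for arrays of shape (batch, classes, *spatial)."""
    num_classes = probs.shape[1]
    # Put the class axis first and flatten batch and spatial axes together,
    # so that every per-class sum runs over all voxels in the batch.
    p = np.moveaxis(probs, 1, 0).reshape(num_classes, -1)
    r = np.moveaxis(onehot, 1, 0).reshape(num_classes, -1).astype(p.dtype)

    volume = r.sum(axis=1)                   # ground-truth volume per class
    weights = 1.0 / (volume ** 2 + eps)      # inverse squared-volume weights
    intersection = (r * p).sum(axis=1)
    union = (r + p).sum(axis=1)
    return 1.0 - 2.0 * (weights * intersection).sum() / (weights * union).sum()


# Usage on a toy batch: one 2x4 image, 2 classes, a single foreground pixel.
labels = np.array([[[0, 0, 0, 1],
                    [0, 0, 0, 0]]])                  # (batch, H, W)
onehot = np.moveaxis(np.eye(2)[labels], -1, 1)       # (batch, classes, H, W)
all_background = np.stack([np.ones((1, 2, 4)), np.zeros((1, 2, 4))], axis=1)

loss_perfect = gdl_batch(onehot, onehot)             # close to 0
loss_missed = gdl_batch(all_background, onehot)      # close to 0.78
```

Pooling the batch and spatial axes before summing means the class volumes, and hence the weights, are estimated once per batch rather than once per sample, which matches the batch-wise description above.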
Training protocols recommend:
- Optimizer: SGD or Adam, with a learning rate chosen to give a stable trade-off between speed and convergence
- Batch size: maximize within GPU constraints; ensure all classes are present at least once per batch to prevent weight blow-up
- Patch size: smaller patches and larger batch sizes yield more stable class-volume estimates and improved convergence
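The recommendation that every class appear at least once per batch can be enforced in several ways. The sketch below uses simple rejection sampling over a precomputed patch index; the function name, the `(patch_id, classes_present)` index format, and the `max_tries` limit are all hypothetical choices for illustration, not a sampling scheme described in the source.

```python
import random


def sample_batch_with_all_classes(patches, num_classes, batch_size, rng,
                                  max_tries=100):
    """Draw a batch in which every class index 0..num_classes-1 appears.

    patches: list of (patch_id, classes_present) pairs, a hypothetical index
    built beforehand from the ground-truth labels.
    """
    required = set(range(num_classes))
    for _ in range(max_tries):
        batch = rng.sample(patches, batch_size)
        present = set().union(*(classes for _, classes in batch))
        if required <= present:
            return batch
    raise RuntimeError("no batch containing every class was found")


# Only patch "c" contains the rare class 1, so it must end up in the batch.
patches = [("a", {0}), ("b", {0}), ("c", {0, 1}), ("d", {0})]
batch = sample_batch_with_all_classes(patches, num_classes=2, batch_size=2,
                                      rng=random.Random(0))
print([patch_id for patch_id, _ in batch])
```

Rejection sampling is easy to reason about but can be slow when a class is very rare; a stratified sampler that always draws at least one patch per class would be an alternative.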
6. Limitations and Best Practices
- Extreme imbalance instability: When a class is absent from a batch (i.e., class volume approaches zero), its weight $w_l$ diverges. Mitigations include adding a small numerical constant $\epsilon$ to the denominator, explicitly capping $w_l$, or enforcing sampling to ensure each class is present in every batch.
- Increased memory overhead: GDL requires global class sums per batch, and small batch sizes introduce estimation noise into the weights and loss, so stable training favors larger batches and therefore more memory.
- Hybrid loss strategies: Early training may benefit from combining GDL with cross-entropy (for example, as a weighted sum of the two terms) to stabilize optimization.
- Monitoring: Per-class Dice scores should be tracked throughout training to avoid collapse on rare classes.
- Data augmentation: Recommended especially for high anatomical variability, to prevent overfitting the class-weighted overlap criterion.
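Two of the mitigations listed above, capping the class weight and mixing GDL with cross-entropy, can be sketched as follows. The ceiling `max_weight`, the mixing coefficient `lam`, and the numeric loss values are hypothetical and chosen only for illustration; the source does not prescribe specific values.

```python
def capped_class_weights(volumes, eps=1e-6, max_weight=1.0):
    """Inverse squared-volume weights, clipped at an assumed ceiling.

    With max_weight=1.0 an absent class is weighted roughly like a class
    occupying a single voxel, instead of receiving a weight of 1/eps.
    """
    return [min(1.0 / (v ** 2 + eps), max_weight) for v in volumes]


def hybrid_loss(gdl, cross_entropy, lam=0.5):
    """Convex combination of two loss values; lam is a hypothetical knob."""
    return lam * gdl + (1.0 - lam) * cross_entropy


# Dominant, rare, and absent class in a hypothetical 10,000-voxel patch.
weights = capped_class_weights([9990, 10, 0])
print(weights)

# Arbitrary example loss values, mixed with equal weight.
combined = hybrid_loss(gdl=0.68, cross_entropy=0.40, lam=0.5)
print(f"{combined:.2f}")
```

A schedule that starts with a small `lam` and increases it over training would be one way to realize the "early training" emphasis on cross-entropy, though the appropriate schedule is task-dependent.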
7. Comparative Performance and Qualitative Findings
GDL outperformed standard Dice and sensitivity–specificity losses in recovering small, “punctate” structures and reducing false negatives, especially on challenging 3D cases. Networks trained with standard Dice or sensitivity–specificity loss tended to miss fine lesion details or yield overly smooth predictions. By contrast, GDL-optimized models produced more accurate and detailed segmentation masks, exhibiting higher fidelity in recovering rare pathological structures even in the presence of extreme class imbalance (Sudre et al., 2017).