Adaptive TverskyCE Loss in 3D Segmentation
- The paper introduces Adaptive TverskyCE Loss as a composite, epoch-adaptive loss that dynamically weights Tversky and cross-entropy losses to enhance segmentation accuracy.
- Experimental results on the NIH Pancreas-CT dataset show significant improvements in DSC and F2 scores, particularly with UNet-3D architectures.
- The adaptive mechanism automatically prioritizes the more challenging loss component each epoch, reducing manual tuning and mitigating gradient oscillations.
Adaptive TverskyCE Loss is a data-driven, epoch-adaptive composite loss function that integrates the strengths of Tversky loss and cross-entropy loss with dynamically updated weights. Designed to optimize segmentation networks for challenging, highly imbalanced tasks—such as pancreas delineation in abdominal CT—it achieves automatic loss balancing through an epoch-wise mechanism based on recent performance. This approach leverages the complementary merits of overlap-based and classification-based objectives and was introduced in the context of 3D medical image segmentation employing UNet-3D architectures on the NIH Pancreas-CT dataset (Zhang et al., 4 May 2025).
1. Mathematical Foundation
The Adaptive TverskyCE Loss is composed of two primary objectives: the Tversky loss, which generalizes the Dice similarity coefficient via weighted penalties on false positives (FP) and false negatives (FN), and the cross-entropy (CE) loss, which targets voxel-wise classification accuracy.
Let $P = \{p_i\}$ denote the Softmax-predicted probability map and $G = \{g_i\}$ the binary ground truth ($g_i \in \{0, 1\}$ for each voxel $i$). The Dice similarity coefficient (DSC) is defined:

$$\mathrm{DSC} = \frac{2\sum_i p_i g_i}{\sum_i p_i + \sum_i g_i}$$

The Tversky index extends this formulation:

$$\mathrm{TI} = \frac{\sum_i p_i g_i}{\sum_i p_i g_i + \alpha \sum_i p_i (1 - g_i) + \beta \sum_i (1 - p_i) g_i}$$

where $\alpha$ weights FP and $\beta$ weights FN errors. The Tversky loss is:

$$\mathcal{L}_{\mathrm{T}} = 1 - \mathrm{TI}$$

Binary cross-entropy (BCE) loss is given by:

$$\mathcal{L}_{\mathrm{CE}} = -\frac{1}{N} \sum_{i=1}^{N} \left[ g_i \log p_i + (1 - g_i) \log(1 - p_i) \right]$$

The core contribution is the epoch-wise compositional loss:

$$\mathcal{L}^{(t)} = w_{\mathrm{T}}^{(t)} \mathcal{L}_{\mathrm{T}} + w_{\mathrm{CE}}^{(t)} \mathcal{L}_{\mathrm{CE}}$$

with weights determined adaptively from the previous epoch's loss values:

$$w_{\mathrm{T}}^{(t)} = \frac{\mathcal{L}_{\mathrm{T}}^{(t-1)}}{\mathcal{L}_{\mathrm{T}}^{(t-1)} + \mathcal{L}_{\mathrm{CE}}^{(t-1)}}, \qquad w_{\mathrm{CE}}^{(t)} = \frac{\mathcal{L}_{\mathrm{CE}}^{(t-1)}}{\mathcal{L}_{\mathrm{T}}^{(t-1)} + \mathcal{L}_{\mathrm{CE}}^{(t-1)}}$$

This ensures $w_{\mathrm{T}}^{(t)} + w_{\mathrm{CE}}^{(t)} = 1$ at each epoch. The loss component with higher error in the preceding epoch is favored in the subsequent optimization.
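A minimal pure-Python sketch of the two loss components on flattened voxel lists; the function names and the smoothing constant `eps` are illustrative choices, not taken from the paper's code:

```python
import math

def tversky_loss(p, g, alpha=0.5, beta=0.5, eps=1e-7):
    """1 - Tversky index; alpha penalizes false positives,
    beta penalizes false negatives."""
    tp = sum(pi * gi for pi, gi in zip(p, g))
    fp = sum(pi * (1 - gi) for pi, gi in zip(p, g))
    fn = sum((1 - pi) * gi for pi, gi in zip(p, g))
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)

def bce_loss(p, g, eps=1e-7):
    """Mean binary cross-entropy over voxels, with probability
    clipping to avoid log(0)."""
    clipped = [min(max(pi, eps), 1.0 - eps) for pi in p]
    return -sum(gi * math.log(pi) + (1 - gi) * math.log(1 - pi)
                for pi, gi in zip(clipped, g)) / len(p)
```

With $\beta > \alpha$, missed foreground voxels (FN) are penalized more heavily than spurious ones (FP), which is the lever Tversky loss offers against class imbalance.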
2. Implementation Protocol
Implementation entails the following:
- Weight Initialization: In the first epoch, the Tversky and CE loss weights are initialized by evaluating the respective losses on a small batch and applying the same relative ratio as above. This requires a “warm-start” pass.
- Epoch-wise Update: After each epoch, the current values of the Tversky and CE losses are recorded. The next epoch's weights are updated via the ratio rule.
- Regularization: No auxiliary regularizers are introduced; the weights remain non-negative and sum to one by construction.
- Overhead: Aside from extra calculation of the current loss values after each epoch and their use to update the weights, there is no additional parameterization relative to the baseline models.
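The protocol above can be sketched as follows; the helper name and the placeholder loss values are illustrative, not taken from the paper:

```python
def adaptive_weights(loss_t_prev, loss_ce_prev):
    """Return (w_tversky, w_ce) proportional to the previous epoch's
    loss values, so the component with higher error is emphasized;
    the two weights sum to 1 by construction."""
    total = loss_t_prev + loss_ce_prev
    return loss_t_prev / total, loss_ce_prev / total

# Warm start: estimate both losses on a small batch, then update once
# per epoch; the next epoch's composite loss is
#   w_t * L_tversky + w_ce * L_ce
w_t, w_ce = adaptive_weights(1.0, 3.0)  # placeholder loss values
```

Here CE carried the larger error, so the next epoch weights CE three times as heavily as Tversky.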
3. Network Architecture and Experimental Setup
The Adaptive TverskyCE Loss was evaluated on encoder-decoder architectures for 3D segmentation:
- UNet-3D: Implements a symmetric encoder-decoder with skip connections. The contracting path comprises consecutive conv blocks with ReLU, interleaved with max-pooling. The expanding path uses transposed conv operations, concatenation with encoder features, and further convolutions, finishing with a final convolution and Softmax.
- Dilated UNet-3D: Modifies UNet-3D by replacing the lowest-resolution conv block with a dilated convolution to enlarge the receptive field and capture more context.
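A quick arithmetic check of why dilation enlarges context: a k-tap convolution with dilation rate d spans k + (k - 1)(d - 1) voxels per axis. An illustrative helper (not code from the paper):

```python
def effective_kernel(k, d):
    """Spatial extent covered by a k-tap conv with dilation rate d."""
    return k + (k - 1) * (d - 1)

# A 3-tap kernel with dilation 2 covers the same extent as a dense
# 5-tap kernel, at the cost of only 3 weights per axis.
```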
Dataset and Preprocessing:
- NIH Pancreas-CT: 80 contrast-enhanced CT scans, slice thickness 1.5–2.5 mm.
- Splits: 56 training, 8 validation, 16 test volumes.
- Preprocessing: intensity normalization, cropping/padding to fixed-size volumes centered on the pancreas; no further augmentation reported.
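A preprocessing sketch under stated assumptions: the paper reports intensity normalization and pancreas-centered cropping/padding, but the z-score choice and the `center_crop` helper below are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np

def normalize(volume):
    """Zero-mean, unit-variance intensity normalization (assumed z-score)."""
    v = volume.astype(np.float32)
    return (v - v.mean()) / (v.std() + 1e-8)

def center_crop(volume, center, size):
    """Crop a fixed-size box around a voxel center, clamped to the
    volume bounds (padding of undersized volumes is omitted here)."""
    slices = []
    for c, s, dim in zip(center, size, volume.shape):
        start = max(0, min(c - s // 2, dim - s))
        slices.append(slice(start, start + s))
    return volume[tuple(slices)]
```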
Training Parameters:
- Framework: PyTorch
- Optimizer: Adam, starting learning rate 0.005, adaptive decay
- Batch size: 10
- Epochs: 150
- Loss configurations: Tversky loss alone, and Adaptive TverskyCE evaluated under two (α, β) settings
4. Empirical Evaluation and Results
Quantitative Metrics
Performance metrics (DSC and F2 score) demonstrate the efficacy of the adaptive approach. All results below refer to final test performance:
| Model/Setting | DSC | F2 |
|---|---|---|
| UNet-3D + Tversky | 76.11% | 76.16% |
| Dilated UNet-3D + Tversky | 72.84% | 73.69% |
| UNet-3D + Adaptive TverskyCE, (α, β) setting 1 | 84.18% | 83.75% |
| UNet-3D + Adaptive TverskyCE, (α, β) setting 2 | 85.59% (peak 95.24%) | 85.14% |
| Dilated UNet-3D + Adaptive TverskyCE, (α, β) setting 1 | 80.90% | 80.48% |
| Dilated UNet-3D + Adaptive TverskyCE, (α, β) setting 2 | 83.41% | 82.85% |
Relative gains: The best adaptive setting for UNet-3D yields a +9.47-point absolute DSC gain and a +8.98-point F2 gain over the Tversky-only baseline.
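For reference, F2 is the β = 2 member of the F-beta family, weighting recall (sensitivity) four times as heavily as precision; this suits small-organ segmentation, where missed foreground voxels are costly. A generic helper (not from the paper's code):

```python
def fbeta(precision, recall, beta=2.0):
    """F-beta score; beta > 1 emphasizes recall over precision."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```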
Additional Metrics
For the best-performing UNet-3D + Adaptive TverskyCE configuration:
- Specificity: 99.97%
- Sensitivity: 86.09%
- Precision: 95.36%
Qualitative Results
Segmentations using Adaptive TverskyCE visually capture finer pancreatic boundaries, with a clear reduction in under-segmentation and over-segmentation compared to models trained with Tversky loss alone.
5. Optimization Behavior and Interpretation
The adaptive weighting scheme keeps the learning process focused on the more challenging objective, overlap (Tversky) or classification (CE), at each epoch. Early training stages favor BCE for gradient stability and learning coarse structures; as convergence progresses, emphasis naturally shifts to Tversky, sharpening boundary overlap. This strategy counteracts the high gradient variance typical of overlap-based losses and maintains attention on the underrepresented foreground (positive) region.
A plausible implication is that this mechanism can mitigate gradient oscillations associated with pure overlap losses, which are sensitive to rare class occurrences, while eliminating the necessity for manual fusion weight tuning.
6. Practical Implications, Limitations, and Future Directions
- Advantages: Adaptive TverskyCE requires no manual tuning of the fusion parameter, achieving substantial performance benefits on a highly imbalanced, small-organ segmentation problem. The implementation incurs negligible computational cost and integrates seamlessly with standard encoder-decoder segmentation frameworks.
- Limitations: The loss adaptation mechanism depends on epoch-level statistics, introducing potential lag if the loss surface changes rapidly. Initialization requires a reliable “warm-start” estimate. Current evidence is limited to the pancreas segmentation domain.
- Potential Extensions: The generalization of Adaptive TverskyCE to other datasets and targets (organs, modalities) remains to be established.
In sum, Adaptive TverskyCE Loss constitutes a principled, empirically validated loss function for medical image segmentation, dynamically balancing probabilistic voxel classification and global overlap, with significant documented improvements in accuracy and boundary localization on the NIH Pancreas-CT benchmark (Zhang et al., 4 May 2025).