
Tversky Loss Function

Updated 9 December 2025
  • Tversky Loss Function is a parameterized similarity metric for image segmentation that generalizes Dice and Jaccard indexes while explicitly controlling precision and recall.
  • It is implemented as a differentiable loss in deep networks, summing pixel-wise probabilities to efficiently handle multi-class and sparse object segmentation.
  • Empirical results in medical imaging show significant improvements in lesion, organ, and vessel segmentation when tuning its α and β parameters to balance false positives and negatives.

The Tversky loss function is a parameterized, differentiable similarity metric designed to address foreground–background imbalance in image segmentation tasks, particularly in medical imaging. By allowing explicit control over the relative penalties for false positives and false negatives, it generalizes classic overlap measures such as Dice and Jaccard and can be adapted to the precision/recall requirements of a given segmentation scenario. The loss’s flexibility and empirical performance in sparse-object segmentation have led to its widespread adoption in state-of-the-art deep learning pipelines for lesion, organ, and vessel segmentation.

1. Mathematical Definition and Core Properties

Let $A$ denote the predicted binary segmentation mask, $B$ the ground-truth binary mask, and define

  • $\lvert A\cap B\rvert$ as the number of true positives (TP),
  • $\lvert A\setminus B\rvert$ as the number of false positives (FP),
  • $\lvert B\setminus A\rvert$ as the number of false negatives (FN).

The Tversky index is then

$T_{\alpha,\beta}(A,B) = \dfrac{\mathrm{TP}}{\mathrm{TP} + \alpha\,\mathrm{FP} + \beta\,\mathrm{FN}}$

with $\alpha, \beta \geq 0$ controlling the FP/FN tradeoff (Usman et al., 13 Feb 2025, Salehi et al., 2017, Roth et al., 2019).

The Tversky loss is

$\mathcal{L}_{\text{Tversky}} = 1 - T_{\alpha,\beta}(A,B)$

which is minimized during training. For soft segmentations, the sums are taken over pixel-wise probabilities and ground-truth labels.

Special cases:

  • $\alpha = \beta = 0.5$ recovers the Dice loss.
  • $\alpha = \beta = 1$ gives the Jaccard/IoU loss.
  • $\alpha + \beta = 1$ creates the family of $F_\beta$ scores (e.g., $F_2$ via $\alpha=0.2,\ \beta=0.8$).
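
To see the Dice reduction concretely, substituting $\alpha = \beta = 0.5$ and multiplying numerator and denominator by 2 recovers the familiar Dice coefficient:

$T_{0.5,0.5}(A,B) = \dfrac{\mathrm{TP}}{\mathrm{TP} + 0.5\,\mathrm{FP} + 0.5\,\mathrm{FN}} = \dfrac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}} = \mathrm{DSC}(A,B)$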

2. Motivation: Imbalance and Precision–Recall Control

Segmentation tasks, especially in medical contexts, are often severely imbalanced: positive pixels (e.g., lesions, vessels, organs) typically occupy a small fraction of the image (Salehi et al., 2017, Jadon, 2020). Standard losses such as cross-entropy or Dice can bias models toward the majority background class, yielding poor recall for the clinically relevant foreground.

The motivation for the Tversky loss is to give practitioners direct, quantitative control over error types:

  • Penalty on false negatives ($\beta$): increasing $\beta$ emphasizes recall, guarding against missed lesions or small objects.
  • Penalty on false positives ($\alpha$): increasing $\alpha$ favors precision, reducing over-segmentation and false alarms.

This parameterization enables systematic exploration of the precision–recall frontier and supports task-specific optimization, e.g., $\alpha=0.3$, $\beta=0.7$ to penalize missed lesions more severely (Usman et al., 13 Feb 2025, Salehi et al., 2017, Roth et al., 2019).

3. Implementation in Deep Networks

The Tversky loss is typically implemented as a differentiable, batch-level function:

$\mathcal{L}_{\text{Tversky}}(P,G;\alpha,\beta) = 1 - \dfrac{\sum_i p_i g_i}{\sum_i p_i g_i + \alpha \sum_i p_i (1-g_i) + \beta \sum_i (1-p_i)\,g_i}$

where $p_i \in [0,1]$ is the predicted probability for pixel $i$ and $g_i \in \{0,1\}$ the ground-truth label (Zhang et al., 4 May 2025, Salehi et al., 2017). A small smoothing constant $\varepsilon$ is often added to the numerator and denominator for numerical stability.
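
As a concrete illustration, a minimal sketch of this formula in PyTorch (an assumed framework choice; function and argument names here are illustrative, not from the cited papers) might look as follows:

    import torch

    def tversky_loss(probs, targets, alpha=0.3, beta=0.7, eps=1e-6):
        # probs:   predicted foreground probabilities, shape (N, H, W), values in [0, 1]
        # targets: binary ground-truth masks of the same shape, values in {0, 1}
        p = probs.reshape(probs.shape[0], -1)
        g = targets.reshape(targets.shape[0], -1).float()
        tp = (p * g).sum(dim=1)            # soft true positives
        fp = (p * (1 - g)).sum(dim=1)      # soft false positives
        fn = ((1 - p) * g).sum(dim=1)      # soft false negatives
        t = (tp + eps) / (tp + alpha * fp + beta * fn + eps)  # smoothed Tversky index
        return (1.0 - t).mean()            # average loss over the batch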

Key computational properties:

  • Fully differentiable; compatible with auto-differentiation frameworks.
  • Computational complexity matches Dice/Jaccard losses.
  • For multi-class segmentation, the loss is computed per class and averaged or weighted accordingly, as in the sketch below.
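
A hedged multi-class sketch along these lines (again with illustrative names, assuming softmax outputs and integer-label targets):

    import torch
    import torch.nn.functional as F

    def multiclass_tversky_loss(logits, targets, alpha=0.5, beta=0.5, eps=1e-6):
        # logits: (N, C, H, W) raw scores; targets: (N, H, W) integer class labels
        probs = torch.softmax(logits, dim=1)
        onehot = F.one_hot(targets, num_classes=probs.shape[1])
        onehot = onehot.permute(0, 3, 1, 2).float()   # -> (N, C, H, W)
        dims = (0, 2, 3)                              # sum over batch and space, keep classes
        tp = (probs * onehot).sum(dim=dims)
        fp = (probs * (1 - onehot)).sum(dim=dims)
        fn = ((1 - probs) * onehot).sum(dim=dims)
        t = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
        return (1.0 - t).mean()                       # unweighted mean over classes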

Table: Special-case parameterizations of the Tversky index

  α        β        Metric
  0.5      0.5      Dice / F1
  1        1        Jaccard / IoU
  < 0.5    > 0.5    Recall bias
  > 0.5    < 0.5    Precision bias

4. Empirical Performance and Application Areas

In large-scale evaluations, the Tversky loss substantively improves segmentation of sparse or small foreground classes. Illustrative findings include:

  • Lesion segmentation (3D MRI): switching from Dice to Tversky ($\alpha=0.3$, $\beta=0.7$) raised the Dice coefficient from ≈0.57 to ≈0.63, the $F_2$ score from ≈0.54 to ≈0.69, and AUPRC from ≈0.62 to ≈0.78 (Salehi et al., 2017).
  • Liver lesions (slice-wise Tiramisu): average Dice increases from 0.45 (Dice) to 0.57 (Tversky), with a corresponding reduction in surface error metrics (Roth et al., 2019).
  • Pancreas segmentation (UNet-3D): adaptive TverskyCE loss further raises the Dice score by ≈9.5 points and improves $F_2$ by ≈9 points compared to static Tversky (Zhang et al., 4 May 2025).
  • Retinal vessel segmentation: Tversky loss yields the highest AUC (0.9442 with SA-UNet), but Combo/Dice losses outperform it on other metrics (Herrera et al., 2022).

The predominant area of application is medical imaging, including brain lesions, liver tumors, pancreas, nuclei, and retinal vessel segmentation. The loss’s explicit recall/precision weighting is critical in clinical contexts where under-segmentation is unacceptable (Usman et al., 13 Feb 2025, Salehi et al., 2017, Abraham et al., 2018, Das et al., 2020).

5. Extensions: Focal Tversky and Compound Losses

To amplify focus on hard-to-segment regions, the focal Tversky loss introduces a focusing exponent $\gamma$:

$\mathcal{L}_{\text{FTL}} = (1 - T_{\alpha,\beta})^{\gamma}$

Exponents $\gamma > 1$ suppress the contribution of well-segmented examples and emphasize poorly segmented pixels, while exponents below 1 boost the gradient for examples near convergence. Empirically, settings such as $\alpha=0.3,\ \beta=0.7,\ \gamma=0.75$ (i.e., an exponent of $1/\gamma$ with $\gamma = 4/3$ in the original parameterization) further improve Dice, precision, and recall and regularize training for small-structure detection (Abraham et al., 2018, Das et al., 2020).
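
A corresponding sketch, reusing the binary helper above (same caveats on naming):

    def focal_tversky_loss(probs, targets, alpha=0.3, beta=0.7, gamma=0.75, eps=1e-6):
        # Identical to tversky_loss, but the Tversky complement is raised
        # to the focusing exponent gamma before averaging over the batch.
        p = probs.reshape(probs.shape[0], -1)
        g = targets.reshape(targets.shape[0], -1).float()
        tp = (p * g).sum(dim=1)
        fp = (p * (1 - g)).sum(dim=1)
        fn = ((1 - p) * g).sum(dim=1)
        t = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
        return (1.0 - t).pow(gamma).mean()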

Compound losses combine Tversky with boundary-aware (HausdorffDT, surface distance) or cross-entropy terms, often with adaptive weighting (Usman et al., 13 Feb 2025, Zhang et al., 4 May 2025). For instance, Tversky-HausdorffDT or adaptive TverskyCE loss yield state-of-the-art Dice, surface distance, and robustness across varying class imbalances.
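
For a fixed-weight variant, a Tversky plus cross-entropy combination can be sketched as below; the adaptive methods cited above schedule or learn the weight w during training, whereas this simplification keeps it constant:

    def tversky_ce_loss(logits, targets, alpha=0.3, beta=0.7, w=0.5):
        # Convex combination of a region-based term (Tversky) and a
        # pixel-wise term (cross-entropy); w = 0.5 weights them equally.
        ce = torch.nn.functional.cross_entropy(logits, targets)
        tv = multiclass_tversky_loss(logits, targets, alpha=alpha, beta=beta)
        return w * tv + (1.0 - w) * ce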

6. Parameter Selection and Practical Considerations

Tuning $(\alpha, \beta)$ is central to optimal performance:

  • Small foreground / high recall required: $\beta \gg \alpha$ (e.g., $\alpha=0.3$, $\beta=0.7$).
  • Balanced classes: $\alpha = \beta = 0.5$ (pure Dice).
  • Precision priority: $\alpha > \beta$.

In practice, selection is data driven: grid search or validation on task-specific metrics (Dice, recall, surface error) (Salehi et al., 2017, Roth et al., 2019, Das et al., 2020, Usman et al., 13 Feb 2025). The addition of smoothing constants avoids instability when targets are absent. No special learning-rate tuning is needed beyond standard Dice/Jaccard practice (Jadon, 2020).
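
A simple validation-driven grid search over $(\alpha, \beta)$ might be sketched as below; train_model and validation_dice are hypothetical stand-ins for a project's own training and evaluation routines:

    candidates = [(0.5, 0.5), (0.4, 0.6), (0.3, 0.7), (0.2, 0.8)]
    best_score, best_ab = float("-inf"), None
    for a, b in candidates:
        model = train_model(loss_fn=lambda p, g: tversky_loss(p, g, alpha=a, beta=b))
        score = validation_dice(model)   # or recall / surface error, per task
        if score > best_score:
            best_score, best_ab = score, (a, b)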

7. Limitations, Comparative Insights, and Future Directions

Quantitative and qualitative assessments reveal certain drawbacks:

  • Tversky can cause over-segmentation (including background border artifacts) if $\beta$ is too large or if not paired with boundary regularization (Herrera et al., 2022).
  • Some works report Combo (cross-entropy + Dice) losses yielding more balanced performance across all metrics in vessel segmentation (Herrera et al., 2022).

A plausible implication is that optimal segmentation frequently relies on loss function ensembles, adaptive weighting, and context-aware tuning. Research continues to investigate the interplay of region-based and boundary-aware losses, the efficacy of dynamically weighted loss composition, and the extension to multi-class and volumetric segmentation in highly imbalanced regimes (Usman et al., 13 Feb 2025, Zhang et al., 4 May 2025, Abraham et al., 2018).

In summary, the Tversky loss is a principled, flexible, and empirically validated framework for segmentation tasks confronting substantial class imbalance and asymmetric error costs. Its adaptability to precision–recall tradeoffs via $(\alpha, \beta)$ and extensibility to focal and compound forms underpin its utility in contemporary medical image analysis workflows.
