L1-Weighted Dice Focal Loss

Updated 31 March 2026

L1-Weighted Dice Focal Loss is a composite loss function that integrates Dice, Focal, and dynamic L1 histogram weighting to address class imbalance in medical image segmentation.
The method adaptively reweights hard and rare voxels by quantifying per-voxel L1 error, thereby sharpening lesion boundaries and reducing false positives.
Empirical results in PSMA PET/CT imaging demonstrate improved Dice scores, higher F1 measures, and reduced false positives compared to traditional loss functions.

L1-Weighted Dice Focal Loss (L1DFL) is a composite loss function for deep neural network-based semantic segmentation, designed to address the challenges of class imbalance and subtle lesion boundary delineation in medical images. It combines region-based (Dice), distribution-based (Focal), and dynamic, error-driven (L1-norm histogram) weighting strategies to focus the optimization process on hard, rare, and clinically critical voxels. The central insight is to adaptively reweight the loss contribution of each voxel according to its L1 error and the empirical rarity of that error within the training batch, thereby improving detection and quantification performance on heterogeneous and low-prevalence lesion classes, particularly in prostate-specific membrane antigen (PSMA) PET/CT imaging applications (Dzikunu et al., 22 Apr 2025, Dzikunu et al., 4 Feb 2025).

1. Mathematical Formulation

The L1DFL is defined for binary segmentation of volumetric medical images. Given predicted probabilities $p_i(c) \in [0,1]$ and ground-truth $g_i(c) \in \{0,1\}$ at voxel $i$ for class $c \in \{0,1\}$ (background/lesion), its constituent terms are:

Dice Loss (squared denominator):

$\mathcal{L}_\mathrm{Dice} = 1 - \frac{2\sum_{c} \sum_{i} p_i(c)g_i(c) + \epsilon}{\sum_{c} \sum_i \left( p_i(c)^2 + g_i(c)^2 \right) + \epsilon}$

where $\epsilon$ is a smoothing constant (e.g., $10^{-6}$).
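As a rough illustration of the formula above, the following NumPy sketch evaluates the squared-denominator Dice loss on one-hot arrays. The function name `dice_loss_squared` and the `(C, N)` array layout are assumptions made for this example, and only the forward value is computed (no gradients).

```python
import numpy as np

def dice_loss_squared(p, g, eps=1e-6):
    """Dice loss with squared terms in the denominator.

    p: predicted probabilities, shape (C, N), values in [0, 1]
    g: one-hot ground truth, same shape, values in {0, 1}
    (The (C, N) layout is an assumption of this sketch.)
    """
    p = np.asarray(p, dtype=np.float64)
    g = np.asarray(g, dtype=np.float64)
    intersection = np.sum(p * g)            # summed over classes and voxels
    denominator = np.sum(p ** 2 + g ** 2)   # squared-denominator variant
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)

# Four voxels, two classes (row 0 = background, row 1 = lesion).
g = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1]])
perfect = dice_loss_squared(g, g)       # prediction equals ground truth
wrong = dice_loss_squared(1 - g, g)     # every voxel assigned to the wrong class
```

A perfect prediction drives the loss to zero, while a fully inverted prediction leaves only the smoothing term in the numerator, so the loss approaches one.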

Focal Loss:

$\mathcal{L}_\mathrm{Focal} = -\frac{1}{2} \sum_c \sum_i \alpha_c (1 - p_i(c))^\gamma \log p_i(c)$

with $\alpha_0 = \alpha_1 = 1$ and $\gamma = 2$.
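The sketch below shows how a focal term of this kind down-weights confidently correct voxels. It assumes that, as in the standard focal loss, only the ground-truth class contributes at each voxel (the sum is masked by $g_i(c)$), which the formula above does not state explicitly. The function name `focal_loss`, the `(C, N)` layout, and the clipping constant are likewise assumptions of this example.

```python
import numpy as np

def focal_loss(p, g, alpha=(1.0, 1.0), gamma=2.0, eps=1e-7):
    """Focal term with the -1/2 prefactor used above.

    Assumption: each voxel contributes only through its ground-truth
    class (masking by g), as in the standard focal loss.
    p, g: arrays of shape (C, N); g is one-hot.
    """
    p = np.clip(np.asarray(p, dtype=np.float64), eps, 1.0)  # avoid log(0)
    g = np.asarray(g, dtype=np.float64)
    alpha = np.asarray(alpha, dtype=np.float64).reshape(-1, 1)
    per_term = alpha * g * (1.0 - p) ** gamma * np.log(p)
    return -0.5 * np.sum(per_term)

# One lesion voxel (true class = 1), predicted with high and low confidence.
g = np.array([[0.0], [1.0]])
easy = focal_loss(np.array([[0.1], [0.9]]), g)   # lesion probability 0.9
hard = focal_loss(np.array([[0.9], [0.1]]), g)   # lesion probability 0.1
```

With $\gamma = 2$, the modulating factor $(1 - p)^\gamma$ is 0.01 for the easy voxel and 0.81 for the hard one, so the hard voxel dominates the sum by roughly three orders of magnitude.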

L1-Norm Voxel-wise Weighting:

Compute the per-voxel absolute error for the lesion class: $\Delta_i = |p_i - g_i|$, with $p_i = p_i(1)$ and $g_i = g_i(1)$.
Bin all $\Delta_i$ into $K$ histogram bins of width $\kappa = 0.1$ ($K \approx 10$).
For bin $B_k$, compute its effective width $\lambda(B_k)$ and density $\mathcal{D}(B_k) = \frac{C(B_k)}{\lambda(B_k)}$, where $C(B_k)$ is the count of $\Delta_i$ in $B_k$.
Assign the bin weight $w(B_k) = N/\mathcal{D}(B_k)$, where $N$ is the total voxel count, so sparsely populated error bins receive larger weights.
Each voxel $i$ receives the weight $w_i = w(B_k)$ of the bin containing its $\Delta_i$.
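The weighting steps above can be sketched in NumPy as follows. This is a minimal illustration, not the reference implementation: it assumes the effective width $\lambda(B_k)$ equals the fixed bin width $\kappa$, and the function name `l1_histogram_weights` is hypothetical.

```python
import numpy as np

def l1_histogram_weights(p, g, bin_width=0.1):
    """Per-voxel weights w_i = N / D(B_k) from a histogram of L1 errors.

    Assumption: the effective width lambda(B_k) is the fixed bin width.
    p, g: lesion-class probabilities and binary labels (any shape).
    """
    p = np.asarray(p, dtype=np.float64).ravel()
    g = np.asarray(g, dtype=np.float64).ravel()
    delta = np.abs(p - g)                          # per-voxel L1 error in [0, 1]
    n_bins = int(round(1.0 / bin_width))           # 10 bins for width 0.1
    # Bin index of each error; an error of exactly 1.0 goes into the last bin.
    idx = np.minimum((delta / bin_width).astype(int), n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)    # C(B_k)
    density = counts / bin_width                   # D(B_k) = C(B_k) / lambda(B_k)
    weights_per_bin = np.zeros(n_bins)
    occupied = counts > 0                          # empty bins hold no voxels
    weights_per_bin[occupied] = delta.size / density[occupied]
    return weights_per_bin[idx], delta

# Eight easy background voxels, one false positive, one uncertain lesion voxel.
p = np.array([0.05] * 8 + [0.95, 0.45])
g = np.array([0] * 8 + [0, 1])
w, delta = l1_histogram_weights(p, g)
```

In this toy batch the eight easy voxels share one crowded bin and each receive weight 0.125, while the two voxels with rare error levels each receive weight 1.0. Errors that fall exactly on a bin edge may land in either neighbouring bin because of floating-point rounding.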

Weighted Dice Loss:

$\mathcal{L}_\mathrm{wDice} = 1 - \frac{2\sum_i w_i y_i p_i + \epsilon}{\sum_i w_i(y_i^2 + p_i^2) + \epsilon}$

(with $y_i = g_i(1)$ the foreground label and $p_i = p_i(1)$).
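To make the effect of the voxel weights concrete, here is a small NumPy sketch of the weighted Dice term. The weights are passed in directly rather than derived from a histogram, and the function name `weighted_dice_loss` is an assumption of this example.

```python
import numpy as np

def weighted_dice_loss(p, y, w, eps=1e-6):
    """Weighted Dice loss over foreground probabilities p, labels y, weights w."""
    p, y, w = (np.asarray(a, dtype=np.float64).ravel() for a in (p, y, w))
    numerator = 2.0 * np.sum(w * y * p) + eps
    denominator = np.sum(w * (y ** 2 + p ** 2)) + eps
    return 1.0 - numerator / denominator

y = np.array([1, 1, 0, 0])
p = np.array([0.9, 0.1, 0.1, 0.1])   # the second lesion voxel is missed

uniform = weighted_dice_loss(p, y, w=[1, 1, 1, 1])   # about 0.296
weighted = weighted_dice_loss(p, y, w=[1, 5, 1, 1])  # about 0.593
```

Raising the weight of the missed lesion voxel from 1 to 5 roughly doubles the loss in this example, which is the mechanism by which the histogram weights shift attention towards rare, high-error voxels.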

L1-Weighted Dice Focal Loss:

$\mathcal{L}_\mathrm{L1DFL} = \mathcal{L}_\mathrm{wDice} + \mathcal{L}_\mathrm{Focal}$

This design up-weights rare, high-error voxels and down-weights the abundant, easy background, redistributing optimization focus.

2. Theoretical Motivation

Conventional Dice or Dice+Focal losses treat all voxels (or all classes) uniformly, so optimization is dominated by easy negatives and under-emphasizes hard, clinically significant boundaries and low-prevalence lesion voxels. Focal loss partially mitigates this by down-weighting well-classified samples but does not account for how rare a given error magnitude is within each batch.

L1DFL directly measures per-voxel difficulty via $\Delta_i$, forms a batch-specific histogram, and uses its empirical density to up-weight rare, hard-to-classify voxels. This adaptivity retains the overlap-based properties of the Dice backbone while explicitly shifting gradient mass towards underrepresented, clinically relevant, or ambiguous voxels, which sharpens boundaries and reduces false positives (Dzikunu et al., 4 Feb 2025, Dzikunu et al., 22 Apr 2025).

The rationale is particularly pertinent to PSMA PET/CT lesion segmentation, where metastatic lesions are small, diffuse, and heterogeneous, causing classic losses to neglect the most diagnostically salient regions.

3. Implementation and Integration

Training Pipeline

Framework: MONAI + PyTorch.
Networks: U-Net, Attention U-Net, SegResNet.
Patching: $128^3$-voxel subvolumes; batch size chosen to fit a 16 GB V100 GPU.
Optimizer: AdamW (weight decay = $10^{-5}$).
Learning rate: initial value $2 \times 10^{-4}$, cosine-annealed to zero over 1000 epochs.
Loss Calculation Steps:

For each mini-batch, compute $p_i$ and $\Delta_i=|p_i-g_i|$ for the lesion class.
Bin $\Delta_i$ as described and compute $w_i$.
Calculate $\mathcal{L}_\mathrm{wDice}$ and $\mathcal{L}_\mathrm{Focal}$.
Take the sum as the batch scalar loss for backpropagation.
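The four steps above can be strung together as in the following NumPy sketch, which computes only the forward value of the loss for one flattened mini-batch. It is an illustration under stated assumptions rather than the published implementation: the effective bin width is taken to be the fixed width, the focal term uses the probability assigned to the true class, and the name `l1dfl` is hypothetical. In a PyTorch/MONAI pipeline the same arithmetic would be written with differentiable tensor operations.

```python
import numpy as np

def l1dfl(p_fg, y, bin_width=0.1, gamma=2.0, eps=1e-6):
    """Forward value of a weighted-Dice + focal loss for binary labels.

    p_fg: lesion-class probabilities; y: binary lesion labels.
    Assumptions: lambda(B_k) = bin_width; focal term masked by the true class.
    """
    p = np.clip(np.asarray(p_fg, dtype=np.float64).ravel(), 1e-7, 1 - 1e-7)
    y = np.asarray(y, dtype=np.float64).ravel()

    # Step 1: per-voxel L1 error on the lesion class.
    delta = np.abs(p - y)

    # Step 2: histogram weights w_i = N / D(B_k) = N * bin_width / C(B_k).
    n_bins = int(round(1.0 / bin_width))
    idx = np.minimum((delta / bin_width).astype(int), n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    # np.maximum guards empty bins, which are never indexed by any voxel.
    w = (p.size * bin_width / np.maximum(counts, 1))[idx]

    # Step 3: weighted Dice term and focal term (alpha = 1).
    w_dice = 1.0 - (2.0 * np.sum(w * y * p) + eps) / (
        np.sum(w * (y ** 2 + p ** 2)) + eps)
    p_true = np.where(y == 1, p, 1.0 - p)   # probability of the true class
    focal = -0.5 * np.sum((1.0 - p_true) ** gamma * np.log(p_true))

    # Step 4: the batch loss is the sum of the two terms.
    return w_dice + focal

y = np.array([1, 1, 0, 0, 0, 0, 0, 0])
good = np.array([0.9, 0.8, 0.1, 0.1, 0.05, 0.05, 0.1, 0.2])
loss_good = l1dfl(good, y)
loss_bad = l1dfl(1.0 - good, y)   # same confidences, all on the wrong side
```

Because the histogram is rebuilt from each mini-batch, the weights change as training progresses and the error distribution shifts.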

Inference: Sliding-window inference with $128^3$ patches, followed by majority voting across the cross-validation ensemble.
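Majority voting over an ensemble can be sketched as below. The example assumes each cross-validation model has already produced a binary mask of the same shape and that a strict majority is required for a voxel to be labelled lesion; the number of models and the tie-handling rule are not specified above, and `majority_vote` is a hypothetical helper.

```python
import numpy as np

def majority_vote(binary_masks):
    """Combine M binary masks (shape (M, ...)) by strict majority."""
    masks = np.asarray(binary_masks, dtype=np.uint8)
    votes = masks.sum(axis=0)                      # lesion votes per voxel
    # A voxel is lesion only if more than half of the models agree.
    return (2 * votes > masks.shape[0]).astype(np.uint8)

# Three models voting on four voxels.
masks = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
]
consensus = majority_vote(masks)   # voxels with at least 2 of 3 votes are kept
```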

Preprocessing

CT preprocessing: Intensities clipped to [−1000, 3000] HU, scaled to [0,1].
PET preprocessing: Retained in raw SUV units.
Image resampling: All images resampled to isotropic $2 \times 2 \times 2\,\text{mm}^3$ voxels.
Data augmentation: Patch cropping biased towards foreground, affine spatial transforms.
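The CT intensity step can be illustrated with a few lines of NumPy. This is a sketch of the clip-and-rescale arithmetic only; the function name `preprocess_ct` is an assumption, and a MONAI pipeline would typically express the same operation with an intensity-range transform such as `ScaleIntensityRanged`.

```python
import numpy as np

def preprocess_ct(hu, lo=-1000.0, hi=3000.0):
    """Clip CT intensities to [lo, hi] HU and rescale linearly to [0, 1]."""
    hu = np.asarray(hu, dtype=np.float32)
    return (np.clip(hu, lo, hi) - lo) / (hi - lo)

# Values below -1000 HU map to 0, values above 3000 HU map to 1.
scaled = preprocess_ct([-2000, -1000, 1000, 3000, 5000])
```

PET values are left untouched in this scheme, since they are retained in SUV units.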

4. Empirical Performance and Benchmarking

In the reported experiments, L1DFL achieved higher median segmentation overlap, fewer false positives, and closer agreement with ground-truth quantitative imaging metrics than Dice loss and Dice Focal Loss (DFL):

Model	Loss	Median DSC [IQR]	False Positives	F1 Score	p-value vs. L1DFL
Attention U-Net	Dice	0.58 [0.44,0.72]	2.06	0.50	<0.01
Attention U-Net	DFL	0.54 [0.33,0.73]	2.65	0.44	<0.01
Attention U-Net	L1DFL	0.66 [0.51,0.77]	0.42	0.69	—
SegResNet	Dice	0.60 [0.29,0.76]	0.73	0.62	0.24
SegResNet	DFL	0.59 [0.41,0.71]	2.05	0.49	0.015
SegResNet	L1DFL	0.68 [0.47,0.78]	0.52	0.66	—
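The relative changes implied by the table can be recomputed with a short script. The values are copied from the rows above; the helper `relative_change` and the dictionary layout are introduced here purely for illustration.

```python
def relative_change(new, old):
    """Percentage change of `new` relative to `old`."""
    return 100.0 * (new - old) / old

# Median DSC and false-positive values taken from the table above.
median_dsc = {
    "Attention U-Net": {"Dice": 0.58, "DFL": 0.54, "L1DFL": 0.66},
    "SegResNet": {"Dice": 0.60, "DFL": 0.59, "L1DFL": 0.68},
}
false_positives = {
    "Attention U-Net": {"Dice": 2.06, "DFL": 2.65, "L1DFL": 0.42},
    "SegResNet": {"Dice": 0.73, "DFL": 2.05, "L1DFL": 0.52},
}

dsc_gain = {
    (model, base): relative_change(vals["L1DFL"], vals[base])
    for model, vals in median_dsc.items()
    for base in ("Dice", "DFL")
}
fp_reduction = {
    (model, base): -relative_change(vals["L1DFL"], vals[base])
    for model, vals in false_positives.items()
    for base in ("Dice", "DFL")
}

for key in dsc_gain:
    print(key, f"DSC {dsc_gain[key]:+.1f}%", f"FP -{fp_reduction[key]:.1f}%")
```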

Key findings:

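Median Dice similarity coefficient (DSC), relative improvement computed from the table: about +14% vs. Dice and +22% vs. DFL on Attention U-Net; about +13% and +15% on SegResNet.
F1 score increases: up to +25 points vs. DFL (0.44 → 0.69) and +19 points vs. Dice (0.50 → 0.69) on Attention U-Net.
False positives were lower with L1DFL in every comparison, with reductions ranging from about 29% (SegResNet vs. Dice, 0.73 → 0.52) to about 84% (Attention U-Net vs. DFL, 2.65 → 0.42).
Lin’s concordance correlation coefficients (CCC): up to 0.99 for SUV_max and total lesion activity (TLA), indicating close agreement with ground-truth quantitative measurements (Dzikunu et al., 22 Apr 2025).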
Only L1DFL consistently passed equivalence tests (TOST, ±20% margin, α=0.05) on key clinical metrics (SUV_max, SUV_mean, lesion count, TLA).
Coverage probabilities and total deviation indices were substantially better for L1DFL, indicating tighter agreement with ground truth and fewer outliers (Dzikunu et al., 4 Feb 2025, Dzikunu et al., 22 Apr 2025).
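For readers unfamiliar with Lin's CCC, the following sketch computes it from paired measurements using the standard definition with population (biased) variances. The function name `lins_ccc` and the toy numbers are illustrative only and are not taken from the cited studies.

```python
import numpy as np

def lins_ccc(x, y):
    """Lin's concordance correlation coefficient between paired samples."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    mx, my = x.mean(), y.mean()
    cov = np.mean((x - mx) * (y - my))            # population covariance
    return 2.0 * cov / (x.var() + y.var() + (mx - my) ** 2)

truth = np.array([1.0, 2.0, 3.0, 4.0])
identical = lins_ccc(truth, truth)        # perfect agreement
shifted = lins_ccc(truth, truth + 1.0)    # perfectly correlated but biased
```

Unlike Pearson correlation, the CCC penalizes systematic offsets: the shifted series is perfectly correlated with the reference yet scores about 0.71, whereas identical series score 1.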

L1DFL maintained high performance across single vs. multiple lesions, wide tumor volume ranges, and varying lesion spread.

5. Robustness, Limitations, and Generalization

The adaptive L1-norm histogram weighting gave consistent gains across the two evaluated backbones, Attention U-Net and SegResNet, in contrast with the network-specific variability observed with classic Dice and DFL. By directly measuring prediction difficulty and the rarity of each error level, L1DFL focuses on critical regions without inflating false positives, a limitation observed in other Focal or Dice-based approaches.

Limitations:

Evaluation was restricted to a maximum of five lesions per scan; performance with higher lesion burdens remains untested.
Validation was performed specifically on prostate PSMA PET/CT data; extension to other imaging modalities, organs, or tracers may require re-parameterization.
Segmentation of lesions smaller than 1 mL and highly diffuse uptake remains challenging.
The current implementation uses a fixed bin width ($\kappa=0.1$) and focal $\gamma=2$; adaptability of these hyperparameters is an open question.

Potential generalizations include per-epoch adaptive histograms, multi-norm weighting (e.g., $\ell_2$ or geodesic distances), and integration of prediction uncertainty into the voxel-weighting scheme (Dzikunu et al., 4 Feb 2025, Dzikunu et al., 22 Apr 2025).

6. Broader Context and Future Directions

L1DFL introduces a rigorous, empirically grounded strategy for addressing sample difficulty and error sparsity, operationalized through mini-batch L1-norm histograms. Its principled blend of region overlap, hard-sample focusing, and dynamic loss scaling is readily extensible to other volumetric or 2D segmentation domains with pronounced class imbalance or subtle boundary delineation challenges.

Future work may explore:

Adaptive or learnable bin widths.
Batch- or epoch-level dynamic normalization schemes for error density modeling.
Multi-class segmentation setups, requiring cross-class normalization.
Integration with boundary-aware regularizers or uncertainty modeling.

The design and empirical validation of L1DFL advance segmentation methodology that is concordant with clinical metrics, and offer a template for loss engineering in high-variance, imbalanced datasets (Dzikunu et al., 22 Apr 2025, Dzikunu et al., 4 Feb 2025).


References (2)

Comprehensive Evaluation of Quantitative Measurements from Automated Deep Segmentations of PSMA PET/CT Images (2025)

Adaptive Voxel-Weighted Loss Using L1 Norms in Deep Neural Networks for Detection and Segmentation of Prostate Cancer Lesions in PET/CT Images (2025)
