
Unified Focal Loss (UFL) Overview

Updated 16 December 2025
  • Unified Focal Loss is a family of loss functions that fuses mechanisms for handling class imbalance, such as margin regularization or region-based terms, with focal modulation to weight hard and easy samples.
  • It offers multiple instantiations, including margin-based, hierarchical region-based, and relative-error scaling strategies for tasks like segmentation and depth regression.
  • UFL recovers popular loss functions as special cases, improving performance on imbalanced datasets while reducing hyperparameter complexity.

Unified Focal Loss (UFL) is a family of supervised loss functions extending the classical focal loss paradigm to a broader class of prediction problems and architectures by fusing distinct mechanisms for handling data imbalance and optimization difficulty. UFL has independently emerged in multiple forms in recent literature, with three notable instantiations: (1) a margin-regularized focal-weighted softmax loss for segmentation (Chen, 2023), (2) a hierarchical combination of distributional (focal) and region-based (Focal Tversky) losses for medical image analysis (Yeung et al., 2021), and (3) a relative-error-scaled focal loss for multi-view stereo depth regression with continuous targets (Peng et al., 2022). All variants share a unified goal: to jointly address hard/easy sample weighting and class/sample imbalance, and to serve as a generalization framework that recovers popular loss functions as special cases.

1. Mathematical Formulations and Core Principles

1.1 Margin-based Focal Loss (for Segmentation, Crack Detection)

Let $z_j$ denote the logit for class $j \in \{1, \dots, C\}$, $y$ the ground-truth class ($y \in \{1, \dots, C\}$), and $p_t = \frac{e^{z_y}}{\sum_{j=1}^C e^{z_j}}$ the standard softmax probability of the true class. The UFL loss in (Chen, 2023) introduces:

  • Margin-Softmax Regularizer: For a fixed margin $m \geq 0$ and scaling factor $s > 0$,

$$\hat z_j = \begin{cases} s\,(z_y + m), & j = y \\ s\,z_j, & j \ne y \end{cases}$$

$$\hat p_t = \frac{\exp(s(z_y + m))}{\exp(s(z_y+m)) + \sum_{j\neq y} \exp(s z_j)},$$

$$L_{\text{reg}} = -\log \hat p_t;$$

  • Focal Modulation: For parameter $\gamma \geq 0$,

$$L_{\text{foc}} = -(1-p_t)^\gamma \log p_t;$$

  • Unified Focal Loss (convex combination):

$$L_{\text{UFL}} = \alpha\, L_{\text{reg}} + (1-\alpha)\, L_{\text{foc}}, \qquad \alpha \in [0,1].$$

1.2 Hierarchical Region and Distribution-Based UFL (Medical Segmentation)

Yeung et al. (Yeung et al., 2021) generalize focal and Dice-type losses under a single formulation:

  • Modified Focal Loss:

$$\mathcal{L}_{mF}(p, y) = \delta\,(1-p_t)^{1-\gamma}\left[-\log(p_t)\right], \qquad p_t = \begin{cases} p, & y=1 \\ 1-p, & y=0 \end{cases}$$

  • Modified Focal Tversky Loss:

$$\mathcal{L}_{mFT} = \sum_{c=1}^C (1 - \mathrm{mTI}_c)^\gamma,$$

$$\mathrm{mTI}_c = \frac{\sum_i p_{c,i}\, g_{c,i}}{\sum_i p_{c,i}\, g_{c,i} + \delta \sum_i p_{c,i} (1-g_{c,i}) + (1-\delta) \sum_i (1-p_{c,i})\, g_{c,i}},$$

  • Unified Focal Loss (hierarchical):

$$\mathcal{L}_{\mathrm{UFL}} = \lambda\, \mathcal{L}_{mF} + (1-\lambda)\, \mathcal{L}_{mFT}, \qquad \lambda \in [0,1].$$
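
A minimal binary-segmentation sketch of this hierarchical form, written directly against the equations above; the function name, tensor shapes, and defaults are illustrative assumptions rather than the authors' released code:

import torch

def hierarchical_ufl(p, g, lam=0.5, delta=0.6, gamma=0.5, eps=1e-7):
    # p: predicted foreground probabilities, g: binary ground-truth mask, both (N, H, W)
    p = p.clamp(eps, 1.0 - eps)

    # Modified focal loss (distribution-based term), averaged over pixels
    p_t = torch.where(g > 0.5, p, 1.0 - p)
    L_mF = (delta * (1.0 - p_t) ** (1.0 - gamma) * (-torch.log(p_t))).mean()

    # Modified Tversky index and focal Tversky loss (region-based term)
    tp = (p * g).sum()
    fp = (p * (1.0 - g)).sum()
    fn = ((1.0 - p) * g).sum()
    mTI = tp / (tp + delta * fp + (1.0 - delta) * fn + eps)
    L_mFT = (1.0 - mTI) ** gamma

    # Hierarchical combination of the two terms
    return lam * L_mF + (1.0 - lam) * L_mFT

The multi-class form sums $(1 - \mathrm{mTI}_c)^\gamma$ over classes exactly as in the formula above.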

1.3 Relative-Error Focal Loss for Continuous Targets (Multi-View Stereo)

In the context of continuous, sparse soft labels $q \in [0,1]$, (Peng et al., 2022) proposes:

  • Unified Focal Loss:

$$\mathrm{UFL}(u, q) = \begin{cases} \alpha^+ \left[S_b^+\!\left(\tfrac{|q-u|}{q^+}\right)\right]^\gamma \mathrm{BCE}(u, q), & q>0 \\[6pt] \alpha^- \left[S_b^-\!\left(\tfrac{u}{q^+}\right)\right]^\gamma \mathrm{BCE}(u, q), & q=0 \end{cases}$$

where $q^+ = \max(q, \epsilon)$, $\mathrm{BCE}(u, q) = -q\log u - (1-q)\log(1-u)$, and $S_b^\pm$ are bounded, sigmoid-like scaling functions.
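
A schematic sketch of this continuous-target form is shown below; the concrete shape of the bounded scaling function (a shifted sigmoid) is a placeholder assumption, not the exact $S_b^\pm$ used in (Peng et al., 2022):

import torch

def s_b(x, b=5.0):
    # Placeholder bounded, sigmoid-like scaling in [0, 1); not the exact S_b of the paper
    return 2.0 * torch.sigmoid(b * x) - 1.0

def continuous_ufl(u, q, gamma=2.0, alpha_pos=1.0, alpha_neg=0.75, b=5.0, eps=1e-6):
    # u: predicted probabilities in (0, 1); q: continuous soft labels in [0, 1]; same shape
    u = u.clamp(eps, 1.0 - eps)
    q_plus = q.clamp(min=eps)                                   # q^+ = max(q, eps)
    bce = -(q * torch.log(u) + (1.0 - q) * torch.log(1.0 - u))  # BCE(u, q)

    # Relative-error scaling for positive (q > 0) samples, absolute scaling for negatives (q = 0)
    w_pos = alpha_pos * s_b(torch.abs(q - u) / q_plus, b) ** gamma
    w_neg = alpha_neg * s_b(u / q_plus, b) ** gamma
    weight = torch.where(q > 0, w_pos, w_neg)
    return (weight * bce).mean()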

2. Hyperparameters, Special Cases, and Tuning

Across these frameworks, UFL introduces a compact set of hyperparameters, each governing a specific operational facet:

| Parameter | Role | Typical / Recommended Values |
| --- | --- | --- |
| $m$ (margin) | Enforces class separation, rare-class robustness | $0.5$ ($[0.2, 1.5]$ for class imbalance) |
| $s$ (scale) | Amplifies logits for effective margin application | $30$ ($[10, 64]$) |
| $\gamma$ (focus) | Down-weights easy examples, focuses on hard samples | $2.0$ or $[0.1, 2]$ (task dependent) |
| $\alpha$ or $\lambda$ | Weights region/distribution balance | $0.5$ (pure margin: $1$; pure focal: $0$) |
| $\delta$ | FP/FN trade-off in region-based losses | $0.6$ |
| $b$ (scaling base) | Bounds error scaling in regression UFL | $5$ |
| $\alpha^+, \alpha^-$ | Positive/negative sample weighting (regression) | $\alpha^+=1$, $\alpha^-$ stagewise ($0.75 \to 0.25$) |

Special cases include:

  • Focal Loss: $m=0$, $\alpha=0$ (Chen, 2023); $\lambda=1$ with $\gamma>0$, $\delta=0.5$ (Yeung et al., 2021)
  • Margin Softmax: $\gamma=0$, $\alpha=1$ (Chen, 2023)
  • Dice / Tversky: $\gamma=0$, $\delta=0.5$, $\lambda=0$ (Yeung et al., 2021)
  • Binary FL: $q=0$ or $1$, $b\to\infty$ in (Peng et al., 2022)
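
For instance, substituting the first two settings into the margin-focal form of Section 1.1 makes the reductions explicit:

$$m=0,\ \alpha=0:\quad L_{\text{UFL}} = L_{\text{foc}} = -(1-p_t)^\gamma \log p_t \quad \text{(focal loss)},$$
$$\gamma=0,\ \alpha=1,\ m>0:\quad L_{\text{UFL}} = L_{\text{reg}} = -\log \hat p_t \quad \text{(margin softmax)},$$

and with $m=0$, $\gamma=0$ (and unit scale $s=1$), both terms collapse to $-\log p_t$, the standard cross-entropy.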

3. Implementation and Pseudocode

The UFL losses are directly compatible with common deep learning frameworks (TensorFlow/Keras, PyTorch). Below is an illustrative PyTorch implementation of the margin-focal variant for segmentation (Chen, 2023), following the formulas in Section 1.1 (a sketch, not the authors' released code):

import torch
import torch.nn.functional as F

def margin_focal_ufl(logits, target, m=0.5, s=30.0, gamma=2.0, alpha=0.5):
    # logits: (N, C) raw class scores; target: (N,) integer class labels.
    # For segmentation, flatten spatial dimensions into the batch axis first.
    one_hot = F.one_hot(target, num_classes=logits.size(1)).to(logits.dtype)

    # Margin-softmax modification: add margin m to the true-class logit, then scale by s
    zhat = s * (logits + m * one_hot)
    log_p_margin = F.log_softmax(zhat, dim=1)
    log_p_ce = F.log_softmax(logits, dim=1)

    # Log-probability / probability of the true class under each softmax
    logp_margin_t = (log_p_margin * one_hot).sum(dim=1)
    logp_ce_t = (log_p_ce * one_hot).sum(dim=1)
    p_ce_t = logp_ce_t.exp()

    # Margin-regularized term and focal term
    L_reg = -logp_margin_t
    L_foc = -((1.0 - p_ce_t) ** gamma) * logp_ce_t

    # Convex combination
    return (alpha * L_reg + (1.0 - alpha) * L_foc).mean()
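
A minimal usage sketch inside a training loop; `network`, `loader`, and `optimizer` are placeholder names, not from the cited work:

# Hypothetical segmentation training step using the loss defined above
for images, masks in loader:                # masks: (N, H, W) integer class labels
    logits = network(images)                # (N, C, H, W)
    flat = logits.permute(0, 2, 3, 1).reshape(-1, logits.size(1))
    loss = margin_focal_ufl(flat, masks.reshape(-1), m=0.5, s=30.0, gamma=2.0, alpha=0.5)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()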

Analogous implementations apply for the hierarchical and continuous-label forms (Yeung et al., 2021, Peng et al., 2022), as sketched after the corresponding equations in Sections 1.2 and 1.3; computation remains plug-and-play with automatic differentiation.

4. Empirical Evaluation and Task-Specific Results

Image Segmentation and Crack Detection

In crack segmentation tasks (DeepCrack-DB, PanelCrack), UFL with $(m=0.5, \gamma=2.0)$ yielded IoU gains of $+0.43$ on DeepCrack-DB (from $69.32$ to $69.75$) and $+0.44$ on PanelCrack (Chen, 2023), illustrating the complementary benefits of the focal term (class imbalance) and the margin term (rare-class overfitting).

Medical Image Analysis

In five medical segmentation datasets:

  • On CVC-ClinicDB (polyp segmentation), DSC improved to $0.909 \pm 0.023$ with the asymmetric UFL, outperforming Dice, Focal, Tversky, and Combo losses (Yeung et al., 2021).
  • Robust gains were observed on DRIVE, BUS2017, BraTS20, and KiTS19 in both DSC and IoU, with all asymmetric UFL improvements significant at $p<0.05$.
  • UFL maintained stable performance across $\gamma \in [0.1, 0.9]$ and reduced the hyperparameter burden compared to earlier hybrid losses.

Multi-View Stereo

For DTU depth estimation, the relative-error UFL (Peng et al., 2022) reduced overall error to $0.320$ mm, outperforming both GFL and standard BCE. Ablation confirmed its robustness to scaling, hyperparameters, and stagewise weighting.

5. Mechanisms for Addressing Imbalance and Overfitting

UFL's empirical effectiveness stems from two core mechanisms, consistently present in all variants:

  • Focal reweighting $(1-p)^\gamma$ (or its continuous generalizations) down-weights loss contributions from easy (i.e., high-confidence or background-dominated) examples, thus directing gradients toward rare (minority) classes. This mitigates extreme foreground/background imbalances in pixel- or hypothesis-dense tasks.
  • Margin regularization (when included) further distances rare-class logits from the decision boundary, reducing overfitting and increasing robustness to annotation noise or scarce class occurrences.

In region-based UFL, the interplay of distributional and spatial (Tversky) terms ensures both pixelwise and regional balancing. Relative-error modulation in continuous targets (depth, keypoints) further amplifies errors on “hard” outliers, preventing loss domination by vast numbers of background or trivial samples.
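
As a quick numerical illustration of the focal reweighting (simple arithmetic, not a result from the cited papers): with $\gamma = 2$, an easy example with $p_t = 0.95$ receives weight $(1-0.95)^2 = 0.0025$, while a hard example with $p_t = 0.3$ receives $(1-0.3)^2 = 0.49$, roughly a $200\times$ larger contribution to the loss.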

6. Unification of Loss Function Families

UFL provides a principled framework that subsumes traditional loss functions:

| Setting | Recovers |
| --- | --- |
| $m=0$, $\gamma=0$ | Cross-entropy |
| $m>0$, $\gamma=0$ | Margin softmax / ArcFace |
| $m=0$, $\gamma>0$ | Focal loss |
| $\lambda=0$, $\gamma=0$, $\delta=0.5$ (Yeung et al., 2021) | Dice loss |
| $\lambda=1$, $\delta=0.5$, $\gamma=0$ (Yeung et al., 2021) | Cross-entropy (as in Dice/focal-Dice hybrids) |
| $q\in\{0,1\}$, $b\to\infty$ (Peng et al., 2022) | Standard focal loss |
| No relative-error scaling (Peng et al., 2022) | GFL (Generalized Focal Loss) |

This precise recoverability ensures that UFL can flexibly adapt to the demands of specific architectures, data regimes, or evaluation needs.

7. Practical Considerations and Recommendations

  • For imbalanced foreground-background tasks, select a moderate to high $\gamma$ ($0.5$–$2$), enforce a nonzero margin $m$ when overfitting is observed, and tune $\alpha/\lambda$ to balance region and distribution effects (Chen, 2023, Yeung et al., 2021); an example starting configuration is given after this list.
  • In continuous-label and regression settings, keep $b$ in $S_b^\pm$ within empirical ranges where relative-error scaling is effective but not prone to outlier explosion; set positive class weights to $1$ and decay negative weights or focusing parameters stage-wise as resolution increases (Peng et al., 2022).
  • UFL integrates seamlessly with categorical, regression, and hybrid region-based objectives; no architectural modification is needed, only the loss function is replaced.
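
A possible set of starting values, distilled from the table in Section 2 (illustrative defaults to tune per task, not prescriptions from the cited papers):

# Illustrative starting hyperparameters (values follow Section 2)
margin_focal_defaults = {"m": 0.5, "s": 30.0, "gamma": 2.0, "alpha": 0.5}              # Chen, 2023 variant
hierarchical_defaults = {"lambda": 0.5, "delta": 0.6, "gamma": 0.5}                     # Yeung et al., 2021 variant
continuous_defaults = {"b": 5.0, "alpha_pos": 1.0, "alpha_neg": 0.75, "gamma": 2.0}     # Peng et al., 2022 variant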

UFL, in its multiple instantiations, consistently improves performance in dense prediction tasks—particularly under conditions of severe class or sample imbalance—while maintaining interpretability and compatibility with standard training pipelines (Chen, 2023, Yeung et al., 2021, Peng et al., 2022).
