
Asymmetric Loss (ASL) Overview

Updated 26 March 2026
  • Asymmetric Loss (ASL) is a loss function that penalizes over- and under-prediction differently, providing fine-grained control over error asymmetry.
  • It employs weighted minimization and tunable hyperparameters to capture domain-specific risk and improve robustness under data noise.
  • Empirical results in multilabel and segmentation tasks demonstrate improved mAP and noise tolerance compared to symmetric loss alternatives.

Asymmetric Loss (ASL) denotes a class of loss functions in statistical learning and optimization that introduce systematic imbalance—via hyperparameters or function form—in the way over-prediction and under-prediction, or positive vs. negative class errors, are penalized. This stands in contrast to symmetric losses, which assign equal cost to equivalent errors regardless of direction. The asymmetric loss paradigm provides a principled means to capture domain-specific risk sensitivities, improve learning under distributional imbalance (e.g., class or error-type), and enhance robustness under data or label noise.

1. Formal Definitions and Foundational Properties

Formally, let $\mathcal{Y}$ denote the label space (e.g., for classification or regression), and $L:\mathcal{C}\times\mathcal{Y}\to\mathbb{R}$ a loss, with $\mathcal{C}$ the prediction space (probability simplex, $\mathbb{R}$, etc.). Asymmetric loss functions can be characterized via a weighted minimization principle: given nonnegative weights $w_1,\ldots,w_K$ with a unique maximizer $t$ (the "dominant class"),

$$\arg\min_{u\in\mathcal{C}} \sum_{i=1}^K w_i L(u,i) = \arg\min_{u\in\mathcal{C}} L(u, t).$$

This property ensures that minimizing the expected asymmetric loss pushes probability mass (or predictions) preferentially toward the class or outcome with maximal weight, providing fine-grained control over error asymmetry (Zhou et al., 2021). Losses with this property are termed completely asymmetric when it holds for all weight choices, or strictly asymmetric when additionally the decrease in weighted loss is strict if the dominant class probability is increased.
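The weighted minimization principle can be checked numerically on a small example. The sketch below uses the loss $L(u,i)=1-u_i$ on a coarse grid over the probability simplex; this loss, the grid resolution, the weights, and all function names are illustrative assumptions, not choices made by the source.

```python
from itertools import product

def mae_loss(u, i):
    """Illustrative loss L(u, i) = 1 - u_i for a probability vector u."""
    return 1.0 - u[i]

def simplex_grid(k, steps):
    """Yield probability vectors of length k whose entries are multiples of 1/steps."""
    for counts in product(range(steps + 1), repeat=k):
        if sum(counts) == steps:
            yield tuple(c / steps for c in counts)

def argmin_on_grid(objective, k, steps=10):
    """Return the grid point of the simplex that minimizes the objective."""
    return min(simplex_grid(k, steps), key=objective)

weights = [0.2, 0.5, 0.3]  # hypothetical weights; unique maximizer at index 1
t = max(range(len(weights)), key=weights.__getitem__)

# Left-hand side: minimizer of the weighted sum of losses over all classes.
weighted = argmin_on_grid(
    lambda u: sum(w * mae_loss(u, i) for i, w in enumerate(weights)), k=3
)
# Right-hand side: minimizer of the loss on the dominant class alone.
dominant = argmin_on_grid(lambda u: mae_loss(u, t), k=3)

print(weighted, dominant)
```

Both minimizers put all probability mass on the dominant class, which is what the displayed identity requires for this particular loss and weight vector.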

Analytical properties of notable asymmetric loss families include continuity, convexity, and links to Bregman or $\phi$-divergences (e.g., the power-divergence loss, which interpolates between Kullback–Leibler, Pearson $\chi^2$, and others via an explicit asymmetry parameter) (Pearse et al., 2024).

The asymmetry ratio,

$$r(\ell) = \inf_{\substack{u_1, u_2\geq 0;\ u_1+u_2\leq 1;\ 0\leq\Delta\leq u_2}}\frac{\ell(u_1)-\ell(u_1+\Delta)}{\ell(u_2-\Delta)-\ell(u_2)}$$

quantifies how strongly the loss "pushes" mass from non-dominant to dominant classes. For weights $w_m>w_n$, the requirement $(w_m/w_n)\cdot r(\ell)\geq 1$ is critical for ensuring asymmetry (Zhou et al., 2021).
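A rough feel for the asymmetry ratio can be had by evaluating the quotient on a grid. The sketch below assumes $\ell$ is a decreasing function of a single predicted probability and approximates the infimum by a minimum over grid points with $\Delta>0$, so it only upper-bounds the true value; the two example losses and the function name are hypothetical.

```python
def asymmetry_ratio(loss, steps=20):
    """Grid estimate of r(loss): the smallest ratio between the loss decrease
    from raising u1 by delta and the loss increase from lowering u2 by delta."""
    best = float("inf")
    for i1 in range(steps + 1):
        for i2 in range(steps + 1 - i1):      # enforces u1 + u2 <= 1
            for d in range(1, i2 + 1):        # enforces 0 < delta <= u2
                u1, u2, delta = i1 / steps, i2 / steps, d / steps
                gain = loss(u1) - loss(u1 + delta)
                cost = loss(u2 - delta) - loss(u2)
                if cost > 0.0:
                    best = min(best, gain / cost)
    return best

r_linear = asymmetry_ratio(lambda u: 1.0 - u)            # ratio is 1 everywhere
r_quadratic = asymmetry_ratio(lambda u: (1.0 - u) ** 2)  # ratio shrinks near u2 = 0

print(r_linear, r_quadratic)
```

For the linear loss the ratio equals 1, so the condition $(w_m/w_n)\cdot r(\ell)\geq 1$ holds for any $w_m>w_n$. For the quadratic loss the grid estimate is already close to zero, so the condition would demand a very large weight gap.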

2. Canonical Forms of Asymmetric Loss

Multiple forms of asymmetric loss have been derived or adapted for classification, regression, spatial prediction, and segmentation. Key representatives include:

| Loss Family | Formula/Definition (summary) | Key Parameters |
|---|---|---|
| Piecewise-linear (quantile) | $L(\varepsilon)=k_1\varepsilon$ if $\varepsilon\geq 0$, $-k_2\varepsilon$ if $\varepsilon<0$ | $k_1, k_2>0$ |
| Power-divergence [PDL] | $L_{PDL,\lambda}(\delta,Y)=\frac{1}{\lambda(\lambda+1)} \{Y[(Y/\delta)^\lambda-1]+\lambda(\delta-Y)\}$ | $\lambda\in\mathbb{R}$ |
| Asymmetric Loss (ASL) | $L_{ASL}(p,y)=-(1-p)^{\gamma_+}\log p$ if $y=1$; $-(p_t)^{\gamma_-}\log(1-p_t)$ if $y=0$, with $p_t=\max\{p-t,0\}$ | $\gamma_+,\gamma_-, t$ |
| Tversky/$F_\beta$ | $F_\beta(P,G) = \frac{(1+\beta^2)\sum p_ig_i}{(1+\beta^2)\sum p_ig_i + \beta^2 \sum (1-p_i)g_i + \sum p_i(1-g_i)}$ | $\beta$ (recall–precision tradeoff) |
| Polynomial-based | Summations of multiple terms in $(1-\hat y)$/$\hat y_t$; see APL/RAL | $M,N,\{\alpha_m\},\{\beta_n\}$ |
| AMSE (classification) | $L_{AMSE}(f(x),y)=\tfrac{1}{K}\lVert a\mathbf{e}_y - f(x)\rVert_2^2$ | $a\geq 1$ |

The power parameter or exponents (e.g., $\lambda$, $\gamma_+$, $\gamma_-$) control the direction and degree of penalty asymmetry (Pearse et al., 2024, Ben-Baruch et al., 2020).
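As an illustration of how a single parameter steers the asymmetry, the $F_\beta$ formula from the table can be evaluated directly. The sketch below is plain Python on short lists of soft predictions; the function name and the toy masks are assumptions made for illustration.

```python
def f_beta_score(pred, target, beta):
    """F_beta overlap between soft predictions p_i and binary targets g_i."""
    tp = sum(p * g for p, g in zip(pred, target))         # true positives
    fn = sum((1 - p) * g for p, g in zip(pred, target))   # missed positives
    fp = sum(p * (1 - g) for p, g in zip(pred, target))   # spurious positives
    b2 = beta ** 2
    return (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)

target = [1, 1, 1, 0, 0]
missed = [1, 1, 0, 0, 0]   # one false negative, no false positives
extra = [1, 1, 1, 1, 0]    # one false positive, no false negatives

for beta in (1.0, 2.0):
    print(beta, f_beta_score(missed, target, beta), f_beta_score(extra, target, beta))
```

Raising $\beta$ from 1 to 2 lowers the score of the prediction with a false negative and raises the score of the prediction with a false positive, so a loss such as $1-F_\beta$ with $\beta>1$ favors recall.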

3. Theoretical Guarantees and Robustness

Asymmetric losses possess several desirable properties for statistical learning:

  • Classification Calibration: For appropriately designed asymmetric losses (e.g., strictly and completely asymmetric), driving the excess loss risk to zero ensures vanishing excess $0$–$1$ risk.
  • Excess Risk Bounds: For loss $\ell$ with $\ell(0)>\ell(1)$, the excess misclassification risk is bounded by the excess asymmetric loss risk: $R_{0-1}(f)-R_{0-1}^* \leq \frac{2}{\ell(0)-\ell(1)}\,[R_{\ell}(f)-R_{\ell}^*]$ (Zhou et al., 2021).
  • Noise Tolerance: Under clean-label-dominant noise ($1-\eta_y>\max_{k\neq y}\eta_{y,k}$), any completely asymmetric loss is robust; the global minimum under the noisy risk coincides with the minimum under the clean risk (Wang et al., 23 Jul 2025, Zhou et al., 2021).
  • Variance Reduction: For piecewise-linear asymmetric losses, adding an optimal correction to the prediction not only minimizes mean asymmetric loss but also guarantees a strictly reduced variance of the loss unless the loss is symmetric (Yamaguchi et al., 2019).
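The mean-minimizing part of the variance-reduction bullet can be illustrated with a constant correction. The sketch below assumes the error is defined as $\varepsilon = y - \hat y$ (the source does not fix the sign convention) and uses hypothetical residuals. Under that assumption, the best constant shift is the $k_1/(k_1+k_2)$ quantile of the residuals, which is the standard quantile-loss result rather than the specific procedure of the cited paper.

```python
def piecewise_linear_loss(eps, k1, k2):
    """k1 * eps for eps >= 0 (under-prediction, by assumption), -k2 * eps otherwise."""
    return k1 * eps if eps >= 0 else -k2 * eps

def mean_loss(residuals, shift, k1, k2):
    """Mean asymmetric loss after adding a constant shift to every prediction."""
    return sum(piecewise_linear_loss(r - shift, k1, k2) for r in residuals) / len(residuals)

def best_constant_correction(residuals, k1, k2):
    """The objective is convex and piecewise linear, so a minimizer lies at a residual."""
    return min(residuals, key=lambda c: mean_loss(residuals, c, k1, k2))

# Hypothetical residuals y - y_hat from some predictor.
residuals = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0]
k1, k2 = 3.0, 1.0  # under-prediction costs three times as much as over-prediction

c_star = best_constant_correction(residuals, k1, k2)
print(c_star, mean_loss(residuals, 0.0, k1, k2), mean_loss(residuals, c_star, k1, k2))
```

With $k_1=3$ and $k_2=1$ the target quantile is 0.75, and the search selects the eighth smallest of the ten residuals, shifting predictions upward to avoid the costlier under-prediction.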

4. Asymmetric Loss in Multilabel and Imbalanced Classification

In settings where positive instances are sparse relative to negatives (e.g., multilabel classification, long-tailed data), symmetric losses lead to gradient domination by negatives and poor fitting of positives. ASL decouples positive and negative exponentiation and introduces a hard threshold to entirely discard easy negatives, preserving scarce positive signal and enhancing performance.
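The effect of the threshold on easy negatives can be seen with scalar arithmetic. The sketch below compares the negative-label term of binary cross-entropy, focal loss, and ASL; the function names, the focal exponent of 2, and the example probabilities are illustrative assumptions, and `margin` plays the role of $t$.

```python
import math

def bce_negative(p):
    """Cross-entropy term for a negative label with predicted probability p."""
    return -math.log(1.0 - p)

def focal_negative(p, gamma=2.0):
    """Focal-loss term for a negative label: down-weights easy negatives by p ** gamma."""
    return -(p ** gamma) * math.log(1.0 - p)

def asl_negative(p, gamma_neg=4.0, margin=0.05):
    """ASL term for a negative label: probabilities at or below the margin give zero loss."""
    p_shift = max(p - margin, 0.0)
    return -(p_shift ** gamma_neg) * math.log(1.0 - p_shift)

# Hypothetical batch: 1000 easy negatives at p = 0.03 and one positive at p = 0.3.
n_neg, p_neg, p_pos = 1000, 0.03, 0.3
positive_term = -math.log(p_pos)  # positive term with gamma_pos = 0

for name, fn in [("bce", bce_negative), ("focal", focal_negative), ("asl", asl_negative)]:
    print(name, n_neg * fn(p_neg), positive_term)
```

Under cross-entropy the summed negative terms are far larger than the single positive term, whereas the ASL negative term is exactly zero below the margin, so the positive signal is left intact.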

Empirical performance comparisons:

  • On MS-COCO, default ASL ($\gamma_+=0,\gamma_-=4,t=0.05$) yields absolute mAP improvements over focal loss: $86.6\%$ vs. $85.1\%$ (Ben-Baruch et al., 2020).
  • On Open Images, macro mAP increases from $92.2\%$ (focal) to $92.8\%$ (ASL).
  • For medical long-tailed multi-label (CXR-LT), robust polynomial ASL with Hill regularization further improves mAP, mAUC, and F1 beyond BCE, focal, and plain ASL (Park et al., 2023).

The Hill loss regularization in robust ASL (RAL) caps gradients on hard negatives, preventing hyperparameter sensitivity/instability with polynomial-based asymmetric losses (Park et al., 2023).

5. Asymmetric Loss in Regression and Spatial Prediction

For positive-valued targets, standard symmetric error metrics do not match the natural multiplicative structure of errors or the cost structure in applications. The power-divergence family generalizes classical divergences with a tunable $\lambda$ to encode cost asymmetry between under- and over-prediction:

  • $\lambda=1$: symmetric penalization (Pearson $\chi^2$).
  • $\lambda<1$: over-prediction penalized more.
  • $\lambda>1$: under-prediction penalized more (Pearse et al., 2024).

Optimal estimators under $L_{PDL,\lambda}$ often take the form of power means of the posterior target. Prediction intervals defined by the asymmetric loss can be computed analytically (for some $\lambda$) or numerically, providing tailored uncertainty quantification that matches the loss structure.
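The power-mean form follows from a short calculation on the PDL formula given in Section 2. The derivation below is a sketch that assumes $\lambda\notin\{0,-1\}$, that the required moments exist, and that expectations are taken over the posterior distribution of $Y$:

```latex
\begin{aligned}
\mathbb{E}\left[L_{PDL,\lambda}(\delta, Y)\right]
  &= \frac{1}{\lambda(\lambda+1)}
     \left\{ \mathbb{E}\left[Y^{\lambda+1}\right]\delta^{-\lambda} - \mathbb{E}[Y]
             + \lambda\left(\delta - \mathbb{E}[Y]\right) \right\}, \\
\frac{\partial}{\partial \delta}\,\mathbb{E}\left[L_{PDL,\lambda}(\delta, Y)\right]
  &= \frac{1}{\lambda+1}
     \left\{ 1 - \mathbb{E}\left[Y^{\lambda+1}\right]\delta^{-(\lambda+1)} \right\} = 0 \\
\Longrightarrow\quad
\delta^{*} &= \left( \mathbb{E}\left[Y^{\lambda+1}\right] \right)^{1/(\lambda+1)} .
\end{aligned}
```

The second derivative, $\mathbb{E}[Y^{\lambda+1}]\,\delta^{-(\lambda+2)}$, is positive for $\delta>0$, so the stationary point is the minimizer.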

A quantitative measure $A(f)=L((1-f)Y,Y)/L((1-f)^{-1}Y,Y)$ expressed directly in terms of the loss enables selection and interpretation of the asymmetry magnitude for practical decision costs (Pearse et al., 2024).
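The measure $A(f)$ is straightforward to evaluate for the power-divergence loss. The sketch below is plain Python with hypothetical function names; it does not handle the limiting cases $\lambda=0$ and $\lambda=-1$, and it fixes $Y=1$ because $Y$ cancels in the ratio for this loss.

```python
def pdl_loss(delta, y, lam):
    """Power-divergence loss L_{PDL,lambda}(delta, Y) for positive delta and y."""
    if lam in (0.0, -1.0):
        raise ValueError("lam = 0 and lam = -1 are limiting cases not handled in this sketch")
    return (y * ((y / delta) ** lam - 1.0) + lam * (delta - y)) / (lam * (lam + 1.0))

def asymmetry_measure(f, lam, y=1.0):
    """A(f): loss of under-predicting by the factor (1 - f) divided by the loss
    of over-predicting by the factor 1 / (1 - f)."""
    under = pdl_loss((1.0 - f) * y, y, lam)
    over = pdl_loss(y / (1.0 - f), y, lam)
    return under / over

for lam in (0.5, 1.0, 2.0):
    print(lam, asymmetry_measure(0.5, lam))
```

With $f=0.5$ the ratio is below 1 for $\lambda=0.5$, equal to 1 for $\lambda=1$, and above 1 for $\lambda=2$, matching the three cases listed above.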

6. Algorithmic and Implementation Considerations

The majority of asymmetric losses admit efficient implementation within standard autodiff frameworks. For example, ASL for multilabel tasks is given by:

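import torch

def asymmetric_loss(logits, targets, gamma_pos=0, gamma_neg=4, margin=0.05, eps=1e-8):
    # Per-label probabilities; each label is an independent binary decision.
    p = torch.sigmoid(logits)
    # Positive term: focal-style weighting with exponent gamma_pos.
    pos_loss = -((1 - p).pow(gamma_pos) * torch.log(p.clamp(min=eps))) * targets
    # Shifted probability: negatives with p <= margin contribute zero loss.
    p_shift = torch.clamp(p - margin, min=0.0)
    neg_loss = -(p_shift.pow(gamma_neg) * torch.log((1 - p_shift).clamp(min=eps))) * (1 - targets)
    return (pos_loss + neg_loss).mean()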
(Ben-Baruch et al., 2020)

Polynomial-based losses and their regularized variants (e.g., RAL) generalize this form, adding negligible computation overhead for typical polynomial degrees ($M,N\leq 3$) (Park et al., 2023).

The power-divergence estimator under a hierarchical spatial model uses posterior calculation of powers or logs; prediction intervals are constructed via quantile calculation or numerical root-finding (Pearse et al., 2024).

7. Applications, Extensions, and Empirical Results

  • Imbalanced medical segmentation: Asymmetric Tversky/$F_\beta$ losses improve recall at fixed or minor precision trade-off; in MS lesion segmentation, ASL increased recall by approximately $4\%$ and improved lesion-wise true positive rate over Dice and focal losses (Hashemi et al., 2018).
  • Learning with noisy labels: Asymmetric losses have been developed (AGCE, AUL, AMSE) and shown to provide enhanced robustness to both symmetric and class-conditional noise, outperforming symmetric counterparts on synthetic high-noise and real-world (WebVision, Clothing1M) benchmarks (Wang et al., 23 Jul 2025, Zhou et al., 2021).
  • Decision-theoretic corrections: Piecewise-linear asymmetric losses with explicit bias adjustment simultaneously minimize both mean and variance of downstream risk, providing a robust correction layer atop arbitrary predictors (Yamaguchi et al., 2019).
  • Spatial statistics: For positive-valued spatial prediction, asymmetric losses allow cost-aware tuning (e.g., via $\lambda$ in PDL) to optimize credible interval width and bias–variance balance in spatial interpolation problems (Pearse et al., 2024).

Empirical evidence consistently demonstrates that increasing the asymmetry ratio aligns the loss with the preferred error structure (e.g., favoring recall or class robustness) and enhances performance under practical dataset imperfections (Zhou et al., 2021, Wang et al., 23 Jul 2025).

