Level- and Share-Based Loss Functions

Updated 20 November 2025
  • Level- and share-based loss functions are defined as measures on raw numeric data versus normalized proportions, guiding model evaluation in various learning contexts.
  • Level-based losses quantify direct discrepancies with metrics like MSE and MAE, while share-based losses assess categorical or compositional predictions using measures like CCE, BCE, or KL divergence.
  • In large samples, the asymptotic equivalence of these loss functions often leads to similar model rankings, though specific data characteristics may favor one approach over the other.

Level-based and share-based loss functions represent two foundational paradigms for quantifying prediction error in supervised learning and in the assessment of statistical models. Both classes compare observed values (targets) with their realizations (predictions), but their construction, statistical underpinnings, and domains of application differ markedly. Level-based losses pertain to direct, numeric discrepancies between quantities, while share-based losses operate on normalized (proportional) representations, often reflecting categorical or compositional constraints. Rigorous statistical interpretation is provided by generalized linear model (GLM) theory, and recent research has clarified the precise conditions under which these loss formulations are asymptotically equivalent.

1. Definitions and Mathematical Framework

Let $t=(t_1,\ldots,t_k)$ denote a vector of nonnegative target values, and $y=(y_1,\ldots,y_k)$ its realized or predicted counterpart. In level-based analysis, losses are defined directly on the pairs $(t_i, y_i)$. In share-based analysis, targets and realizations are normalized: for totals $S_t = \sum_{i=1}^k t_i$ and $S_y = \sum_{i=1}^k y_i$, the shares are $x_i = t_i/S_t$ and $s_i = y_i/S_y$.

  • Level-based loss (at unit $i$): $L_{\mathrm{level}}(t_i, y_i) = \ell(t_i, y_i)$.
  • Share-based loss (at unit $i$): $L_{\mathrm{share}}(t_i, y_i) = \ell(x_i, s_i)$.

For both, the loss function $\ell$ can be instantiated as an absolute error, squared error, or more general function. A prominent subclass is the weighted exponentiated loss $L_{w,a}(t, y) = w(t, y)\,|t-y|^a$, where $w$ is a decomposable weight and $a > 0$ (Coleman, 17 Nov 2025).
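
As a concrete illustration, the following sketch (hypothetical helper names, NumPy only) computes per-unit level-based and share-based versions of the weighted exponentiated loss for a small vector of targets and realizations, using a unit weight $w \equiv 1$ and $a = 2$ for simplicity.

```python
import numpy as np

def weighted_exp_loss(t, y, a=2.0, w=None):
    """Weighted exponentiated loss w(t, y) * |t - y|**a, elementwise.

    With w=None a unit weight is used; any decomposable weight
    function w(t, y) -> nonnegative array could be substituted.
    """
    w_val = 1.0 if w is None else w(t, y)
    return w_val * np.abs(t - y) ** a

# Example targets (levels) and their realizations.
t = np.array([4.0, 1.0, 5.0])
y = np.array([3.0, 2.0, 5.0])

# Level-based loss: applied directly to the raw pairs (t_i, y_i).
level_loss = weighted_exp_loss(t, y)

# Share-based loss: applied to shares x_i = t_i / S_t and s_i = y_i / S_y.
x = t / t.sum()
s = y / y.sum()
share_loss = weighted_exp_loss(x, s)

print("level-based:", level_loss)   # [1. 1. 0.]
print("share-based:", share_loss)   # [0.01 0.01 0.  ]
```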

2. Level-Based Loss Functions: Statistical Interpretation and Use Cases

Level-based losses quantify the direct discrepancy between continuous-valued predictions and targets. Key members in this family include:

  • Mean Squared Error (MSE):

$$L_{\rm MSE}(y, \hat y) = \frac{1}{n} \sum_{i=1}^n (y_i - \hat y_i)^2.$$

The statistical model assumes additive Gaussian noise: $\varepsilon_i = y_i - f(x_i) \sim \mathcal{N}(0, \sigma^2)$. The loss corresponds to the negative log-likelihood under this noise model and is associated with the identity-link GLM for Gaussian outputs. The Bayes-optimal predictor is the conditional mean $f^*(x) = \mathbb{E}[y \mid x]$ (Berzal, 7 Nov 2025).

  • Mean Absolute Error (MAE):

$$L_{\rm MAE}(y, \hat y) = \frac{1}{n} \sum_{i=1}^n |y_i - \hat y_i|.$$

Here, the noise is Laplacian: $\varepsilon_i \sim \mathrm{Laplace}(0, b)$. MAE emerges as the MLE loss, and the Bayes-optimal predictor is the conditional median $f^*(x) = \mathrm{median}(Y \mid x)$ (Berzal, 7 Nov 2025).

Level-based losses are standard in regression contexts with unbounded, real-valued outcomes. The loss function selection encodes a probabilistic model—Gaussian for MSE, Laplace for MAE—and thereby shapes both statistical efficiency and robustness to outliers.
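
A minimal check of this likelihood correspondence: with fixed noise scale, the Gaussian and Laplace negative log-likelihoods differ from MSE and MAE only by constants that do not depend on the predictions, so minimizing one minimizes the other. The helper names below are illustrative, not taken from the cited papers.

```python
import numpy as np

def gaussian_nll(y, y_hat, sigma=1.0):
    # Mean negative log-likelihood of y under N(y_hat, sigma^2).
    return 0.5 * np.mean((y - y_hat) ** 2) / sigma**2 + 0.5 * np.log(2 * np.pi * sigma**2)

def laplace_nll(y, y_hat, b=1.0):
    # Mean negative log-likelihood of y under Laplace(y_hat, b).
    return np.mean(np.abs(y - y_hat)) / b + np.log(2 * b)

y = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.5, 1.5, 2.5])

mse = np.mean((y - y_hat) ** 2)
mae = np.mean(np.abs(y - y_hat))

# Up to additive constants independent of y_hat, the NLLs reduce to MSE and MAE:
assert np.isclose(gaussian_nll(y, y_hat), 0.5 * mse + 0.5 * np.log(2 * np.pi))
assert np.isclose(laplace_nll(y, y_hat), mae + np.log(2))
```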

3. Share-Based Loss Functions: Categorical and Proportional Structures

Share-based losses compare normalized distributions, such as class probabilities in multi-class classification or proportions in compositional data. The output domain is the probability simplex; typical loss functions and activations include:

  • Categorical Cross-Entropy (CCE):

$$L_{\rm CCE}(T, \hat P) = -\frac{1}{n} \sum_{i=1}^n \sum_{k=1}^K t_{ik} \log \hat p_{ik},$$

where the prediction vector $\hat p$ is obtained via softmax activation. This loss arises from the multinomial likelihood and drives the model distribution toward the true class probabilities (Berzal, 7 Nov 2025).

  • Binary Cross-Entropy (BCE, $K=2$):

$$L_{\rm BCE}(t, \hat y) = -\bigl[t\log \hat y + (1-t)\log(1-\hat y)\bigr],$$

with sigmoid activation. The underlying statistical assumption is a Bernoulli likelihood (Berzal, 7 Nov 2025).

  • Kullback–Leibler (KL) Divergence:

$$L_{\rm KL}(p, \hat p) = \sum_{k=1}^K p_k \log \frac{p_k}{\hat p_k},$$

utilized when the target is a soft or empirical distribution.

Share-based losses are predominant in settings with bounded, compositional, or categorical targets. Their GLM interpretation is rooted in the exponential family with canonical (logit or generalized logit) link functions.
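
A short sketch of these share-based losses (hypothetical function names, NumPy only): categorical cross-entropy evaluated on softmax outputs, and KL divergence against a target distribution. For one-hot targets the two coincide, because the target entropy is zero.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax along the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def categorical_cross_entropy(t, p_hat, eps=1e-12):
    # Mean over samples of -sum_k t_k log p_hat_k.
    return -np.mean(np.sum(t * np.log(p_hat + eps), axis=-1))

def kl_divergence(p, p_hat, eps=1e-12):
    # Mean over samples of sum_k p_k log(p_k / p_hat_k).
    return np.mean(np.sum(p * np.log((p + eps) / (p_hat + eps)), axis=-1))

logits = np.array([[2.0, 0.5, -1.0],
                   [0.0, 1.0,  0.0]])
p_hat = softmax(logits)

t_onehot = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])

# With one-hot targets, KL(t || p_hat) equals the cross-entropy.
assert np.isclose(categorical_cross_entropy(t_onehot, p_hat),
                  kl_divergence(t_onehot, p_hat))
```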

4. Asymptotic Equivalence of Level-Based and Share-Based Losses

Recent analysis delineates sufficient conditions under which level-based and share-based losses become almost surely equivalent in large samples. Consider i.i.d. pairs $(T_i, Y_i)$ and weighted exponentiated losses with decomposable weight functions,
$$L_{w,a}(t, y) = w(t, y)\, |t - y|^a.$$
The per-unit level-based and share-based losses, averaged across $n$ units, satisfy (as $n \to \infty$)
$$\frac{1}{n}\sum_{i=1}^n L_i^{(\ell)} - \left(\frac{S_{T,n}}{S_{Y,n}}\right)^a \frac{1}{n}\sum_{i=1}^n \ell_i^{(\ell)} \xrightarrow{a.s.} 0,$$
under regularity conditions on moments, total stability, weights, and sparsity of large deviations. The ratio of summed level-based to share-based losses converges almost surely to $K = (\mu_T / \mu_Y)^a$, with $\mu_T = \lim S_{T,n}/n$ and $\mu_Y = \lim S_{Y,n}/n$ (Coleman, 17 Nov 2025).

A direct corollary is that, for methods $A$ and $B$, the rank ordering of their average level-based and share-based losses converges: rank orderings are asymptotically identical when totals are proportional. In particular, when $\mu_T = \mu_Y$, the two loss types agree up to scale, and, in large samples, debates over optimizing one versus the other become moot in terms of model selection (Coleman, 17 Nov 2025).
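
The rank-ordering corollary can be illustrated with a small simulation (a hypothetical sketch, not taken from the cited paper): two methods are scored on the same targets with unit weights and $a = 2$; because the realized totals stay close to the target total, the level-based and share-based averages rank the methods identically.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Targets and two competing prediction methods A and B.
T = rng.gamma(shape=2.0, scale=3.0, size=n)
Y_A = T * rng.lognormal(mean=0.0, sigma=0.10, size=n)  # small relative error
Y_B = T * rng.lognormal(mean=0.0, sigma=0.25, size=n)  # larger relative error

def level_loss(T, Y, a=2.0):
    # Average level-based loss |T_i - Y_i|^a.
    return np.mean(np.abs(T - Y) ** a)

def share_loss(T, Y, a=2.0):
    # Average share-based loss |T_i / S_T - Y_i / S_Y|^a.
    return np.mean(np.abs(T / T.sum() - Y / Y.sum()) ** a)

for name, Y in [("A", Y_A), ("B", Y_B)]:
    print(name, level_loss(T, Y), share_loss(T, Y))
# Both criteria rank method A ahead of method B on this sample.
```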

5. Aggregated (Bag-Level) Versus Instance-Level Losses in Privacy and Learning

In privacy-sensitive and aggregate learning settings, responses are often available only at the aggregate (bag) level. For data partitioned into $m$ disjoint bags $B_1,\ldots,B_m$ of size $k$, the observed responses are the bag means $\bar y_a = (1/k)\sum_{i \in B_a} y_i$.

Two principal loss formulations are:

  • Bag-Level Loss:

$$L_{\rm bag}(\theta) = \frac{1}{m} \sum_{a=1}^m \ell\!\left(\bar y_a,\ \frac{1}{k} \sum_{i \in B_a} f_\theta(x_i)\right)$$

  • Instance-Level Loss:

$$L_{\rm inst}(\theta) = \frac{1}{mk} \sum_{a=1}^m \sum_{i \in B_a} \ell\bigl(\bar y_a, f_\theta(x_i)\bigr)$$

For quadratic losses, $L_{\rm inst}$ equals $L_{\rm bag}$ plus a within-bag prediction-variance penalty:
$$L_{\rm inst}(\theta) = L_{\rm bag}(\theta) + \frac{1}{2mk^2}\sum_{a=1}^m \sum_{i,j\in B_a} \bigl(f_\theta(x_i)-f_\theta(x_j)\bigr)^2.$$
Thus, instance-level fitting regularizes the smoothness of predictions within bags, affecting both bias and variance: bag-level losses deliver unbiased but higher-variance estimates, while instance-level losses yield lower variance but introduce bias (Javanmard et al., 20 Jan 2024).

An interpolating estimator, $L_\rho = (1-\rho) L_{\rm bag} + \rho L_{\rm inst}$, allows the bias–variance trade-off to be tuned via $\rho$; its theoretical risk is characterized precisely in proportional high-dimensional regimes, facilitating optimal selection given the bag size $k$, data dimension, and privacy constraints (Javanmard et al., 20 Jan 2024).
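
A hypothetical sketch of these objectives for squared loss and a toy linear model (names and setup are illustrative): the bag-level, instance-level, and interpolated losses are computed on bagged data, and the quadratic decomposition of $L_{\rm inst}$ into $L_{\rm bag}$ plus the within-bag prediction-variance penalty is checked numerically, with the penalty written in its equivalent variance form.

```python
import numpy as np

rng = np.random.default_rng(1)
m, k = 50, 4                       # m disjoint bags, each of size k
X = rng.normal(size=(m, k))        # toy scalar feature per instance
y = 2.0 * X + rng.normal(scale=0.5, size=(m, k))
y_bar = y.mean(axis=1)             # observed bag-level responses

def predictions(theta, X):
    # Toy linear model; any f_theta could be substituted here.
    return theta * X

def bag_loss(theta):
    f = predictions(theta, X)
    return np.mean((y_bar - f.mean(axis=1)) ** 2)

def instance_loss(theta):
    f = predictions(theta, X)
    return np.mean((y_bar[:, None] - f) ** 2)

def interpolated_loss(theta, rho):
    # L_rho = (1 - rho) * L_bag + rho * L_inst trades bias against variance.
    return (1.0 - rho) * bag_loss(theta) + rho * instance_loss(theta)

# Quadratic decomposition: L_inst = L_bag + (1/(mk)) sum_a sum_i (f_i - f_bar_a)^2.
theta = 1.7
f = predictions(theta, X)
penalty = np.mean((f - f.mean(axis=1, keepdims=True)) ** 2)
assert np.isclose(instance_loss(theta), bag_loss(theta) + penalty)
```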

6. Statistical Interpretation and GLM Connections

Both loss paradigms are rooted in the statistical principle of Maximum Likelihood Estimation (MLE) under specific output models. Level-based losses model continuous outcomes via additive noise (Gaussian, Laplace), with the GLM’s canonical link structure dictating identity activation. Share-based losses arise when outcomes are categorical or compositional, leveraging the softmax or sigmoid activation (logit or generalized logit link) (Berzal, 7 Nov 2025). The unified GLM view is captured as follows:

| Outcome Model | Canonical Link | Activation | Loss Function |
|---------------|-------------------|------------|---------------|
| Gaussian | Identity | Linear | MSE |
| Laplace | Identity | Linear | MAE |
| Bernoulli | Logit | Sigmoid | BCE |
| Multinomial | Generalized logit | Softmax | CCE |

The GLM framework extends to other distributions (e.g., Poisson + log link for counts, Gamma for skewed responses). The statistically justified loss/activation pair matches the target distribution and yields Bayes-optimal predictors (Berzal, 7 Nov 2025).
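
As one instance of this extension, here is a minimal sketch of a Poisson loss with the canonical log link (a hypothetical helper; constants independent of the prediction are dropped):

```python
import numpy as np

def poisson_nll(y, eta):
    """Poisson negative log-likelihood with canonical log link.

    eta is the linear predictor; the mean is mu = exp(eta), matching the
    log-link (exponential activation) pairing for count-valued targets.
    The log(y!) term is omitted since it does not depend on eta.
    """
    mu = np.exp(eta)
    return np.mean(mu - y * eta)

y = np.array([0.0, 2.0, 5.0])      # count targets
eta = np.array([0.1, 0.8, 1.5])    # linear predictor (log of the mean)
print(poisson_nll(y, eta))
```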

7. Practical Implications and Guidelines

Empirical and theoretical results indicate:

  • Level-based losses are preferred for unbounded, real-valued targets and symmetric, light-tailed noise; select MSE for mean-optimality, MAE for median-optimal robust regression.
  • Share-based losses are canonical for probability estimation, classification, and compositional data, where adherence to the simplex and categorical accuracy is required.
  • In large-sample settings under mild regularity without pathological outliers, choice between level-based and share-based loss for model selection is often inconsequential: both approaches provide nearly identical rankings and decisions (Coleman, 17 Nov 2025).
  • In aggregate learning and privacy regimes, leverage bias–variance trade-off via interpolation to optimize excess risk (Javanmard et al., 20 Jan 2024).

This suggests that, for the extensive class of decomposable, weighted exponentiated losses and sufficiently large data sets, the methodological and inferential differences between level-based and share-based losses become asymptotically negligible. Optimization, comparison, and reporting are thus justified under either framework when the sample size is large and totals align. For small samples, skewed data, or when interpretability of shares is intrinsically demanded, the practitioner’s preference or task constraints may guide the choice.
