
Sum of Ranked Range Loss

Updated 19 December 2025
  • SoRR loss is a rank-based aggregator that generalizes traditional loss functions by summing losses only over a designated ranked range, enhancing robustness.
  • It applies a two-parameter (m, k) selection to sorted losses, effectively discarding extreme outliers and easy samples to target a mid-range of difficulty.
  • Leveraging DC decomposition, SoRR supports efficient optimization via methods like DCA, Moreau smoothing, and proximal ADMM, with strong empirical performance.

The Sum of Ranked Range (SoRR) loss is a class of rank-based aggregators for supervised learning objectives that generalizes several standard loss aggregation paradigms. SoRR operates by applying a two-parameter $(m, k)$ selection to the sorted list of individual losses, summing only the losses ranked from the $(m+1)$-st to the $k$-th largest. This approach enables robust, focused, and adaptive loss construction, discarding both extreme outliers and "easy" samples, and interpolating between empirical risk minimization, maximum loss, top-$k$ losses, trimmed means, and related objectives. SoRR can be expressed as a difference of convex (DC) functions and admits specialized optimization via DC algorithms and modern stochastic methods. It has found utility in aggregate sample-level objectives for classification (AoRR) and label-level objectives in multi-label and multi-class learning (TKML), with proven robustness and calibration properties across noise and outlier scenarios (Hu et al., 2021, Hu et al., 2020, Hu et al., 2022, Yao et al., 2022, Xiao et al., 2023).

1. Mathematical Definition and Ranked Range Selection

Let $S = \{ s_1, \dots, s_n \}$ denote a collection of real-valued losses (or scores), typically arising from evaluating a model on $n$ samples or on the $n$ labels of a single sample. Arrange $S$ in non-increasing order:

$$s_{[1]} \geq s_{[2]} \geq \cdots \geq s_{[n]}$$

For integers $0 \leq m < k \leq n$, define the ranked range $R_{m+1:k} = \{ s_{[m+1]}, \dots, s_{[k]} \}$. The SoRR operator then aggregates these as:

$$\psi_{m,k}(S) = \sum_{i=m+1}^{k} s_{[i]}$$

When normalized, the corresponding average is termed AoRR:

$$L_{\text{AoRR}}(S; m, k) = \frac{1}{k-m} \sum_{i=m+1}^{k} s_{[i]}$$

Parameter choices interpolate between:

  • Full average ($m=0$, $k=n$)
  • Top-$k$ loss ($m=0$)
  • Median ($m \approx n/2$, $k = m+1$)
  • Max loss ($m=0$, $k=1$)
  • Trimmed mean ($m>0$, $k<n$)

By construction, SoRR discards the $m$ largest (worst) losses and the $n-k$ smallest (easiest), focusing training on a mid-range of difficulty. This selection scheme equips SoRR-based losses with enhanced robustness to sample and label outliers (Hu et al., 2021, Hu et al., 2022).
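
A minimal NumPy sketch of the ranked-range selection above (the function names are illustrative, not taken from the cited papers):

```python
import numpy as np

def sorr(losses, m, k):
    """Sum of ranked range: sum of the (m+1)-st through k-th largest losses."""
    s = np.sort(np.asarray(losses, dtype=float))[::-1]  # non-increasing order
    return s[m:k].sum()

def aorr(losses, m, k):
    """Average of ranked range: SoRR normalized by the range width k - m."""
    return sorr(losses, m, k) / (k - m)

losses = [9.0, 0.1, 2.5, 4.0, 0.3, 7.2]
print(sorr(losses, m=1, k=4))   # 7.2 + 4.0 + 2.5 = 13.7 (the single worst loss is discarded)
print(aorr(losses, m=1, k=4))   # 13.7 / 3
```

Setting $m=0$, $k=n$ recovers the full sum/average, and $m=0$, $k=1$ recovers the max loss, matching the interpolation listed above.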

2. Structural Properties: Decomposition and Nonconvexity

SoRR is not convex in general. However, it is DC (difference of convex): the sum over any ranked range can be written as

$$\psi_{m,k}(S) = \sum_{i=1}^{k} s_{[i]} - \sum_{i=1}^{m} s_{[i]} = \phi_{k}(S) - \phi_{m}(S)$$

where

$$\phi_{k}(S) = \sum_{i=1}^{k} s_{[i]}$$

is convex in $S$ (and convex in the model parameters when each $s_i(\cdot)$ is convex). Each such partial sum $\phi_k$ admits a dual representation (Hu et al., 2020, Yao et al., 2022):

$$\phi_k(S) = \min_{\lambda \in \mathbb{R}} \left\{ k\lambda + \sum_{i=1}^{n} [s_i - \lambda]_+ \right\}$$
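
The dual form can be checked numerically; any $\lambda$ between $s_{[k+1]}$ and $s_{[k]}$ attains the minimum. A small sketch under the definitions above (values are illustrative):

```python
import numpy as np

s = np.array([9.0, 7.2, 4.0, 2.5, 0.3, 0.1])
k = 3

top_k_sum = np.sort(s)[::-1][:k].sum()   # phi_k(S) by definition: 9.0 + 7.2 + 4.0 = 20.2

# Evaluate the dual objective k*lambda + sum_i [s_i - lambda]_+ on a dense grid.
lam = np.linspace(-1.0, 10.0, 100001)
dual = (k * lam + np.maximum(s[None, :] - lam[:, None], 0.0).sum(axis=1)).min()

print(top_k_sum, dual)   # both ~20.2; the minimum is attained for any lambda in [2.5, 4.0]
```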

Consequently, SoRR minimization is amenable to difference-of-convex optimization techniques, notably the DCA (Difference-of-Convex Algorithm), which alternates linearization of the concave term and convex minimization (Hu et al., 2021, Hu et al., 2020, Yao et al., 2022, Xiao et al., 2023).

3. Optimization Strategies: DCA, Smoothing, and ADMM

DC Optimization: DCA iterates by computing a subgradient of the subtracted convex term ($\phi_m$) and minimizing the leading convex term ($\phi_k$) corrected by the corresponding linear term. The procedure is as follows:

  1. Compute a subgradient $u^t \in \partial_\theta\, \phi_m(S(\theta^t))$ of the composite map $\theta \mapsto \phi_m(S(\theta))$ at the current iterate $\theta^t$
  2. Update $\theta^{t+1} = \arg\min_{\theta} \left\{ \phi_k(S(\theta)) - \langle u^t, \theta \rangle \right\}$

This can be implemented via stochastic subgradient descent, with inner updates for the threshold variables $\lambda, \mu$ controlling the ranked range (Hu et al., 2021, Hu et al., 2020).
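
The sketch below illustrates the linearization at the level of a loss vector: a subgradient of $\phi_m$ is the 0/1 indicator of the $m$ largest losses at the previous iterate, and freezing that index set yields a surrogate that upper-bounds $\psi_{m,k}$ and is tight at the previous iterate. This is a hedged simplification; the cited algorithms embed the step in stochastic optimization over model parameters with explicit threshold variables $\lambda, \mu$.

```python
import torch

def dca_sorr_surrogate(losses, prev_losses, k, m):
    """Surrogate for psi_{m,k} = phi_k - phi_m after one DCA-style linearization.

    The subtracted term phi_m is replaced by <u, losses>, where u is the 0/1
    indicator of the m largest losses at the previous iterate, so the surrogate
    upper-bounds psi_{m,k} and coincides with it at the previous iterate.
    """
    phi_k = torch.topk(losses, k).values.sum()      # sum of the k largest current losses
    if m == 0:
        return phi_k
    with torch.no_grad():                           # index set held fixed: no gradient through the choice of u
        idx_m = torch.topk(prev_losses, m).indices
    return phi_k - losses[idx_m].sum()              # phi_k(S(theta)) - <u, S(theta)>
```

In an outer loop, one would minimize this surrogate over minibatches for a number of SGD steps, refresh `prev_losses` at the new parameters, and repeat.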

Moreau Smoothing & Stochastic Block-Coordinate Methods: For large-scale and deep learning scenarios, SoRR minimization has been enhanced by introducing smooth surrogates using Moreau envelopes for both convex components, yielding the complexity bound $\tilde O(1/\epsilon^6)$ for $\epsilon$-critical points (Yao et al., 2022). Proximal SGD with block-coordinate updates is employed to approximate the proximal maps associated with $\phi_k$ and $\phi_m$.

Proximal ADMM Framework: A more recent unified optimization scheme embeds SoRR in a proximal ADMM framework. By introducing auxiliary variables and constraints, the SoRR-augmented objective is split into $z$ (sorted loss variable), $w$ (model parameter), and $\lambda$ (dual variable) blocks. Each block is updated in closed form or via convex subproblems, with a convergence rate of $O(1/\epsilon^2)$ measured by the KKT residual (Xiao et al., 2023).

| Algorithm | Key Features | Complexity/Notes |
|---|---|---|
| DCA | DC structure, subgradients | Guaranteed monotonic descent |
| Moreau SGD | Smooth surrogates, block updates | $\tilde O(1/\epsilon^6)$ accuracy |
| Proximal ADMM | Auxiliary splitting, PAVA | $O(1/\epsilon^2)$ KKT residual |

4. SoRR-based Aggregators: AoRR and TKML

AoRR (Average of Ranked Range)

For $n$ training examples, AoRR defines:

$$L_{\text{AoRR}}(\theta; m, k) = \frac{1}{k-m} \sum_{i=m+1}^{k} s_{[i]}(\theta)$$

AoRR generalizes classic metrics; by omitting up to $m$ worst-case losses, it confers robustness to sample corruption. Under mild conditions on the surrogate, the resulting binary classification objective is calibrated and recovers the Bayes rule in the infinite-data limit. In the large-sample regime, AoRR approximates a difference of two CVaR risk functionals:

$$L_{\text{AoRR}} \simeq \nu\, \text{CVaR}_\nu[s] - \mu\, \text{CVaR}_\mu[s]$$

with nonasymptotic concentration bounds (Hu et al., 2021).
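As a concrete illustration of this structure, under the empirical convention that $\text{CVaR}_{j/n}[s]$ is the average of the $j$ largest losses, the difference-of-CVaR identity holds exactly for the ranked-range sum normalized by $1/n$; the AoRR form above differs only by the constant factor $n/(k-m)$, and the precise normalization in Hu et al. (2021) may differ. A hedged numerical check:

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.exponential(size=1000)      # synthetic per-sample losses
n, m, k = len(s), 50, 200
nu, mu = k / n, m / n

def cvar(losses, alpha):
    """Empirical CVaR at level alpha = j/n: average of the j largest losses."""
    j = int(round(alpha * len(losses)))
    return np.sort(losses)[::-1][:j].mean()

print(np.sort(s)[::-1][m:k].sum() / n)        # (1/n) * psi_{m,k}(s)
print(nu * cvar(s, nu) - mu * cvar(s, mu))    # equal to the line above up to floating-point error
```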

TKML (Top-$k$ Multi-Label Loss)

For multi-label (or multi-class) problems, TKML ranks per-label predictions and penalizes when true labels drop below rank $k$. Formally, for an input $x$ with true label set $Y$:

$$s(x, Y; \Theta) = \left[\, 1 + \theta_{[k+1]}^\top x - \min_{y \in Y} \theta_y^\top x \,\right]_+$$

This is the $(k+1)$-th largest margin gap, i.e., a $(k, k+1)$ ranked range; here $\theta_{[k+1]}^\top x$ denotes the $(k+1)$-th largest of the per-label scores $\theta_y^\top x$. TKML lower-bounds the conventional margin loss and generalizes known top-$k$ consistent SVM losses (Hu et al., 2021, Hu et al., 2020).
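
A small NumPy sketch of this hinge for a linear scorer (variable names are illustrative; zero loss is attainable only when $|Y| \leq k$):

```python
import numpy as np

def tkml_loss(theta, x, Y, k):
    """TKML hinge: [1 + ((k+1)-th largest score) - (lowest score over true labels)]_+."""
    scores = theta @ x                          # per-label scores theta_y^T x
    kth_plus_1 = np.sort(scores)[::-1][k]       # (k+1)-th largest score (0-based index k)
    worst_true = min(scores[y] for y in Y)      # lowest-scored true label
    return max(0.0, 1.0 + kth_plus_1 - worst_true)

rng = np.random.default_rng(1)
theta = rng.standard_normal((10, 5))            # 10 labels, 5 features
x = rng.standard_normal(5)
print(tkml_loss(theta, x, Y={2, 7}, k=3))       # zero iff every true label beats the (k+1)-th largest score by margin 1
```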

Combined Loss (AoRR-TKML): For simultaneous robustness against sample- and label-level outliers, TKML losses can be further aggregated via AoRR, unifying approaches from iterative trimming and doubly stochastic mining (Hu et al., 2021).

5. Subgradient, Differentiability, and Clarke Regularity

SoRR is piecewise linear in its arguments. The subgradient with respect to an individual loss $\ell_i$ is:

$$\frac{\partial \mathcal{L}_{\mathrm{SoRR}}}{\partial \ell_i} = \begin{cases} 1, & \text{if } \ell_i \text{ is ranked in positions } m+1 \text{ through } k \\ 0, & \text{otherwise} \end{cases}$$

For differentiable surrogates (e.g., logistic, hinge), the chain rule allows efficient gradient propagation restricted to the "in-range" samples. SoRR is continuous, locally Lipschitz, differentiable almost everywhere, and possesses a well-defined Clarke subdifferential under mild monotonicity and convexity conditions (Hu et al., 2022, Xiao et al., 2023).
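
In frameworks with automatic differentiation, this in-range restriction falls out of selecting the ranked range inside the computation graph. A minimal PyTorch sketch (illustrative, not the reference implementation from the cited works):

```python
import torch

def aorr_minibatch_loss(per_sample_losses, m, k):
    """AoRR over a minibatch: only losses ranked m+1..k receive gradient (each 1/(k-m))."""
    sorted_losses, _ = torch.sort(per_sample_losses, descending=True)
    return sorted_losses[m:k].mean()

# Hypothetical usage with a differentiable surrogate (logistic loss on random data):
logits = torch.randn(32, requires_grad=True)
targets = torch.randint(0, 2, (32,)).float()
losses = torch.nn.functional.binary_cross_entropy_with_logits(logits, targets, reduction="none")
aorr_minibatch_loss(losses, m=2, k=16).backward()
print((logits.grad != 0).sum())   # only the 14 in-range samples carry gradient
```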

6. Empirical Evaluation and Computational Complexity

Empirical studies on synthetic and real-world datasets have established the robust performance gains of SoRR-based losses (Hu et al., 2021, Hu et al., 2020, Yao et al., 2022, Xiao et al., 2023):

  • Aggregate-loss robustness: AoRR yields near-Bayes-optimal boundaries in simulations with severe outliers and consistently outperforms average, max, and top-$k$ losses on UCI benchmarks with label noise.
  • Multi-label ranking: TKML and TKML-AoRR gain several percentage points in top-$k$ accuracy and average precision on multi-label datasets, significantly outperforming one-vs-all logistic and standard rank-based objectives under asymmetric and symmetric label noise.
  • Optimization cost: the main overhead is sorting per mini-batch ($O(n \log n)$), or $O(n)$ with selection + merge (see the sketch after this list). ADMM/PAVA approaches enable efficient handling of large $n$, and SoRR admits provable complexity bounds in SGD-based settings.
  • Comparisons: SoRR markedly enhances robustness relative to average loss, max loss, and smooth surrogates, at negligible optimization overhead.
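
As referenced in the optimization-cost bullet above, the full sort can be avoided: since $\psi_{m,k} = \phi_k - \phi_m$ only needs the sums of the $k$ and $m$ largest losses, a linear-time selection such as np.partition suffices (a sketch under the same illustrative setup as earlier, not from the cited code):

```python
import numpy as np

def sorr_via_selection(losses, m, k):
    """psi_{m,k} = (sum of k largest) - (sum of m largest), computed without a full sort."""
    losses = np.asarray(losses, dtype=float)
    n = len(losses)

    def top_sum(j):
        if j == 0:
            return 0.0
        if j >= n:
            return losses.sum()
        return np.partition(losses, n - j)[n - j:].sum()   # j largest via expected-linear-time selection

    return top_sum(k) - top_sum(m)

print(sorr_via_selection([9.0, 0.1, 2.5, 4.0, 0.3, 7.2], m=1, k=4))   # 13.7, matching the sort-based version
```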

7. Extensions, Open Problems, and Applications

  • Parameter selection: Choosing $(m, k)$ is application-dependent; setting $m$ equal to the outlier budget and tuning $k$ by validation yields stable improvement, but fully adaptive schemes remain an open problem.
  • Nested/rank-based aggregators: Stacking SoRR within multi-label or further aggregating at sample/label levels is feasible; this suggests general unifying frameworks for loss construction.
  • Statistical learning theory: Generalization guarantees and VC/Rademacher bounds for SoRR-aggregated objectives are under-explored.
  • Optimization: Fast approximate and exact solvers (stochastic DCA, ADMM-PAVA) are under active development.
  • Risk management connection: The link between SoRR and differences of CVaR terms motivates joint modeling with distributionally robust optimization.

SoRR loss enables flexible, robust, and interpretable objective design, supported by efficient optimization methodologies and strong empirical results in noisy, heavy-tailed, and imbalanced learning scenarios (Hu et al., 2021, Hu et al., 2020, Hu et al., 2022, Yao et al., 2022, Xiao et al., 2023).
