Sum of Ranked Range Loss
- SoRR loss is a rank-based aggregator that generalizes traditional loss functions by summing losses only over a designated ranked range, enhancing robustness.
- It applies a two-parameter (m, k) selection to sorted losses, effectively discarding extreme outliers and easy samples to target a mid-range of difficulty.
- Leveraging DC decomposition, SoRR supports efficient optimization via methods like DCA, Moreau smoothing, and proximal ADMM, with strong empirical performance.
The Sum of Ranked Range (SoRR) loss is a class of rank-based aggregators for supervised learning objectives that generalizes several standard loss aggregation paradigms. SoRR operates by applying a two-parameter (m, k) selection to the sorted list of individual losses, summing only the losses ranked from the (m+1)-th to the k-th largest. This approach enables robust, focused, and adaptive loss construction, discarding both extreme outliers and “easy” samples, and interpolating between empirical risk minimization, maximum loss, top-k losses, trimmed means, and related objectives. SoRR can be expressed as a difference of convex (DC) functions and admits specialized optimization via DC algorithms and modern stochastic methods. It has found utility in aggregate sample-level objectives for classification (AoRR) and label-level objectives in multi-label and multi-class learning (TKML), with proven robustness and calibration properties across noise and outlier scenarios (Hu et al., 2021, Hu et al., 2020, Hu et al., 2022, Yao et al., 2022, Xiao et al., 2023).
1. Mathematical Definition and Ranked Range Selection
Let $S = \{s_1, s_2, \ldots, s_n\}$ denote a collection of $n$ real-valued losses (or scores), typically arising from evaluating a model on $n$ samples, or on the $n$ candidate labels of one sample. Arrange the elements in non-increasing order:

$$s_{[1]} \ge s_{[2]} \ge \cdots \ge s_{[n]},$$

where $s_{[i]}$ denotes the $i$-th largest element of $S$.
For integers $0 \le m < k \le n$, define the ranked range as the positions $m+1$ through $k$ of this sorted list. The SoRR operator then aggregates the corresponding losses as:

$$\psi_{m,k}(S) \;=\; \sum_{i=m+1}^{k} s_{[i]}.$$
When normalized, the corresponding average is termed AoRR:

$$\mathrm{AoRR}_{m,k}(S) \;=\; \frac{1}{k-m}\sum_{i=m+1}^{k} s_{[i]}.$$
Parameter choices interpolate between:
- Full average ($m = 0$, $k = n$)
- Top-$k$ loss ($m = 0$, general $k$)
- Median ($m = \lceil n/2 \rceil - 1$, $k = \lceil n/2 \rceil$)
- Max loss ($m = 0$, $k = 1$)
- Trimmed mean ($0 < m$, $k < n$)
By construction, SoRR discards the $m$ largest (worst) losses and the $n-k$ smallest (easiest), focusing training on a mid-range of difficulty. This selection scheme equips SoRR-based losses with enhanced robustness to sample and label outliers (Hu et al., 2021, Hu et al., 2022).
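The ranked-range computation is easy to prototype. Below is a minimal NumPy sketch (the helper names `sorr` and `aorr` are ours, not from the cited papers) that sorts, slices, and reproduces the special cases listed above; note how the trimmed variant ignores both the outlier and the easiest sample.

```python
import numpy as np

def sorr(losses, m, k):
    """Sum of the (m+1)-th through k-th largest losses (minimal sketch)."""
    s = np.sort(np.asarray(losses))[::-1]    # non-increasing: s[0] >= s[1] >= ...
    return s[m:k].sum()

def aorr(losses, m, k):
    """Average over the same ranked range."""
    return sorr(losses, m, k) / (k - m)

losses = [9.0, 0.2, 3.5, 0.1, 120.0]   # 120.0 plays the role of a corrupted sample
n = len(losses)
print(aorr(losses, 0, n))   # full average          (m=0, k=n)  -> 26.56
print(sorr(losses, 0, 1))   # max loss              (m=0, k=1)  -> 120.0
print(sorr(losses, 0, 2))   # top-2 (sum) loss      (m=0, k=2)  -> 129.0
print(aorr(losses, 1, 4))   # trimmed ranked range: drops 120.0 and the easiest -> 4.23
```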
2. Structural Properties: Decomposition and Nonconvexity
SoRR is not convex in general. However, it is DC (a difference of convex functions): for model parameters $\theta$ and per-sample losses $s_i(\theta)$, the sum over any ranked range can be written as

$$\psi_{m,k}(S(\theta)) \;=\; \phi_k(S(\theta)) \;-\; \phi_m(S(\theta)),$$

where

$$\phi_j(S(\theta)) \;=\; \sum_{i=1}^{j} s_{[i]}(\theta)$$

is the sum of the top-$j$ losses, known to be convex in $\theta$ (for convex individual losses $s_i(\theta)$). Each $\phi_j$ admits a dual (variational) representation (Hu et al., 2020, Yao et al., 2022):

$$\phi_j(S(\theta)) \;=\; \min_{\lambda \in \mathbb{R}} \Big\{\, j\lambda \;+\; \sum_{i=1}^{n} \big[\,s_i(\theta) - \lambda\,\big]_+ \Big\},$$

with the minimum attained at $\lambda = s_{[j]}(\theta)$.
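A quick numerical check of this variational form (a sketch with arbitrary values) confirms that the dual objective, minimized over the threshold $\lambda$, recovers the top-$j$ sum, with the minimum attained at $\lambda = s_{[j]}$:

```python
import numpy as np

s = np.array([2.0, -1.0, 5.0, 3.0, 0.5])   # arbitrary losses
j = 3
top_j = np.sort(s)[::-1][:j].sum()          # phi_j(S) directly: 5 + 3 + 2 = 10

dual = lambda lam: j * lam + np.maximum(s - lam, 0.0).sum()

lam_star = np.sort(s)[::-1][j - 1]          # a minimizer: lambda = s_[j] = 2.0
print(top_j, dual(lam_star))                # both equal 10.0
print(min(dual(t) for t in np.linspace(-5.0, 10.0, 1001)))   # grid search agrees (10.0)
```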
Consequently, SoRR minimization is amenable to difference-of-convex optimization techniques, notably the DCA (Difference-of-Convex Algorithm), which alternates linearization of the concave term and convex minimization (Hu et al., 2021, Hu et al., 2020, Yao et al., 2022, Xiao et al., 2023).
3. Optimization Strategies: DCA, Smoothing, and ADMM
DC Optimization: DCA iterates by computing a subgradient of the “lower” convex term ($\phi_m$) and minimizing the “upper” convex term ($\phi_k$) linearly corrected by this subgradient. At iteration $t$, the procedure is as follows:
- Compute $\mathbf{u}_t \in \partial_{\theta}\, \phi_m(S(\theta_t))$
- Update $\theta_{t+1} \in \arg\min_{\theta} \big\{ \phi_k(S(\theta)) - \langle \mathbf{u}_t, \theta \rangle \big\}$
This can be implemented via stochastic subgradient descent, with inner updates for the threshold variables $\lambda$ (from the dual representations of $\phi_k$ and $\phi_m$) that control the ranked range (Hu et al., 2021, Hu et al., 2020).
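A PyTorch sketch of one such outer iteration is given below; the helper names (`top_j_sum`, `dca_outer_step`, `per_example_loss`) are illustrative placeholders of ours, the convex subproblem is solved only approximately by plain (sub)gradient descent, and no attempt is made to reproduce the exact stochastic schedules of the cited papers.

```python
import torch

def top_j_sum(losses: torch.Tensor, j: int) -> torch.Tensor:
    """phi_j(S): sum of the j largest entries (zero if j == 0)."""
    if j == 0:
        return losses.new_zeros(())
    return torch.topk(losses, j).values.sum()

def dca_outer_step(theta, X, y, per_example_loss, m, k, lr=0.1, inner_steps=200):
    """One DCA outer iteration for  min_theta  phi_k(S(theta)) - phi_m(S(theta)).

    `per_example_loss(theta, X, y)` is assumed to return the vector of
    per-sample losses s_i(theta).
    """
    # (1) Subgradient u_t of the subtracted ("lower") convex term phi_m at theta_t.
    theta_t = theta.detach().clone().requires_grad_(True)
    if m > 0:
        phi_m = top_j_sum(per_example_loss(theta_t, X, y), m)
        u_t = torch.autograd.grad(phi_m, theta_t)[0]
    else:
        u_t = torch.zeros_like(theta)

    # (2) Approximately minimize the convex majorant
    #     phi_k(S(theta)) - <u_t, theta>  by (sub)gradient descent.
    theta = theta.detach().clone().requires_grad_(True)
    opt = torch.optim.SGD([theta], lr=lr)
    for _ in range(inner_steps):
        opt.zero_grad()
        obj = top_j_sum(per_example_loss(theta, X, y), k) - (u_t * theta).sum()
        obj.backward()
        opt.step()
    return theta.detach()
```

In practice the inner problem would be solved loosely with a few stochastic passes, in the spirit of the stochastic variants discussed in this section.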
Moreau Smoothing & Stochastic Block-Coordinate Methods: For large-scale and deep learning scenarios, SoRR minimization has been enhanced by introducing smooth surrogates using Moreau envelopes for both convex components, yielding provable complexity bounds for reaching (nearly) $\epsilon$-critical points (Yao et al., 2022). Proximal SGD with block-coordinate updates is employed to approximate the proximal maps associated with $\phi_k$ and $\phi_m$.
Proximal ADMM Framework: A more recent unified optimization scheme embeds SoRR in a proximal ADMM framework. By introducing auxiliary variables and constraints, the SoRR-augmented objective is split into a sorted-loss block, a model-parameter block, and a dual-variable block. Each block is updated in closed form or via convex subproblems, with a provable convergence rate measured by the KKT residual (Xiao et al., 2023).
| Algorithm | Key Features | Complexity/Notes |
|---|---|---|
| DCA | DC structure, subgradients | Guaranteed monotonic descent |
| Moreau SGD | Smooth surrogates, block updates | Provable rate to (nearly) ε-critical points |
| Proximal ADMM | Auxiliary splitting, PAVA | Convergence measured by KKT residual |
4. SoRR-based Aggregators: AoRR and TKML
AoRR (Average of Ranked Range)
For $n$ training examples with individual losses $\ell_i = \ell(f(\mathbf{x}_i), y_i)$, AoRR defines:

$$\mathcal{L}_{\mathrm{AoRR}}(f) \;=\; \frac{1}{k-m}\sum_{i=m+1}^{k} \ell_{[i]},$$

where $\ell_{[i]}$ is the $i$-th largest individual loss.
AoRR generalizes classic aggregate metrics; by omitting up to $m$ worst-case losses, it confers robustness to sample corruption. Under mild conditions on the surrogate, the AoRR objective is classification-calibrated for binary problems, recovering the Bayes rule in the infinite-data limit. In the large-sample regime (with $m/n \to \beta$ and $k/n \to \alpha$, $0 \le \beta < \alpha \le 1$), AoRR approximates a difference of two CVaR risk functionals:

$$\mathrm{AoRR} \;\approx\; \frac{1}{\alpha - \beta}\Big(\, \alpha\,\mathrm{CVaR}_{\alpha}(\ell) \;-\; \beta\,\mathrm{CVaR}_{\beta}(\ell) \,\Big),$$

with nonasymptotic concentration bounds (Hu et al., 2021).
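The correspondence can be checked empirically. In the sketch below (arbitrary heavy-tailed losses; quantile levels $\beta = m/n$ and $\alpha = k/n$), the ranked-range average and the CVaR-difference expression coincide up to discretization of the quantile levels:

```python
import numpy as np

rng = np.random.default_rng(0)
losses = rng.lognormal(size=100_000)      # heavy-tailed per-sample losses
n = losses.size
alpha, beta = 0.10, 0.01                  # keep the top 10%, discard the top 1%
m, k = int(beta * n), int(alpha * n)

s = np.sort(losses)[::-1]                 # non-increasing order
aorr = s[m:k].mean()                      # average over the ranked range

cvar = lambda a: s[: int(a * n)].mean()   # empirical CVaR at level a (mean of top a-fraction)
approx = (alpha * cvar(alpha) - beta * cvar(beta)) / (alpha - beta)
print(aorr, approx)                       # the two values agree
```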
TKML (Top-$k$ Multi-Label Loss)
For multi-label (or multi-class) problems with $l$ candidate labels, prediction scores $f(\mathbf{x}) \in \mathbb{R}^l$, and ground-truth label set $Y \subseteq [l]$, TKML ranks the per-label predictions and penalizes the model when a true label drops below rank $k$. Formally:

$$\psi_k^{\mathrm{TKML}}(f(\mathbf{x}), Y) \;=\; \Big[\, 1 + f_{[k]}(\mathbf{x}) - \min_{y \in Y} f_y(\mathbf{x}) \,\Big]_+,$$

where $f_{[k]}(\mathbf{x})$ denotes the $k$-th largest prediction score. This is the $k$-th largest margin gap, i.e., a $(k-1, k)$-ranked range over per-label gaps. TKML lower-bounds the conventional margin loss and generalizes known top-$k$ consistent SVM losses (Hu et al., 2021, Hu et al., 2020).
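A minimal PyTorch sketch of this per-sample loss, assuming the margin-1 hinge form written above (the function name `tkml_loss` is ours):

```python
import torch

def tkml_loss(scores: torch.Tensor, true_labels: torch.Tensor, k: int) -> torch.Tensor:
    """Hinge on the gap between the k-th largest score and the lowest
    true-label score (sketch of the TKML form given above)."""
    kth_largest = torch.topk(scores, k).values[-1]   # f_[k](x)
    worst_true = scores[true_labels].min()           # min_{y in Y} f_y(x)
    return torch.clamp(1.0 + kth_largest - worst_true, min=0.0)

scores = torch.tensor([0.1, 2.3, 0.7, 1.9, -0.4])    # 5 candidate labels
Y = torch.tensor([1, 3])                             # true labels ranked 1st and 2nd
print(tkml_loss(scores, Y, k=2))   # 1 + 1.9 - 1.9 = 1.0  (in the top-2, but zero margin)
print(tkml_loss(scores, Y, k=3))   # max(0, 1 + 0.7 - 1.9) = 0.0  (safely inside top-3)
```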
Combined Loss (AoRR-TKML): For simultaneous robustness against sample-level and label-level outliers, per-sample TKML losses can be aggregated via AoRR, unifying approaches from iterative trimming and doubly stochastic mining (Hu et al., 2021).
5. Subgradient, Differentiability, and Clarke Regularity
SoRR is piecewise linear in its arguments. The subgradient with respect to an individual loss is:

$$\frac{\partial\, \psi_{m,k}(S)}{\partial s_i} \;=\; \begin{cases} 1, & s_{[k]} \le s_i \le s_{[m+1]} \quad (\text{$s_i$ falls in the ranked range}),\\ 0, & \text{otherwise}, \end{cases}$$

with ties at the range boundaries yielding subgradients in $[0, 1]$.
For (sub)differentiable surrogates (e.g., logistic, hinge), the chain rule allows efficient (sub)gradient propagation restricted to “in-range” samples. SoRR is continuous, locally Lipschitz, differentiable almost everywhere, and possesses a well-defined Clarke subdifferential under mild monotonicity and convexity conditions (Hu et al., 2022, Xiao et al., 2023).
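This in-range indicator is exactly what automatic differentiation yields when SoRR is implemented via a top-$k$ selection, as the small check below illustrates (a sketch with arbitrary values):

```python
import torch

s = torch.tensor([3.0, 7.0, 1.0, 5.0, 9.0], requires_grad=True)
m, k = 1, 3                               # keep the 2nd and 3rd largest losses
psi = torch.topk(s, k).values[m:].sum()   # SoRR over the (m, k]-ranked range
psi.backward()
print(s.grad)                             # tensor([0., 1., 0., 1., 0.])
# sorted: 9 > 7 > 5 > 3 > 1  ->  ranked range {7, 5}  ->  gradient 1 only for those entries
```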
6. Empirical Evaluation and Computational Complexity
Empirical studies on synthetic and real-world datasets have established the robust performance gains of SoRR-based losses (Hu et al., 2021, Hu et al., 2020, Yao et al., 2022, Xiao et al., 2023):
- Aggregate-loss robustness: AoRR yields near-Bayes-optimal decision boundaries in simulations with severe outliers and consistently outperforms average, max, and top-$k$ losses on UCI benchmarks with label noise.
- Multi-label ranking: TKML and TKML-AoRR gain several percentage points in top-$k$ accuracy and average precision on multi-label datasets, significantly outperforming one-vs-all logistic and standard rank-based objectives under asymmetric and symmetric label noise.
- Optimization cost: The main overhead is the $O(n \log n)$ sort of each mini-batch of $n$ losses, which can be reduced with partial selection and merge strategies. ADMM/PAVA approaches enable efficient handling of large $n$, and SoRR admits provable complexity bounds in SGD-based settings.
- Comparisons: SoRR markedly enhances robustness relative to average loss, max loss, and smooth surrogates, at negligible optimization overhead.
7. Extensions, Open Problems, and Applications
- Parameter selection: Choosing $(m, k)$ is application-dependent; setting $m$ roughly equal to the anticipated outlier budget and tuning $k$ by validation yields stable improvements, but fully adaptive schemes remain an open problem.
- Nested/rank-based aggregators: Stacking SoRR within multi-label objectives, or further aggregating at the sample and label levels, is feasible; this suggests a general unifying framework for loss construction.
- Statistical learning theory: Generalization guarantees and VC/Rademacher bounds for SoRR-aggregated objectives are under-explored.
- Optimization: Fast approximate and exact solvers (stochastic DCA, ADMM-PAVA) are under active development.
- Risk management connection: The link between SoRR and differences of CVaR terms motivates joint modeling with distributionally robust optimization.
SoRR loss enables flexible, robust, and interpretable objective design, supported by efficient optimization methodologies and strong empirical results in noisy, heavy-tailed, and imbalanced learning scenarios (Hu et al., 2021, Hu et al., 2020, Hu et al., 2022, Yao et al., 2022, Xiao et al., 2023).