Reweighted Loss Functions in Machine Learning

Updated 13 May 2026

Reweighted loss functions are methods that assign nonnegative, often data-dependent, weights to individual loss components to generalize traditional empirical loss formulations.
They leverage techniques like iteratively reweighted schemes, adaptive updates, and distributionally robust optimization to balance multi-objective performance and address class imbalance.
Applications span sparse recovery, robust regression, adversarial training, calibration, and unlearning, delivering state-of-the-art performance with provable convergence and stability guarantees.

A reweighted loss function is a generalization of basic loss functions in optimization and machine learning, achieved by multiplying each term by a nonnegative, possibly data-dependent weight. This technique is foundational in sparse optimization, robust regression, adversarial training, calibration, unlearning, data-imbalance mitigation, and automated multi-objective balancing. Recent research shows that reweighting can be framed via convex surrogates, bilevel objectives, distributionally robust optimization, meta-learning, or axiomatic design, yielding state-of-the-art performance and provable theoretical guarantees across modalities.

1. Mathematical Foundations and Prototypical Forms

Let $\mathcal{L}(\theta)=\sum_{i=1}^n \ell_i(\theta)$ be a standard empirical loss, where $\ell_i(\theta)$ is the loss on sample $i$. The reweighted loss takes the form

$\mathcal{L}_{\mathrm{rw}}(\theta) = \sum_{i=1}^n w_i\,\ell_i(\theta)$

where $w_i \geq 0$ are assigned weights. In multi-term or multi-task setups, components are indexed by $i$ and may also involve per-class, per-feature, or per-token indices. The assignment of weights $w_i$ is the principal axis that differentiates families of reweighting schemes:

Data geometry or loss curvature: As in iteratively reweighted $\ell_1$-minimization for nonconvex sparsity (e.g., $w_i = p \left|x_i\right|^{p-1}$ for the $\ell_p$ quasi-norm, $0 < p < 1$) (Wang et al., 2024), or robust regression via iteratively reweighted least squares, where the weights depend on the residuals (Dong et al., 2019).
Prediction margin or uncertainty: For example, token-wise likelihoods or the softmax confidence $p_i$ assigned to the true class, as in focal loss weights $(1-p_i)^{\gamma}$ or inverse-focal weights $(1+p_i)^{\gamma}$ (Zhou et al., 29 May 2025).
Dynamic updates: Weights that adapt per iteration based on component loss statistics, rate-of-improvement, or historical dynamics (Heydari et al., 2019).
Decision sensitivity: In structured prediction, regret-based weights reflecting downstream decision errors (Lawless et al., 2022).
Distributional robustness: Weights derived as solutions to variational or min-max formulations, often computed as softmaxes of per-sample losses (Sow et al., 2023, Holtz et al., 2022).

In multi-objective or multi-part settings, one writes the composite loss as $\mathcal{L}(\theta) = \sum_{k=1}^{K} w_k\,\mathcal{L}_k(\theta)$, where each $\mathcal{L}_k$ is a loss component or task (Heydari et al., 2019).
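As a minimal sketch of the general form $\sum_i w_i\,\ell_i$, the snippet below compares uniform, focal-style, and inverse-focal-style weights on a per-sample cross-entropy. The function names, the exponent `gamma=2.0`, and the probability values are illustrative assumptions, and the weight formulas $(1-p)^\gamma$ and $(1+p)^\gamma$ are the commonly used definitions rather than anything specific to the cited works.

```python
import numpy as np

def reweighted_loss(losses, weights):
    """Return sum_i w_i * l_i, requiring nonnegative weights."""
    losses = np.asarray(losses, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("weights must be nonnegative")
    return float(np.sum(weights * losses))

def focal_weights(p_true, gamma=2.0):
    """Focal-style weights (1 - p)^gamma: down-weight confident samples."""
    return (1.0 - np.asarray(p_true, dtype=float)) ** gamma

def inverse_focal_weights(p_true, gamma=2.0):
    """Inverse-focal-style weights (1 + p)^gamma: up-weight confident samples."""
    return (1.0 + np.asarray(p_true, dtype=float)) ** gamma

# Predicted probability of the true class for four samples (made-up values).
p_true = np.array([0.9, 0.6, 0.3, 0.1])
ce = -np.log(p_true)  # per-sample cross-entropy

loss_uniform = reweighted_loss(ce, np.ones_like(ce))
loss_focal = reweighted_loss(ce, focal_weights(p_true))
loss_inverse_focal = reweighted_loss(ce, inverse_focal_weights(p_true))
```

Because focal weights lie in $[0, 1]$ and inverse-focal weights are at least 1, the three totals are ordered focal, uniform, inverse-focal on any batch with probabilities strictly between 0 and 1.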

2. Representative Algorithms and Closed-Form Weight Schedules

Several core algorithmic templates have emerged:

Iteratively Reweighted Schemes (Sparsity/Robustness)

Sparse Regularization: For nonconvex sparse recovery using the $\ell_p$ quasi-norm $\|x\|_p^p = \sum_i |x_i|^p$ with $0 < p < 1$, the regularization term is iteratively linearized around the current iterate $x^{(k)}$:

$\|x\|_p^p \approx \sum_i w_i^{(k)}\,|x_i| + \text{const}, \qquad w_i^{(k)} = p\,\left|x_i^{(k)}\right|^{p-1}$

yielding at each step a convex weighted-$\ell_1$ problem whose weights are set by the previous iterate (Wang et al., 2024, Wang et al., 2021, Sun et al., 2022).
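The outer/inner structure of such a scheme can be sketched as follows for the problem $\tfrac12\|Ax-b\|^2 + \lambda \sum_i |x_i|^p$. This is an illustrative first-order version, not the algorithm of any cited paper: the inner solver (ISTA), the smoothing constant `eps` that keeps weights finite at zero entries, the plain-$\ell_1$ first pass, and all parameter values are assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of the (weighted) l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def irl1(A, b, lam=0.1, p=0.5, eps=1e-3, outer=10, inner=200):
    """Approximately minimize 0.5*||Ax - b||^2 + lam * sum_i |x_i|^p."""
    x = np.zeros(A.shape[1])
    w = np.ones(A.shape[1])                     # first pass is plain l1 (assumed warm start)
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1/L, with L the squared spectral norm of A
    for _ in range(outer):
        for _ in range(inner):                  # ISTA on the convex weighted-l1 subproblem
            grad = A.T @ (A @ x - b)
            x = soft_threshold(x - step * grad, step * lam * w)
        w = p * (np.abs(x) + eps) ** (p - 1.0)  # reweight from the new iterate
    return x

# Illustrative use: a 3-sparse vector observed through 40 random measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100)) / np.sqrt(40)
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [2.0, -1.5, 1.0]
x_hat = irl1(A, A @ x_true)
```

Entries that are zero after a pass receive a very large weight and tend to stay at zero, while large entries receive a small weight and are shrunk less than under a plain $\ell_1$ penalty.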

IRLS for Robust Regression: For a robust loss function $\rho$, IRLS sets at each step

$w_i = \frac{\rho'(r_i)}{r_i}$

with $r_i$ the residual at current parameters (Dong et al., 2019).
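A minimal sketch of this loop for linear regression is shown below. It assumes the Huber loss as the robust loss, for which the weight $\rho'(r)/r$ equals 1 when $|r| \le \delta$ and $\delta/|r|$ otherwise; the cited work uses its own loss, and the function names, `delta`, and the toy data are illustrative.

```python
import numpy as np

def huber_weight(r, delta=1.0):
    """rho'(r)/r for the Huber loss: 1 inside [-delta, delta], delta/|r| outside."""
    a = np.abs(r)
    return np.where(a <= delta, 1.0, delta / np.maximum(a, 1e-12))

def irls(X, y, delta=1.0, iters=50):
    """Iteratively reweighted least squares for a Huber-type linear fit."""
    theta = np.linalg.lstsq(X, y, rcond=None)[0]   # start from ordinary least squares
    for _ in range(iters):
        r = y - X @ theta                          # residuals at current parameters
        sw = np.sqrt(huber_weight(r, delta))
        # Weighted least squares: scale rows by sqrt(w_i) and solve again.
        theta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return theta

# Toy data: y = 1 + 2x exactly, except for one gross outlier at x = 9.
x = np.arange(10, dtype=float)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
y[9] = 100.0

theta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
theta_irls = irls(X, y)
```

On this toy problem the ordinary least-squares slope is pulled far from 2 by the single outlier, while the reweighted fit stays close to it because the outlier's weight shrinks in proportion to its residual.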

Adaptive and Learned Reweighting (Meta-Weighting)

SoftAdapt dynamically adjusts multiple loss terms based on short-term rate-of-change or loss magnitude, using a softmax function with a "temperature" parameter $\beta$:

$w_k = \frac{\exp(\beta s_k)}{\sum_{j} \exp(\beta s_j)}$

where $s_k$ is the finite difference or windowed trend of each loss component's history (Heydari et al., 2019).
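The softmax weighting described above can be sketched in a few lines. This is a simplified illustration, not the reference SoftAdapt implementation: it uses a one-step finite difference as the trend, and the function name, the `(T, K)` history layout, and the value of `beta` are assumptions.

```python
import numpy as np

def softadapt_weights(loss_history, beta=0.1):
    """Softmax weights over K loss components from their recent trend.

    loss_history: array of shape (T, K), T >= 2, most recent values last.
    """
    h = np.asarray(loss_history, dtype=float)
    s = h[-1] - h[-2]          # one-step finite difference per component
    z = beta * s
    z = z - z.max()            # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Three components: fast-improving, slow-improving, and getting worse.
history = np.array([[1.0, 1.0, 1.0],
                    [0.5, 0.9, 1.1]])
weights = softadapt_weights(history, beta=1.0)
```

With a positive `beta`, the component whose loss is decreasing slowest (or increasing) receives the largest weight; `beta = 0` recovers uniform weights.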

Network-Parametric Reweighting: Importance is learned via a small neural architecture mapping per-sample multi-class margin or related statistics to the weight $w_i$, optimized either via meta-gradient/MAML updates (Holtz et al., 2022) or other end-to-end approaches.

Distributional and Adversarially Robust Reweighting

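Doubly Robust DRO: Instance weights are given by the KL-regularized softmax of adversarial losses:

$w_i = \frac{\exp\left(\ell_i^{\mathrm{adv}}(\theta)/\tau\right)}{\sum_{j=1}^n \exp\left(\ell_j^{\mathrm{adv}}(\theta)/\tau\right)}$

where $\ell_i^{\mathrm{adv}}$ is the adversarial loss on sample $i$ and $\tau > 0$ is the KL regularization strength, leading to a robust, log-sum-exp aggregate objective (Sow et al., 2023).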
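The link between softmax weights and the log-sum-exp objective can be checked numerically. The sketch below assumes the standard KL-regularized form, in which maximizing $\sum_i w_i \ell_i - \tau\,\mathrm{KL}(w \,\|\, \text{uniform})$ over the simplex gives softmax weights and the value $\tau \log\big(\tfrac1n \sum_i e^{\ell_i/\tau}\big)$; the symbol `tau`, the function names, and the loss values are illustrative, and the per-sample losses stand in for adversarial losses computed elsewhere.

```python
import numpy as np

def kl_dro_weights(losses, tau=1.0):
    """Softmax of per-sample losses at temperature tau."""
    z = np.asarray(losses, dtype=float) / tau
    z = z - z.max()                      # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def kl_dro_objective(losses, tau=1.0):
    """tau * log(mean(exp(l / tau))), computed stably."""
    l = np.asarray(losses, dtype=float)
    m = l.max()
    return float(m + tau * np.log(np.mean(np.exp((l - m) / tau))))

losses = np.array([0.1, 0.5, 2.0])       # stand-ins for per-sample adversarial losses
tau = 0.5
w = kl_dro_weights(losses, tau)
objective = kl_dro_objective(losses, tau)

# Variational check: weighted loss minus tau * KL(w || uniform).
kl_to_uniform = float(np.sum(w * np.log(len(losses) * w)))
variational_value = float(np.sum(w * losses)) - tau * kl_to_uniform
```

The objective always lies between the mean and the maximum of the per-sample losses, approaching the mean as `tau` grows and the maximum as `tau` shrinks.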

Inverse-View for Class Imbalance: Weights per class are solved explicitly, in closed form, so as to equalize per-class losses, with macro-level compensation for batch imbalance (Wang et al., 11 May 2026).

3. Applications in Modern Learning Paradigms

Reweighted losses underpin state-of-the-art practice across domains:

Sparsity and Robust Signal Recovery: Iteratively reweighted $\ell_1$ (or $\ell_2$) surrogates efficiently approximate nonconvex penalties or robust "M-estimator" losses, enabling efficient, globally convergent procedures in high-dimensional inverse problems (Wang et al., 2024, Wang et al., 2021, Sun et al., 2022, Dong et al., 2019).
Class Imbalance and Neural Collapse: Reweighting as an inverse problem, targeting equal per-class mean loss, eliminates the dominant obstacle to Equiangular Tight Frame (ETF) neural collapse geometry and delivers empirical gains in long-tailed classification (Wang et al., 11 May 2026).
Adversarial Robustness: Bilevel or variational reweighting, with theoretically grounded softmax weighting of per-example adversarial losses, leads to uniform class-wise robustness and improved worst-case error (Sow et al., 2023, Holtz et al., 2022).
Model Calibration and Selective Classification: Weighted risk minimization, with functional choices such as focal, inverse focal, or AURC weights, precisely tunes calibration properties, tying distinct reweighted losses to optimal confidence thresholding (Zhou et al., 29 May 2025).
Unlearning and Data Efficiency in LLMs: Token-level reweighting based on a blend of saturation (emphasizing hard-to-unlearn tokens) and importance (manually tagged tokens) achieves controlled, stable, and efficient unlearning in LLMs (Yang et al., 17 May 2025).
Automated Multi-Objective Balancing: Methods such as SoftAdapt allow neural networks with composite losses to autonomously focus gradient signal on the hardest or slowest-improving sub-objectives, bypassing brittle manual weighting (Heydari et al., 2019).
Super-Resolution and Dense Prediction: Trainable per-pixel (or per-sample) loss weights, learned by a convolutional network and constrained by architectural priors, lead to better visual fidelity in computer vision tasks (Mellatshahi et al., 2023).
Scientific Reweighting: Neural conditional reweighting avoids phase-space holes in high-energy physics by directly estimating conditional density ratios via a specialized classifier loss (Nachman et al., 2021).

4. Theoretical Guarantees and Optimization Properties

Reweighted loss methods often enjoy rigorous convergence and statistical properties:

Global Convergence and Local Rates: In SOIR-$\ell_1$ and related methods, convergence to a stationary point is established under standard smoothness and invertibility assumptions, with local linear or quadratic rates under the Kurdyka–Łojasiewicz inequality (Wang et al., 2024, Wang et al., 2021, Sun et al., 2022).
Interpretability via Gradient Surrogates: Decision-aware reweighting links directly to finite-difference approximations of end-to-end task gradients, ensuring that weighted regression mimics the true-stochastic or decision-driven optimum (Lawless et al., 2022).
Distributional Robustness: KL-regularization of weights guarantees avoidance of pathological collapse and yields minimax bounds on instance-wise loss (Sow et al., 2023).
Calibration Consistency: Regularized AURC provides a differentiable, direct surrogate for selective-classification calibration, aligning loss minimization with calibration error objectives (Zhou et al., 29 May 2025).
Variance and Stability Control: Combined resampling and reweighting for SGD provably reduces stochastic variance, accelerates mean-square convergence, and allows increased step size stability (An et al., 2021).
Robustness to Outliers: Smooth, bounded-weight M-estimator losses deliver both bounded influence and strict risk descent (Dong et al., 2019).

5. Comparative Analysis and Empirical Results

Practical studies across domains indicate clear performance gains from reweighting:

Domain	Scheme Type	Empirical Result Summary	Source
Sparse recovery	Iteratively reweighted $\ell_1$	Faster convergence, fewer iterations vs. IRL2/IJT	(Wang et al., 2024, Wang et al., 2021)
Adversarial training	KL-DRO reweighting	+3–4% PGD-robust/tail class accuracy	(Sow et al., 2023)
Long-tailed learning	Inverse-view reweighting	+6.3–7.1% accuracy gains on CIFAR-100-LT	(Wang et al., 11 May 2026)
LLM unlearning	SatImp (saturation+importance)	67% improvement in retention vs GA baseline	(Yang et al., 17 May 2025)
Multi-objective DL	SoftAdapt	Up to 0.7 dB/0.06 SSIM gain, sharper reconstructions	(Heydari et al., 2019)
Kernel regression	IRLS with a robust M-estimator loss	Substantial robustness gain under outlier/noise	(Dong et al., 2019)
Super-resolution	Trainable loss weights	0.2–0.4 dB PSNR improvement, lower LPIPS	(Mellatshahi et al., 2023)
Calibration	Regularized AURC/inverse focal	Outperforms focal loss, lowest ECE/classwise ECE	(Zhou et al., 29 May 2025)

The significance of reweighting is thus broadly demonstrated: it addresses class imbalance, variance control, sample prioritization, calibration, robustness, and multi-objective optimization, often in a plug-and-play or theoretically grounded way.

6. Extensions, Limitations, and Design Principles

Key considerations and open directions include:

Choice of Weighting Function: Empirical results consistently show that soft, token-wise or per-sample weighting outperforms hard or batch-wise alternatives (Yang et al., 17 May 2025). Non-monotonic or improper weighting can destabilize convergence or produce pathological gradients (Zhou et al., 29 May 2025).
Computational Cost: Per-sample dynamic reweighting (e.g., meta-learned weights, iterative Newton updates) incurs overhead; amortization, approximate solvers, or functional regularizers may mitigate this.
Overfitting and Instability: Extremely sharp weighting (e.g., a very high temperature parameter $\beta$ in softmax/SoftAdapt weighting, or focal loss with a large exponent $\gamma$) can completely neglect informative "easy" or low-loss samples, reducing generalization (Heydari et al., 2019, Zhou et al., 29 May 2025).
Axiomatic Approaches: Survey-based loss design (e.g., Cobb–Douglas form with explicit size-error trade-offs) compels principled, transparent, and domain-adaptive weighting (Coleman, 23 May 2025).
Structural Constraints: Fixed-sum, simplex, or prior-informed constraints on weights stabilize training and enforce desired balances (Mellatshahi et al., 2023, Wang et al., 11 May 2026, Sow et al., 2023).
Unified Theoretical Framing: Recent work synthesizing reweighted losses as variational bounds, gradient surrogates, or distributionally robust objectives has broadened both their theoretical justification and their practical scope (Shi et al., 24 Nov 2025, Lawless et al., 2022).
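The concern about overly sharp weighting can be made concrete with a simple diagnostic. The sketch below normalizes softmax weights onto the simplex and reports the Kish effective sample size, $(\sum_i w_i)^2 / \sum_i w_i^2$, as the sharpness parameter grows. The diagnostic is not taken from the cited works, and the function names, loss values, and `beta` grid are assumptions chosen for illustration.

```python
import numpy as np

def softmax_weights(losses, beta):
    """Simplex-constrained weights: softmax of beta * loss."""
    z = beta * np.asarray(losses, dtype=float)
    z = z - z.max()                      # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def effective_sample_size(w):
    """Kish effective sample size: (sum w)^2 / sum(w^2)."""
    w = np.asarray(w, dtype=float)
    return float(w.sum() ** 2 / np.sum(w ** 2))

losses = np.array([0.2, 0.4, 0.8, 1.6, 3.2])
ess_by_beta = {
    beta: effective_sample_size(softmax_weights(losses, beta))
    for beta in (0.0, 1.0, 10.0)
}
```

At `beta = 0` all five samples contribute equally; as `beta` increases the effective sample size falls toward 1, meaning the gradient is driven almost entirely by the single highest-loss sample.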