
DORO: Distributional & Outlier Robust Optimization

Updated 15 December 2025
  • DORO is a novel optimization framework that blends distributional robustness with outlier resilience, maintaining performance under test-time distribution shift and training-data contamination.
  • It employs dual representations of f-divergence and Wasserstein metrics, alongside metric learning, to construct data-driven ambiguity sets that filter out high-loss outliers.
  • Empirical results demonstrate DORO's superior worst-case accuracy and stability on benchmarks, including federated learning and heavy-tailed noise scenarios.

Distributional and Outlier Robust Optimization (DORO) encompasses a contemporary class of optimization frameworks designed to guarantee both distributional robustness (against test-time distribution shifts) and resilience to outlier contamination in the training data. DORO generalizes standard Distributionally Robust Optimization (DRO) by integrating mechanisms that explicitly mitigate the impact of arbitrary data contamination, deriving sharp excess risk and stability guarantees in settings characterized by subpopulation shift, adversarial outlier injection, federated data heterogeneity, and heavy-tailed noise. The formal apparatus of DORO builds on dual representations of $f$-divergence and Wasserstein DRO, metric learning for data-driven ambiguity set construction, batch-wise $\epsilon$-truncation or outlier scoring, and convex–concave reformulations for tractable algorithm design. DORO offers theoretical rates and empirical improvements over both vanilla DRO and classical robust statistics.

1. Mathematical Foundations and Characterizations

The DORO paradigm arises from the limitations of pure DRO in the presence of outlier contamination, where classical ambiguity sets, such as $f$-divergence balls or Wasserstein balls centered on empirical data, are excessively sensitive to corrupted samples. The standard DRO risk, given a distribution $P$, loss $\ell(\theta;z)$, and an $f$-divergence $D(Q\|P)$ (or an optimal transport cost), is

$$R_{D,\rho}(\theta;P) = \sup_{Q\ll P} \left\{\mathbb{E}_{Q}[\ell(\theta;Z)] : D(Q\|P) \leq \rho\right\}.$$

This formulation, instantiated with Cressie–Read divergences, admits closed forms covering CVaR and $\chi^2$-DRO:

$$R_{D_\beta,\rho}(\theta;P) = \inf_{\eta\in\mathbb{R}} \left\{c_\beta(\rho)\,\mathbb{E}_P\!\left[(\ell(\theta;Z)-\eta)_+^{\beta_*}\right]^{1/\beta_*} + \eta\right\}.$$
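To make the scalar dual concrete, it can be evaluated numerically by a one-dimensional minimization over $\eta$. Below is a minimal sketch with hypothetical helper names, assuming the standard Cressie–Read constant $c_\beta(\rho) = (1+\beta(\beta-1)\rho)^{1/\beta}$, so that $\beta = 2$ ($\beta_* = 2$) gives $\chi^2$-DRO and the $\beta_* = 1$ case gives CVaR at level $\alpha = 1/c$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def cvar_risk(losses, alpha):
    """CVaR at level alpha via its scalar dual (the beta_* = 1 case):
    inf_eta { eta + E[(loss - eta)_+] / alpha }."""
    losses = np.asarray(losses, dtype=float)
    obj = lambda eta: eta + np.mean(np.maximum(losses - eta, 0.0)) / alpha
    res = minimize_scalar(obj, bounds=(losses.min(), losses.max()), method="bounded")
    return res.fun

def chi2_dro_risk(losses, rho):
    """chi^2-DRO via the Cressie-Read dual with beta = 2, beta_* = 2, and
    c_2(rho) = sqrt(1 + 2*rho) (the assumed constant stated in the lead-in)."""
    losses = np.asarray(losses, dtype=float)
    c = np.sqrt(1.0 + 2.0 * rho)
    obj = lambda eta: c * np.sqrt(np.mean(np.maximum(losses - eta, 0.0) ** 2)) + eta
    span = np.ptp(losses) + 1.0  # eta may lie below min(losses) for chi^2
    res = minimize_scalar(obj, bounds=(losses.min() - span, losses.max()), method="bounded")
    return res.fun
```

As a quick sanity check, $\alpha = 1$ makes the CVaR value equal the mean loss, and $\rho = 0$ collapses the $\chi^2$-DRO value toward the mean loss.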

DORO refines this formulation by adopting the Huber $\epsilon$-contamination model,

$$\widehat{P} = (1-\epsilon)P + \epsilon\widetilde{P},$$

and defines a lower envelope risk over all possible clean mixtures,

$$R_{D,\rho,\epsilon}(\theta;\widehat{P}) = \inf_{P'}\left\{R_{D,\rho}(\theta;P') : \exists\,\widetilde{P}' \text{ s.t. } \widehat{P} = (1-\epsilon)P' + \epsilon\widetilde{P}'\right\}.$$

For Cressie–Read divergences, this corresponds to solving the DRO objective after discarding the largest $\epsilon$-fraction of empirical losses. In the Wasserstein DRO regime, related frameworks combine geometric uncertainty (a Wasserstein ball) and non-geometric contamination (a total variation ball), yielding ambiguity sets

$$\mathcal{U}(\rho,\epsilon) = \left\{\nu : \mathsf{W}_p^\epsilon(\tilde{\mu}, \nu) \le \rho\right\},$$

where $\mathsf{W}_p^\epsilon$ is a partial optimal transport distance robust to an $\epsilon$-fraction of gross contamination.

2. Algorithmic Implementations and Procedures

DORO admits efficient algorithmic realization in both finite-sum large-scale and streaming/online settings. For Cressie–Read divergences, mini-batch stochastic gradient descent proceeds by discarding the $\lfloor\epsilon n\rfloor$ largest per-sample losses in each batch, then solving the corresponding DRO dual (e.g., CVaR or the $\chi^2$-moment problem) over the uncontaminated fraction. The per-iteration cost is minimal, involving a sort and a scalar optimization for the dual variable.
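A minimal PyTorch sketch of this batch step follows; it is an illustrative simplification in the spirit of Zhai et al. (2021), not a verbatim reproduction of their algorithm, with `eps` the assumed contamination fraction and `alpha` the CVaR level:

```python
import torch

def doro_cvar_batch_loss(per_sample_losses: torch.Tensor, eps: float, alpha: float) -> torch.Tensor:
    """Epsilon-truncated CVaR over one mini-batch of per-example losses.

    Step 1: discard the floor(eps * n) largest losses as presumed outliers.
    Step 2: average the largest alpha-fraction of the survivors (CVaR).
    Gradients flow only through the selected examples.
    """
    n = per_sample_losses.numel()
    n_keep = n - int(eps * n)
    kept, _ = torch.topk(per_sample_losses, n_keep, largest=False)  # keep smallest n_keep
    n_cvar = max(1, int(alpha * n_keep))
    worst, _ = torch.topk(kept, n_cvar, largest=True)  # CVaR tail among survivors
    return worst.mean()

# Usage inside a training loop (losses must be per-example, reduction="none"):
# losses = torch.nn.functional.cross_entropy(logits, targets, reduction="none")
# doro_cvar_batch_loss(losses, eps=0.01, alpha=0.2).backward()
```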

For Wasserstein-based DORO, tractable dual reformulations yield empirical risk problems with regularization:

$$\min_\theta \frac{1}{n}\sum_{i=1}^n \ell(\theta;z_i) + \lambda_\epsilon\, \mathcal{R}(\theta),$$

where $\lambda_\epsilon$ is determined by the ambiguity-set radius and $\mathcal{R}(\theta)$ is an explicit norm dependent on the metric (e.g., a vector or matrix norm for regression or classification), with automatic outlier mitigation for suitably chosen transport costs.
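As an illustration of this reformulation, here is a sketch under stated assumptions: a linear model, squared loss, and an $\ell_2$ ground metric so that the penalty is the Euclidean norm; `lam_eps` stands in for the radius-dependent multiplier $\lambda_\epsilon$, and the function name is hypothetical:

```python
import numpy as np

def wdro_surrogate_objective(theta: np.ndarray, X: np.ndarray, y: np.ndarray,
                             lam_eps: float) -> float:
    """Regularized empirical risk standing in for the Wasserstein-DORO dual:
    mean squared loss plus a norm penalty whose strength encodes the
    ambiguity-set radius. The norm is the dual of the assumed ground metric."""
    residuals = X @ theta - y
    empirical_risk = float(np.mean(residuals ** 2))
    return empirical_risk + lam_eps * float(np.linalg.norm(theta, 2))
```

Different transport costs swap the penalty: an $\ell_\infty$ ground cost yields an $\ell_1$ (LASSO-type) penalty, and matrix-valued decision variables yield matrix norms, matching the correspondence described above.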

In federated or heterogeneous data regimes, ambiguity sets can be expressed via unbalanced Wasserstein metrics augmented with Kullback–Leibler penalties. Algorithmic solutions use decentralized primal–dual updates, local/central aggregation of robust statistics, and inner maximization for outlier scoring via Lagrangian dualization, guaranteeing convergence at $O(1/\sqrt{T})$ or better under standard convexity and smoothness assumptions (Wang et al., 29 Sep 2025).
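The following is a hypothetical, much-simplified sketch of one such primal–dual round, assuming a linear model with squared loss, exponentiated-gradient dual ascent over client weights, and per-client $\epsilon$-trimming as the outlier defense. It illustrates the structure only and is not the exact algorithm of Wang et al. (29 Sep 2025):

```python
import numpy as np

def federated_doro_round(theta, clients, q, lr_theta, lr_q, eps):
    """One primal-dual round: dual weights q up-weight high-loss clients
    (distributional robustness); each client trims its top-eps losses
    before reporting (outlier robustness).

    clients: list of (X, y) arrays; q: client weights on the probability simplex.
    """
    losses, grads = [], []
    for X, y in clients:
        res = X @ theta - y
        per_sample = res ** 2
        keep = np.argsort(per_sample)[: len(y) - int(eps * len(y))]  # drop top-eps losses
        losses.append(per_sample[keep].mean())
        grads.append(2.0 * X[keep].T @ res[keep] / len(keep))
    # Dual ascent: exponentiated-gradient step keeps q on the simplex.
    q = q * np.exp(lr_q * np.asarray(losses))
    q = q / q.sum()
    # Primal descent on the q-weighted robust objective.
    grad = sum(qi * gi for qi, gi in zip(q, grads))
    return theta - lr_theta * grad, q
```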

Table: DORO Algorithmic Procedures

| Ambiguity Set | Outlier Defense | Representative Algorithm |
| --- | --- | --- |
| Cressie–Read ($f$-div) | Top-$\epsilon$ truncation | Mini-batch SGD + $\epsilon$-truncation (Zhai et al., 2021) |
| Wasserstein-$p$ | TV/partial OT, robust mean | Dual convex program + robust mean (Nietert et al., 2023; Li et al., 14 Jul 2025) |
| Data-driven metric | Learned $\Lambda$ penalty | Stochastic soft-max gradient (Blanchet et al., 2017) |
| Unbalanced Wasserstein | KL + OT penalty | Decentralized saddle-point with inner maximization (Wang et al., 29 Sep 2025) |

3. Theoretical Guarantees and Statistical Rates

DORO delivers robust excess risk bounds and optimal recovery rates under adversarial contamination. Under moment assumptions (e.g., bounded $2k$-th moment), DORO achieves:

  • For CVaR-DORO: $O(\alpha^{-1}\sigma_{2k}\,\epsilon^{1-1/(2k)})$ estimation error (Zhai et al., 2021).
  • For $\chi^2$-DORO: $O(\sigma_{2k}\,\epsilon^{1/2-1/(2k)})$.
  • In Wasserstein DORO with bounded covariance, excess risk $O(\|\ell_*\|_{\mathrm{Lip}}\,(\rho + \sqrt{d\epsilon}))$ for $p=1$ (Nietert et al., 2023), with matching lower bounds for minimax optimality; a worked comparison of the first two exponents follows below.
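To read these exponents concretely, here is a worked instance assuming bounded fourth moments ($k = 2$) and contamination $\epsilon = 10^{-2}$:

$$\epsilon^{1-\frac{1}{2k}} = (10^{-2})^{3/4} \approx 3.2\times 10^{-2}, \qquad \epsilon^{\frac{1}{2}-\frac{1}{2k}} = (10^{-2})^{1/4} \approx 3.2\times 10^{-1},$$

so at the same moment level the CVaR-DORO error decays an order of magnitude faster in $\epsilon$ than the $\chi^2$-DORO error, at the price of the extra $\alpha^{-1}$ factor.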

For federated learning with unbalanced Wasserstein, the minimax residual scales as $O(1/\sqrt{T} + \epsilon)$ (Wang et al., 29 Sep 2025). Population-level guarantees are established for both worst-group and average-case risk, ensuring provable subpopulation fairness and superior generalization in high-variance, heavy-tailed, and contaminated environments.

4. Connections to Classical DRO, Robust Statistics, and Metric Learning

DORO unifies min–max (DRO, post-decision adversarial shift) and min–min (robust statistics, pre-decision contamination) frameworks. In the $\epsilon \to 0$ limit, DORO recovers standard DRO; as $\rho \to 0$ it reduces to robust statistics with outlier correction (e.g., Huber/M-estimation, median-of-means). Convex conjugate duality and optimal transport theory yield explicit relations between regularizers in empirical risk minimization (e.g., LASSO, SVM, square-root LASSO) and the geometry of learned ambiguity sets (Blanchet et al., 2017, Blanchet et al., 26 Jan 2024). Data-driven metric learning, which optimizes the ground cost in Wasserstein balls, enables DORO to adapt the contraction geometry to the data manifold structure, emphasizing robust directions and suppressing the adversarial influence of outlier-prone coordinates.
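One representative instance of this correspondence is the square-root LASSO identity for linear regression with squared loss, stated here in the form derived by Blanchet et al. (2017) under the assumption of a transport cost $c((x,y),(x',y')) = \|x-x'\|_q^2$ that forbids perturbing labels:

$$\sup_{Q:\,\mathsf{W}_c(Q,P_n)\le\delta} \sqrt{\mathbb{E}_Q\!\left[(Y-\beta^\top X)^2\right]} = \sqrt{\mathbb{E}_{P_n}\!\left[(Y-\beta^\top X)^2\right]} + \sqrt{\delta}\,\|\beta\|_p, \qquad \tfrac{1}{p}+\tfrac{1}{q}=1,$$

so the ambiguity radius $\delta$ is directly interpretable as a regularization strength, and the ground metric determines which norm penalizes the decision variable.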

5. Empirical Performance across Regimes

Empirical evaluations across large real-world benchmarks (COMPAS, CelebA, CivilComments-Wilds, UCI datasets), under synthetic and real subpopulation shift as well as federated learning, demonstrate that DORO outperforms vanilla DRO, classical robust statistics, and regularized ERM in worst-case accuracy, excess risk, and stability metrics (Zhai et al., 2021, Blanchet et al., 2017, Wang et al., 29 Sep 2025). On CelebA, for example, DORO with $\chi^2$-DRO achieves worst-group accuracy improvements of 5–15 points over standard DRO. DORO is shown to dampen cross-epoch accuracy instability by 30–50%, and in federated settings, DORO-FL attains up to 20–30% accuracy gains versus group-weighted or adversarially flattened baselines, maintaining robustness to a substantial outlier fraction.

Table: Representative Benchmark Performance (CelebA)

| Method | Average Acc. (%) | Worst-group Acc. (%) | Std of Worst-group Acc. (%) |
| --- | --- | --- | --- |
| ERM | 95.01 | 53.94 | 8.59 |
| CVaR | 82.83 | 66.44 | 11.53 |
| CVaR-DORO | 92.91 | 72.17 | 11.53 |
| $\chi^2$-DRO | 83.85 | 67.76 | 8.88 |
| $\chi^2$-DORO | 82.18 | 68.33 | 19.06 |

6. Extensions, Open Challenges, and Practical Guidance

DORO frameworks extend naturally to high-dimensional, nonconvex settings (using smooth approximations and robust filtering), reinforcement learning, and federated regimes. Key tuning parameters include $\epsilon$ (the outlier rate, typically set slightly above the anticipated contamination) and the ambiguity radii ($\alpha$, $\rho$), best calibrated by cross-validation on worst-case group performance, as sketched below. Data-driven cost learning requires careful metric-learning subroutines with hyperparameters tuned to class separation. Outlier scoring for federated DORO leverages domain-informed or soft-indicator penalties.
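A minimal sketch of this calibration loop follows; `train_fn` and `eval_worst_group` are hypothetical user-supplied callbacks, and the grid should bracket the anticipated contamination level:

```python
import numpy as np

def select_epsilon(train_fn, eval_worst_group, eps_grid, folds):
    """Pick the truncation level eps by cross-validating worst-group performance.

    train_fn(train_idx, eps) -> fitted model
    eval_worst_group(model, val_idx) -> worst-group accuracy on the fold
    folds: list of (train_idx, val_idx) index pairs.
    """
    scores = []
    for eps in eps_grid:
        fold_scores = [eval_worst_group(train_fn(tr, eps), va) for tr, va in folds]
        scores.append(np.mean(fold_scores))
    return eps_grid[int(np.argmax(scores))]  # eps maximizing mean worst-group accuracy
```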

Open challenges persist in domain-agnostic model selection, downstream transfer to tasks lacking group labels, and tractable extensions to nonconvex and sequence models. High-probability robustness certificates and communication-efficient distributed algorithms remain active areas of theoretical development.

7. Summary and Significance

DORO provides a principled, scalable synthesis of distributional robustness and outlier resilience across supervised, structured, and federated learning regimes. It unifies the min–max and min–min robust paradigms, yields exact and approximate reformulations for diverse loss functions and data geometries, and is theoretically optimal for a broad class of contamination models. Empirical results on modern benchmarks confirm substantial gains in worst-case risk, fairness, and stability over classical baselines. DORO is now foundational for robust machine learning under both adversarial and stochastic uncertainties, and serves as an interface between modern DRO, metric learning, and high-dimensional robust statistics (Zhai et al., 2021, Nietert et al., 2023, Wang et al., 29 Sep 2025, Blanchet et al., 2017, Blanchet et al., 26 Jan 2024, Li et al., 14 Jul 2025).
