Robust Deep ES Estimator

Updated 18 November 2025
  • The paper introduces a two-stage deep learning framework that orthogonalizes quantile and expected shortfall estimation, combining deep quantile regression with a Huber-loss-based second stage.
  • It achieves non-asymptotic tail robustness with provable error bounds, effectively mitigating the influence of heavy-tailed residuals.
  • Empirical studies demonstrate improved prediction accuracy in high-dimensional settings, especially under heavy-tailed noise in environmental applications.

A Robust Deep ES (Expected Shortfall) Estimator, in the context of modern machine learning, is a deep neural methodology for estimating the conditional tail risk of a target variable, designed with explicit robustness to heavy-tailed response distributions and model misspecification. The estimator operates in high-dimensional, nonparametric settings via hierarchical architectures, orthogonalizes the estimation of the quantile and expected shortfall functions, and incorporates robustification techniques such as the Huber loss to achieve non-asymptotic resistance to outliers and model noise (Yu et al., 11 Nov 2025).

1. Mathematical Formulation of Expected Shortfall Regression

Let $Y$ be a real-valued response variable with cumulative distribution function $F_Y$. The Value-at-Risk (VaR) at level $\alpha \in (0,1)$ is $q_\alpha(Y) := \inf\{y : F_Y(y) \geq \alpha\}$, and the Expected Shortfall (ES, also known as Conditional Value-at-Risk) at level $\alpha$ is

$$e_\alpha(Y) := \mathbb{E}\left[\, Y \mid Y \leq q_\alpha(Y) \,\right] = \frac{1}{\alpha}\,\mathbb{E}\left[\, Y\,\mathbb{1}\{Y \leq q_\alpha(Y)\} \,\right].$$

For regression with covariates $X \in \mathbb{R}^d$, the nonparametric functions $f_0(x) := q_\alpha(Y \mid X = x)$ and $g_0(x) := e_\alpha(Y \mid X = x)$ denote the conditional quantile and conditional ES, respectively. Since ES is not directly elicitable, the robust deep ES estimator employs a "two-step orthogonalization framework": first estimate $f_0$ (the conditional quantile) using deep quantile regression (DQR), then estimate $g_0$ from the residuals, treating $f_0$ as a nuisance parameter (Yu et al., 11 Nov 2025).
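
As a concrete illustration of these definitions (not code from the paper; all names are illustrative), a minimal NumPy sketch computes the empirical VaR and lower-tail ES of a sample:

```python
import numpy as np

def empirical_var_es(y, alpha=0.05):
    """Empirical VaR (the alpha-quantile) and ES (mean of the lower alpha-tail)."""
    q = np.quantile(y, alpha)          # q_alpha(Y)
    es = y[y <= q].mean()              # e_alpha(Y) = E[Y | Y <= q_alpha(Y)]
    return q, es

rng = np.random.default_rng(0)
y = rng.standard_t(df=3, size=100_000)  # heavy-tailed sample
q, es = empirical_var_es(y, alpha=0.05)
print(f"VaR_0.05 = {q:.3f}, ES_0.05 = {es:.3f}")  # ES lies below VaR in the lower tail
```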

2. Algorithmic Structure: Two-Step Deep Robust ES Estimation

The robust deep ES estimator is built as follows:

Stage 1—Deep Quantile Regression (DQR):

  • Fit a class $\mathcal{F}_n$ of truncated, fully connected ReLU networks to minimize the empirical check loss

$$\widehat{Q}_\alpha(f) = \frac{1}{n}\sum_{i=1}^n \rho_\alpha\left(Y_i - f(X_i)\right),$$

where $\rho_\alpha(u) = (\alpha - \mathbb{1}\{u < 0\})\,u$.
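
A minimal PyTorch sketch of this check (pinball) loss, assuming a batch of responses `Y` and fitted values from some network in $\mathcal{F}_n$ (illustrative code, not the authors' implementation):

```python
import torch

def check_loss(u, alpha):
    """Check (pinball) loss rho_alpha(u) = (alpha - 1{u < 0}) * u, averaged over the batch."""
    return torch.mean((alpha - (u < 0).float()) * u)

# Stage 1 objective for a hypothetical network f_net: R^d -> R on a batch (X, Y):
# loss = check_loss(Y - f_net(X).squeeze(-1), alpha=0.05)
```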

Stage 2—Deep Robust ES (DRES):

  • For each candidate $f$, compute surrogate responses $Z_i(f) = \min\{Y_i - f(X_i), 0\} + \alpha f(X_i)$.
  • Fit a class $\mathcal{G}_n$ of truncated, fully connected ReLU networks for $g$ by minimizing the average Huber loss:

$$\hat{g}_{n,\tau} \in \arg\min_{g \in \mathcal{G}_n} \frac{1}{n}\sum_{i=1}^n \ell_\tau\left(Z_i(\hat{f}_n) - \alpha g(X_i)\right),$$

where $\ell_\tau(u)$ is the Huber loss with robustification parameter $\tau$.
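
The surrogate responses and Stage-2 objective admit a short PyTorch sketch (again illustrative; `torch.nn.functional.huber_loss` with `delta=tau` matches the standard Huber parameterization):

```python
import torch
import torch.nn.functional as F

def surrogate_response(y, f_hat, alpha):
    """Z_i(f) = min{Y_i - f(X_i), 0} + alpha * f(X_i)."""
    return torch.clamp(y - f_hat, max=0.0) + alpha * f_hat

def dres_loss(y, f_hat, g_out, alpha, tau):
    """Average Huber loss of Z_i(f_hat) - alpha * g(X_i)."""
    z = surrogate_response(y, f_hat, alpha)
    return F.huber_loss(z, alpha * g_out, delta=tau)  # quadratic within tau, linear beyond
```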

The Huber loss introduces robustness against heavy-tailed surrogate residuals $Z_i(f)$, which is crucial because the classical squared-error loss handles outliers poorly in the tails, precisely the region that expected shortfall targets (Yu et al., 11 Nov 2025).

3. Statistical Theory and Robustness Guarantees

The robust deep ES estimator achieves provable non-asymptotic tail robustness. Let $\epsilon = Y - f_0(X)$; the key technical condition is a finite $p$-th moment of $\epsilon_- = \min(\epsilon, 0)$, i.e., $\mathbb{E}\left[\,|\epsilon_- - \mathbb{E}(\epsilon_- \mid X)|^p\,\right] < \infty$ for some $p > 1$. The DRES estimator then satisfies, with high probability,

$$\|\hat{g}_{n,\tau} - g_0\|_2 \leq \frac{C}{\alpha}\left[\eta_s + \eta_b + \eta_a + \delta_s + \delta_4^2 + \frac{\nu_p^{1/p} + \sqrt{\tau}}{\sqrt{n}}\right],$$

where

  • $\eta_s$ is the stochastic error,
  • $\eta_b = O(\nu_p \tau^{1-p})$ is the bias from Huber truncation,
  • $\eta_a$ is the approximation error of the ReLU network class,
  • $\delta_s$ reflects estimation error, scaling as $O\left(LN\sqrt{\frac{d\log(dLN)\log n}{n}}\right)$ for network depth $L$ and width $N$,
  • $\delta_4 = \|\hat{f}_n - f_0\|_4 = O_p\left(n^{-\gamma^*/(2\gamma^*+1)}\right)$, with $\gamma^*$ determined by the hierarchical compositional structure assumed for $g_0$.

For sub-Gaussian errors (light-tailed $\epsilon_-$), DRES matches the efficiency of the least-squares-based deep ES estimator (DES); under heavy tails, DRES outperforms DES thanks to its reduced sensitivity to outliers (Yu et al., 11 Nov 2025).
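
To see concretely why the Huber loss helps under heavy tails, the following self-contained NumPy experiment (illustrative, not from the paper) compares the sample mean with a Huber M-estimate of location under $t_{2.25}$ noise; the robustified estimate typically attains a much smaller mean squared error:

```python
import numpy as np

def huber_mean(x, tau, n_iter=50):
    """Huber M-estimate of location via iteratively reweighted least squares."""
    mu = np.median(x)
    for _ in range(n_iter):
        r = np.abs(x - mu)
        w = tau / np.maximum(r, tau)   # weight 1 inside [-tau, tau], tau/|r| outside
        mu = np.sum(w * x) / np.sum(w)
    return mu

rng = np.random.default_rng(1)
errs_mean, errs_huber = [], []
for _ in range(500):
    x = rng.standard_t(df=2.25, size=200)          # true location is 0, heavy tails
    errs_mean.append(x.mean() ** 2)
    errs_huber.append(huber_mean(x, tau=2.0) ** 2)
print(f"MSE of sample mean:     {np.mean(errs_mean):.4f}")
print(f"MSE of Huber (tau=2.0): {np.mean(errs_huber):.4f}")  # typically far smaller
```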

4. Neural Network Architecture and Curse-of-Dimensionality Mitigation

The estimator leverages hierarchical composition models $\mathcal{H}(d, \ell, M_0, \mathcal{P})$, in which $f_0$ and $g_0$ are compositions of low-rank Hölder-smooth functions, enabling deep ReLU networks of moderate size to overcome the curse of dimensionality. Networks are organized with sufficient depth $L$ and width $N$ such that the $L_2$-approximation error admits the rate

$$O\left((L_0 N_0)^{-2\gamma^*}\right),$$

with $\gamma^* = \min_{(\beta, t) \in \mathcal{P}} \beta / t$ determined by the smoothness and interaction order of the composition layers (Yu et al., 11 Nov 2025).
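
A truncated, fully connected ReLU network of prescribed depth and width can be sketched in PyTorch as follows (the truncation bound and layer sizes are illustrative assumptions, not the paper's settings):

```python
import torch.nn as nn

def relu_network(d, depth, width, bound):
    """Fully connected ReLU network R^d -> R with output truncated to [-bound, bound]."""
    layers, in_dim = [], d
    for _ in range(depth):
        layers += [nn.Linear(in_dim, width), nn.ReLU()]
        in_dim = width
    layers += [nn.Linear(in_dim, 1), nn.Hardtanh(-bound, bound)]  # output truncation
    return nn.Sequential(*layers)

f_net = relu_network(d=8, depth=4, width=64, bound=10.0)  # illustrative sizes
```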

5. Empirical Performance and Case Studies

Simulation studies in $d = 8$ dimensions (sample size $n = 4096$) show that DRES achieves near-oracle mean squared prediction error for both light-tailed (Gaussian) and heavy-tailed ($t_{2.25}/3$) noise, outperforming local linear ES (LLES) and non-robust DES in the latter regime. Under heavy tails, DRES exhibits dramatically improved accuracy and monotonicity enforcement when combined with non-crossing regularization.

In an environmental science application, DRES estimated the upper-tail ES ($\alpha = 0.99$) for monthly precipitation conditional on El Niño indices and spatio-temporal covariates. Robust ES inference revealed spatial teleconnections better than mean-based analysis, e.g., mapping increased risk of extreme rainfall in southern California and along the Gulf Coast. Variable-importance metrics confirmed key covariates (longitude, latitude, Niño index) for tail-event prediction (Yu et al., 11 Nov 2025).

6. Algorithmic Implementation and Practical Considerations

  • Input data: $\{(X_i, Y_i)\}_{i=1}^n$, quantile level $\alpha$, network hyperparameters, Huber parameter $\tau$.
  • Train the DQR network to estimate $f_0$.
  • Compute $Z_i(\hat{f}_n)$ and fit the DRES network for $g$ using the Huber loss.
  • For multiple $\alpha$ values, enforce monotonicity of the joint quantile/ES outputs if needed.
  • Choosing $\tau$ requires balancing robustification bias (when $\tau$ is too small, since $\eta_b = O(\nu_p \tau^{1-p})$) against sensitivity to outliers (when $\tau$ is too large), with theoretical guidance for its scaling with the sample size; see the end-to-end sketch after this list.
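
Putting the steps together, a minimal two-stage training sketch (reusing `check_loss`, `dres_loss`, and `relu_network` from the snippets above; all hyperparameters, including full-batch Adam, are illustrative choices rather than the authors' protocol):

```python
import torch

def fit_dres(X, Y, alpha=0.05, tau=2.0, epochs=200, lr=1e-3):
    d = X.shape[1]
    # Stage 1: deep quantile regression for the nuisance function f_0.
    f_net = relu_network(d, depth=4, width=64, bound=10.0)
    opt_f = torch.optim.Adam(f_net.parameters(), lr=lr)
    for _ in range(epochs):
        opt_f.zero_grad()
        loss = check_loss(Y - f_net(X).squeeze(-1), alpha)
        loss.backward()
        opt_f.step()
    # Stage 2: robust ES regression on the surrogate responses, f_hat held fixed.
    with torch.no_grad():
        f_hat = f_net(X).squeeze(-1)
    g_net = relu_network(d, depth=4, width=64, bound=10.0)
    opt_g = torch.optim.Adam(g_net.parameters(), lr=lr)
    for _ in range(epochs):
        opt_g.zero_grad()
        loss = dres_loss(Y, f_hat, g_net(X).squeeze(-1), alpha, tau)
        loss.backward()
        opt_g.step()
    return f_net, g_net
```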

A plausible implication is that the two-stage network plus Huber robustification pipeline constitutes a best-practice route for ES estimation when signal structure is compositional and errors are non-sub-Gaussian.

7. Relationship to Other Robust Deep Estimation Frameworks

Robust deep ES estimation is distinct from both deep energy-score estimators (Saremi et al., 2018) and robust deep maximum-likelihood estimators such as DeepMLE (Xiao et al., 2022):

  • The former addresses unsupervised density/score-function estimation, not supervised tail risk.
  • DeepMLE (Xiao et al., 2022) employs mixture models and explicit uncertainty prediction for geometric vision tasks, emphasizing Gaussian-uniform mixture robustness at the pixel level, whereas robust deep ES regression targets conditional tail functionals of the response given the covariates.

The robust deep ES estimator also contrasts with black-box evolution strategies, which optimize noise-averaged objectives for parameter robustness (Lehman et al., 2017; Meier et al., 2019). Instead of searching parameter space for perturbation-invariant optima, DRES directly targets conditional tail means, achieving robustness to heavy-tailed responses by construction and with formal statistical guarantees (Yu et al., 11 Nov 2025).

