
Robust Location & Scatter Estimation

Updated 9 April 2026
  • Robust location and scatter estimation is a set of methods that accurately determine multivariate center and dispersion despite outliers.
  • Techniques like depth-based, S-/MM-, and M-estimators balance high breakdown, bounded influence, and computational efficiency.
  • Recent advances address challenges like cellwise contamination and scaling in high dimensions for reliable multivariate analysis.

Robust location and scatter estimation refers to a class of statistical procedures and estimators that provide accurate, stable, and interpretable estimates of multivariate location (center) and scatter (dispersion, covariance structure) in the presence of contamination or departures from model assumptions. The core focus is to attain high breakdown (resistance to outliers), bounded influence, and statistical efficiency in both moderate and high-dimensional regimes. The robust estimation landscape encompasses depth-based methods, S- and MM-estimation, minimum divergence approaches, M-estimation (including Tyler's and Maronna's), as well as procedures specifically designed for cellwise contamination and computational scalability.

1. Definitions, Robustness Metrics, and Goals

Robust estimators of location $\mu\in\mathbb R^p$ and scatter $\Sigma\in S_{++}^p$ are designed to minimize the impact of gross errors or adversarial contamination in multivariate data. The classical sample mean and covariance matrix are optimal under Gaussianity but break down under even a single extreme outlier. Two key robustness metrics under the $\epsilon$-contamination model

$$\mathcal{P}_\epsilon(P_0) = \{(1-\epsilon)P_0 + \epsilon Q : Q \text{ any distribution}\}$$

are:

  • Breakdown Point: The smallest $\epsilon^*$ such that the estimator diverges or becomes arbitrarily biased with contamination fraction $\epsilon \geq \epsilon^*$.

    • For a location functional $\hat\mu$, the asymptotic breakdown point is

      $$\epsilon_L^* = \inf\{\epsilon>0: \sup_{P\in\mathcal{P}_\epsilon(P_0)} \|\hat\mu(P)\| = \infty\}$$

    • For scatter, one tracks explosion (largest eigenvalue $\to\infty$) or implosion (smallest eigenvalue $\to 0$).

  • Maximum Bias Curve: For an estimator $T$ at nominal model $P_0$, the maximum asymptotic bias as a function of $\epsilon$ is

    $$B(\epsilon; T, P_0) = \sup_{P\in\mathcal{P}_\epsilon(P_0)} \|T(P) - T(P_0)\|$$

These metrics are directly calculable for depth-based estimators (Adrover et al., 12 May 2025), S- and MM-estimators (Maronna et al., 2015), minimum divergence functionals (Chakraborty et al., 2024), and weighted likelihood estimators (Agostinelli et al., 2017).

Robust estimators typically balance the following:

  • Affine equivariance (location and scatter transform appropriately under affine maps),
  • High breakdown (ideally up to 50%, e.g., MCD, S, MM, projection depth),
  • Bounded influence,
  • Statistical efficiency (small asymptotic mean squared error under the model),
  • Computational tractability.
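As a concrete illustration of breakdown (our own sketch, not taken from the cited papers), the snippet below contaminates 10% of a Gaussian sample with gross outliers and compares the sample mean with the coordinatewise median: the mean is dragged toward the outliers, while the median stays near the true center.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, eps = 200, 3, 0.10            # 10% contamination
X = rng.standard_normal((n, p))     # clean data, true center at the origin

# Replace a fraction eps of rows with gross outliers far from the bulk
k = int(eps * n)
X[:k] = 1000.0

mean_est = X.mean(axis=0)           # non-robust: pulled toward the outliers
median_est = np.median(X, axis=0)   # coordinatewise median: stays near 0

print(np.linalg.norm(mean_est))     # large (roughly eps * 1000 per coordinate)
print(np.linalg.norm(median_est))   # small
```

The coordinatewise median is robust here but not affine-equivariant, which is one reason the depth-based and M-type multivariate estimators discussed below are preferred.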

2. Depth-Based Estimation: Tukey’s Median and Deepest Scatter

Statistical depth provides a geometric framework to generalize medians and quantiles to multivariate location and scatter:

  • Halfspace (Tukey) Depth for a candidate center $\mu\in\mathbb R^p$:

    $$D(\mu, P) = \inf_{\|u\|=1} P(u^\top X \geq u^\top \mu)$$

    The Tukey median is the maximizer of depth,

    $$\hat\mu_T = \arg\max_{\mu\in\mathbb R^p} D(\mu, P_n),$$

    and is affine-equivariant, Fisher-consistent, and has breakdown point $1/3$ under central symmetry (Adrover et al., 12 May 2025).

  • Scatter Depth is defined over positive definite matrices $\Sigma\in S_{++}^p$ as (for centered data)

    $$D(\Sigma, P) = \inf_{\|u\|=1} \min\left\{P\big((u^\top X)^2 \leq u^\top\Sigma u\big),\; P\big((u^\top X)^2 \geq u^\top\Sigma u\big)\right\}$$

    The corresponding deepest scatter estimator maximizes this depth, is affine-equivariant, Fisher-consistent up to scale, and also achieves a positive asymptotic breakdown point. The explicit breakdown and maximal bias formulas for both the Tukey median and scatter estimators are provided in (Adrover et al., 12 May 2025).

  • Fast implementations replace minimum determinant subset search (as in MCD) with a one-shot calculation over the depth-trimmed region, as in the Fast Depth-Based (FDB) algorithm (Zhang et al., 2023). For projection depth, the breakdown is 50%.
  • Concentration Inequalities: Both the Tukey median and deepest scatter estimators achieve minimax optimal rates of order

    $$\frac{p}{n} + \epsilon^2$$

    in squared error, with high probability, under contamination fraction $\epsilon$ ((Adrover et al., 12 May 2025), Theorem 4).
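Exact halfspace depth is expensive to compute in high dimension, but it can be approximated by minimizing over randomly sampled directions. The sketch below is our own illustration (function name and direction count are arbitrary choices); since the minimum is taken over finitely many directions, it yields an upper bound on the exact depth.

```python
import numpy as np

def halfspace_depth(mu, X, n_dirs=2000, rng=None):
    """Approximate Tukey halfspace depth of mu w.r.t. sample X by
    minimizing, over random unit directions u, the fraction of points
    in the closed halfspace {x : u.(x - mu) >= 0}."""
    rng = rng or np.random.default_rng(0)
    p = X.shape[1]
    U = rng.standard_normal((n_dirs, p))
    U /= np.linalg.norm(U, axis=1, keepdims=True)   # unit directions
    proj = (X - mu) @ U.T                           # shape (n, n_dirs)
    frac = (proj >= 0).mean(axis=0)                 # mass per halfspace
    return frac.min()

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 2))
d_center = halfspace_depth(np.zeros(2), X)          # near 1/2 at the center
d_out = halfspace_depth(np.array([5.0, 5.0]), X)    # near 0 far from the data
print(d_center, d_out)
```

Deep points (near the center) have depth close to $1/2$; outlying points have depth close to $0$, which is what makes depth-trimmed regions, as in the FDB algorithm, useful for fast robust estimation.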

3. High-Breakdown Trimming, S-, MM-, and τ-Estimators

The classical maximum breakdown estimators are the Minimum Covariance Determinant (MCD) and Minimum Volume Ellipsoid (MVE), defined via subset selection to minimize determinant or volume over subsets of $h \approx n/2$ points. Their efficiency under Gaussianity is generally low (sub-60%) (Maronna et al., 2015). S-estimators and their extensions address robustness-efficiency trade-offs by minimizing robust scales of Mahalanobis distances,

$$(\hat\mu, \hat\Sigma) = \arg\min_{\mu,\Sigma} |\Sigma| \quad \text{subject to} \quad \frac{1}{n}\sum_{i=1}^n \rho\left(\sqrt{(x_i-\mu)^\top\Sigma^{-1}(x_i-\mu)}\right) = b,$$

with strictly bounded $\rho$ function (e.g., bisquare, Rocke). For MM-estimation, a high-breakdown S-step is followed by an M-step tuned for target efficiency (Maronna et al., 2015).
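As one readily available implementation of the MCD idea, scikit-learn's MinCovDet runs the FAST-MCD subset search. The sketch below (the data setup is our own illustration) contrasts it with the classical MLE on a 10%-contaminated sample.

```python
import numpy as np
from sklearn.covariance import MinCovDet, EmpiricalCovariance

rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(2), [[1, 0.6], [0.6, 1]], size=300)
X[:30] = rng.multivariate_normal([8, -8], np.eye(2), size=30)  # 10% outliers

mcd = MinCovDet(random_state=0).fit(X)   # FAST-MCD subset search
mle = EmpiricalCovariance().fit(X)       # classical mean/covariance

print(mcd.location_)   # close to the true center (0, 0)
print(mle.location_)   # pulled toward (8, -8) by the outliers
```

By default MinCovDet also reweights the raw subset estimate for efficiency, reflecting the efficiency-versus-breakdown trade-off discussed above.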

Rocke’s non-monotonic S-estimator employs a dimension-dependent weight function peaked at the normal shell, maintaining both high breakdown and fixed target efficiency up to large dimension $p$ (Maronna et al., 2015). Theoretical guidelines and practical tuning for MM and Rocke estimators with high-dimensional data are provided.

τ-estimators use two $\rho$-functions, one for robustness and one for efficiency regulation (Maronna et al., 2015).

Componentwise DPD-based estimators decompose the high-dimensional problem into sequences of one- and two-dimensional minimum density power divergence problems, retaining breakdown and outlier resistance while scaling efficiently to high dimension $p$ (Chakraborty et al., 2024).

4. M-Estimators, Precision Shrinkage, and Theoretical Foundations

M-estimators for multivariate location and scatter, including Maronna’s and Tyler’s, are defined by iteratively reweighted estimating equations

$$\hat\mu = \frac{\sum_{i=1}^n u_1(d_i)\,x_i}{\sum_{i=1}^n u_1(d_i)}, \qquad \hat\Sigma = \frac{1}{n}\sum_{i=1}^n u_2(d_i^2)\,(x_i-\hat\mu)(x_i-\hat\mu)^\top, \qquad d_i^2 = (x_i-\hat\mu)^\top\hat\Sigma^{-1}(x_i-\hat\mu),$$

with strictly decreasing, regular weight functions $u_1, u_2$. The Gaussian MLE is recovered for constant weights, Student-$t$ MLEs for particular tail-downweighting choices of $u_2$, and Tyler’s estimator for $u_2(s) = p/s$, the distribution-free case (Mériaux et al., 2019). The joint M-estimators enjoy consistency, Fisher-consistency, and asymptotic normality, with explicit covariance expressions (Mériaux et al., 2019).
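Tyler's estimator can be computed by its standard fixed-point iteration. The sketch below assumes the location is known and zero; the function name and the normalization to trace $p$ are our own choices.

```python
import numpy as np

def tyler_shape(X, n_iter=100, tol=1e-8):
    """Tyler's M-estimator of shape (location assumed known and zero),
    via the fixed-point iteration S <- (p/n) sum_i x_i x_i' / (x_i' S^-1 x_i),
    normalized so that trace(S) = p."""
    n, p = X.shape
    S = np.eye(p)
    for _ in range(n_iter):
        Sinv = np.linalg.inv(S)
        d = np.einsum('ij,jk,ik->i', X, Sinv, X)   # quadratic forms x_i' S^-1 x_i
        S_new = (p / n) * (X.T * (1.0 / d)) @ X    # weighted scatter update
        S_new *= p / np.trace(S_new)               # fix the scale (trace = p)
        if np.max(np.abs(S_new - S)) < tol:
            return S_new
        S = S_new
    return S

rng = np.random.default_rng(0)
true = np.array([[2.0, 0.5], [0.5, 1.0]])
# Heavy-tailed elliptical sample: Gaussian draws with random radial rescaling
Z = rng.multivariate_normal(np.zeros(2), true, size=2000)
w = np.sqrt(2.0 / rng.chisquare(2, size=2000))
X = Z * w[:, None]
S = tyler_shape(X)
print(S)   # proportional to `true`, up to the trace normalization
```

Because Tyler's weights depend only on directions $x_i/\|x_i\|$, the estimate is unchanged by the heavy-tailed radial rescaling, which is exactly the distribution-free property noted above.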

Precision structure shrinkage introduces a ridge penalty on the precision matrix to push the breakdown point arbitrarily close to one in high dimension, outperforming classical M-estimators in the presence of clustered or aligned outliers (Nikai et al., 16 Jan 2026). This estimator remains computationally efficient and orthogonally equivariant.

5. Emerging Directions: Cellwise Contamination, Filtering, and Computation

Traditional robust estimators (MCD, S, Stahel-Donoho, MM) fail under cellwise contamination—i.e., when a small fraction of individual cells is contaminated rather than entire rows. Two-step procedures have been developed (Agostinelli et al., 2014; Leung et al., 2016):

  • Step 1: Univariate and bivariate adaptive filtering to identify and snip cellwise outliers, replacing them with NA while using robust initial location/scale/variance estimates.
  • Step 2: Generalized S-estimation (GSE or GRE) on the incomplete data; for high dimension $p$, generalized Rocke S-estimators (GRE) using non-monotonic Rocke weights ensure robustness as $p$ grows, preserving up to 50% breakdown (Leung et al., 2016). Fast cluster-based subsampling is critical for initialization.

This paradigm is empirically superior under mixed cell/casewise contamination, and the combinatorial complexity is mitigated by efficient clustering and subsampling (as opposed to brute-force high-dimensional subset enumeration).
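Step 1 above can be illustrated with a simple univariate filter: flag cells whose robust z-score (column median/MAD) exceeds a cutoff and set them to NA. This is a simplified stand-in for the adaptive filters of Agostinelli et al. (2014); the function name and the cutoff 3.5 are illustrative choices.

```python
import numpy as np

def snip_cells(X, cutoff=3.5):
    """Flag cells whose robust z-score (per-column median/MAD) exceeds
    `cutoff` and replace them with NaN, leaving the rest of the row
    intact -- the cellwise analogue of casewise trimming."""
    med = np.median(X, axis=0)
    mad = 1.4826 * np.median(np.abs(X - med), axis=0)  # consistent at N(0,1)
    z = np.abs(X - med) / mad
    Y = X.copy()
    Y[z > cutoff] = np.nan
    return Y

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4))
X[0, 2] = 50.0                 # contaminate a single cell
Y = snip_cells(X)
print(np.isnan(Y[0, 2]))       # the contaminated cell is snipped
print(np.isnan(Y).sum())       # few, if any, clean cells are flagged
```

The snipped matrix is then passed to a generalized S-estimator for incomplete data (Step 2), so a single bad cell no longer forces the whole row to be discarded.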

6. Weighted Likelihood, Quantile-Based, and Minimum Divergence Estimation

Weighted likelihood estimators (WLE) operate by downweighting observations based on discrepancy between the empirical and model Mahalanobis distance distributions. Weights are updated iteratively using bounded residual-adjustment functions, leading to nearly maximum likelihood efficiency in clean data and high breakdown under contamination (Agostinelli et al., 2017).

Quantile Least Squares (QLS) estimators for location and scale fit explicit linear regression models to trimmed sample quantiles, controlling robustness via quantile trimming; the influence functions are bounded due to exclusion of extreme quantiles, giving a direct breakdown-efficiency trade-off (Adjieteh et al., 2024).
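For a univariate Gaussian model, the QLS idea reduces to regressing trimmed sample quantiles on standard-normal quantiles: the intercept estimates location and the slope estimates scale. The sketch below is our own illustration of this construction, not the authors' exact estimator; the trimming fraction is an assumed tuning choice.

```python
import numpy as np
from scipy.stats import norm

def qls_location_scale(x, trim=0.1):
    """QLS sketch: regress trimmed sample quantiles on standard-normal
    quantiles; the intercept estimates location, the slope estimates scale."""
    u = np.linspace(trim, 1 - trim, 81)   # trimmed quantile grid
    q_sample = np.quantile(x, u)
    q_model = norm.ppf(u)
    slope, intercept = np.polyfit(q_model, q_sample, 1)
    return intercept, slope               # (location, scale)

rng = np.random.default_rng(0)
x = rng.normal(10.0, 2.0, size=5000)
x[:250] = 1e4                             # 5% gross outliers
loc, scale = qls_location_scale(x)
print(loc, scale)                         # near 10 and 2 despite contamination
```

The gross outliers never enter the regression directly; they only mildly shift which order statistics the trimmed quantiles select, which is the bounded-influence mechanism described above.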

Minimum density power divergence estimators (MDPDE) and sequential variants (SMDPDE) minimize divergence between the empirical sample and the model with a power parameter for robustness; the algorithmic splitting into coordinatewise and bivariate optimizations allows scalability and stability (Chakraborty et al., 2024).

7. Semiparametric and Complex-Valued Extensions

In the semiparametric complex elliptically symmetric (CES) setting, only the shape matrix (modulo scaling) and location are identifiable. Efficient estimation is achieved by combining Tyler’s M-estimator for location with rank-based R-estimation for shape, achieving the semiparametric Cramér–Rao lower bound in heavy-tailed or unspecified density scenarios (Fortunati et al., 2021).

The algorithmic approach combines initial robust estimation (Tyler), rank-based central sequences, and one-step corrections using bounded, distribution-free scores, providing practical algorithms of low computational cost for $p$-dimensional problems.

