Median-of-Means Estimator

Updated 1 December 2025

The Median-of-Means estimator is a robust statistical method that partitions data into blocks, computes block means, and uses the median to estimate the population mean under heavy-tailed and contaminated conditions.
It achieves sub-Gaussian deviation rates and minimax optimal error bounds with finite variance, even in the presence of adversarial contamination.
Its computational efficiency and adaptability extend to high-dimensional, multivariate, and functional data, with successful applications in clustering, kernel methods, and robust U-statistics.

The Median-of-Means (MoM) estimator is a fundamental robust statistical tool designed to achieve minimax optimality in mean estimation under minimal moment assumptions, adversarial contamination, and heavy tails. The estimator has led to theoretical insights, algorithmic advances, and applications spanning robust statistics, high-dimensional inference, learning theory, clustering, density estimation, and quantum tomography.

1. Definition and Core Principles

Let $X_1,\dots,X_n$ be independent real-valued random variables with mean $\mu$ (possibly unknown) and variance $\sigma^2 < \infty$ . For an integer $k \leq n$ , partition the sample indices into $k$ disjoint blocks $I_1, \dots, I_k$ of (approximately) equal size $m = \lfloor n/k \rfloor$ . Compute the block means: $\bar{X}_i = \frac{1}{m} \sum_{j \in I_i} X_j, \quad i=1,\dots,k.$ The Median-of-Means estimator is defined as the median of the $k$ block means: $\widehat{\mu}_{\mathrm{MoM}} = \mathrm{median}\{\bar{X}_1, \dots, \bar{X}_k\}.$ For even $k$ , the median may be defined as the average of the two central values. The breakdown point of the MoM is approximately $1/2$: as long as strictly fewer than $k/2$ blocks are fully corrupted, the estimator’s value remains controlled by the majority of uncontaminated blocks (Juan et al., 9 Oct 2025, Tu et al., 2021).

This construction extends to vector-valued and functional data by applying the scalar MoM to projections onto the extreme points of the dual unit ball of the norm of interest (Lugosi et al., 2018, Wang et al., 5 Sep 2024), and to kernel mean embeddings via the geometric median in Hilbert space (Lerasle et al., 2018).

2. Robustness, Error Bounds, and Optimality Under Contamination

The cornerstone property of MoM is that, under only finite variance, it achieves sub-Gaussian deviation rates and minimax-optimal estimation under adversarial $\epsilon$ -contamination.

Adversarial Contamination Model: The sample consists of $n$ observations, with up to $\lfloor\epsilon n\rfloor$ replaced arbitrarily by an adversary; the remaining are i.i.d. with mean $\mu$ and variance $\sigma^2$ (Juan et al., 9 Oct 2025, Laforgue et al., 2020).

Main Results

Finite-variance distributions ( $\mathcal{P}_2$ ):

With suitable $k \geq \max\{\lceil \log(2/\delta)/(1/2 - 1/\gamma)^2 \rceil, \lceil \gamma \epsilon n \rceil\}$ for $\gamma \in (2,2.5]$ , $\epsilon \leq 0.4$ , with probability at least $1-\delta$ ,

$|\widehat\mu_{\mathrm{MoM}} - \mu| \leq C(\gamma)\sigma \left[ \sqrt{\frac{\log(2/\delta)}{n}} + \sqrt{\epsilon} \right].$

Matching minimax lower bounds show this is optimal: no estimator can achieve better than order $\Theta(\sqrt{\epsilon})$ bias in this regime (Juan et al., 9 Oct 2025).

Infinite-variance but finite $(1+r)$ -th moment ( $\mathcal{P}_{1+r}$ , $r \in (0,1)$ ):

$|\widehat\mu_{\mathrm{MoM}} - \mu| \leq C v_r^{1/(1+r)} \left[ (\log(2/\delta)/n)^{r/(1+r)} + \epsilon^{r/(1+r)} \right],$

where $v_r = \mathbb{E}|X-\mu|^{1+r}$ (Juan et al., 9 Oct 2025).

Light-tailed (subexponential, sub-Gaussian):

MoM does not achieve the information-theoretic lower bound for bias; it incurs a $\Theta(\epsilon^{2/3})$ maximum bias, suboptimal compared to the best $O(\epsilon \sqrt{\log(1/\epsilon)})$ attainable by the trimmed mean (Juan et al., 9 Oct 2025).

Additional Key Properties:

MoM can tolerate up to $\approx 40\%$ contamination in the finite-variance regime.
Requires only splitting and median calculation; computationally efficient, trivially parallelizable.
For data drawn from symmetric distributions, MoM can recover the optimal $O(\epsilon)$ bias rate, e.g., for Gaussian and symmetric stable laws (Juan et al., 9 Oct 2025).

Extension to Multivariate, General Norms, and Beyond

The MoM construction extends to $\mathbb{R}^d$ and to arbitrary norms via the uniform median-of-means estimator, where for each $x^*$ in the extreme points of the dual norm ball,

$\mathrm{Med}\left\{ x^*(\bar{X}_1), \ldots, x^*(\bar{X}_k) \right\}$

defines a family of slabs, whose intersection is a confidence polytope containing the mean with high probability. The diameter of this set informs the estimator's accuracy. The uniform MoM achieves oracle rates driven by the Gaussian mean width and worst-case variance across directions (Lugosi et al., 2018, Wang et al., 5 Sep 2024).

For general norms and heavy-tailed regimes, MoM's analysis replaces Rademacher complexities with VC-dimension arguments to obtain moment- and contamination-robust bounds: $\|\widehat\mu - \mu\| \lesssim \|\Sigma^{1/2}\| \left[ \sqrt{\frac{\mathrm{VC}}{n}} + \sqrt{\frac{\log(1/\delta)}{n}} + \sqrt{\epsilon} \right].$ (Wang et al., 5 Sep 2024).

Geometric-median-of-means estimators further extend robustness in $\mathbb{R}^d$ , achieving sub-Gaussian concentration and dimension-independent control under appropriate small-ball and negative-moment conditions (Minsker et al., 2023).

3. Extensions: General Function Classes, U-statistics, and Kernel Methods

Function Classes and ERM: Uniform MoM enables simultaneous robust mean estimation across a (possibly infinite) function class $\mathcal{F}$ by applying MoM blockings to each $f \in \mathcal{F}$ and controlling the complexity via discretization or pseudodimension, yielding uniform $O(\varepsilon)$ accuracy at sample size

$n \asymp (v_p/\varepsilon^p)^{1/(p-1)} [\log N_\mathcal{F} + \log(1/\delta)]$

under $L_p$ -moment bounds, $p\in(1,2]$ (Høgsgaard et al., 17 Jun 2025).

Robust U-statistics: The MoM principle extends to $U$ -statistics by computing decoupled block-wise $U$ -statistics and taking their median. Under only finite variance (or $L_p$ moments), the MoM $U$ -statistic matches the oracle rates for bounded kernels and remains robust to outliers (Joly et al., 2015, Laforgue et al., 2020). For canonical, symmetric kernels $h$ , the MoM $U$ -statistic converges at order $(\log(1/\delta)/n)^{m/2}$ for degree $m$ , with explicit constants.

Kernel Mean Embedding: The MoM framework generalizes to Hilbert spaces. Block-wise mean feature vectors are computed, and the geometric median yields a robust kernel mean embedding. This achieves sub-Gaussian deviation and robust maximum mean discrepancy (MMD) estimation under trace-class kernel covariance operators with breakdown point $\approx 25\%$ (Lerasle et al., 2018).

4. Applications: Learning Theory, Clustering, Density Estimation

MoM-based estimators have become foundational in robust learning and unsupervised learning.

Robust Empirical Risk Minimization (ERM):

The MoM loss can be used in ERM and regularized ERM, conferring high resistance to label or covariate contamination in supervised learning (Lecué et al., 2017).
In high-dimensional regression (e.g., MOM-LASSO), MoM enables oracle-optimal estimation and variable selection under adversarial corruption, with statistical and computational guarantees matching non-contaminated minimax rates.

Clustering:

Integrated into convex clustering frameworks, as in COMET, or nonparametric Dirichlet Process-MoM clustering, MoM confers resistance to outliers and prevents cluster fragmentation or collapse (De et al., 12 Nov 2025, Basu et al., 2023).
These methods achieve weak consistency and near-oracle convergence rates, and empirically outperform $k$ -means, convex clustering, and other robust clusterers under heavy contamination (De et al., 12 Nov 2025, Basu et al., 2023, Høgsgaard et al., 17 Jun 2025).
MoM can robustify the empirical risk in $k$ -means, DP-means, or model-based Bayesian nonparametrics for both loss evaluation and cluster-number selection, yielding reliable detection of noise clusters.

Kernel Density Estimation:

The MoM-KDE computes classical kernel density estimators on each block and returns the pointwise median, ensuring asymptotic minimax rates and pointwise $O(n^{-1/2})$ deviation with robustness to arbitrarily heavy-tailed contamination (Humbert et al., 2020).

Quasi-Monte Carlo and Quantum Tomography:

Median-of-means estimators applied to linearly scrambled digital nets yield dimension-independent convergence for high-dimensional integration, under strong tractability conditions (Pan, 20 May 2025).
In classical shadows estimation for quantum observables, MoM—and particularly efficient U-statistic-based variants—optimally trade off shot complexity, variance, and $\delta$ -confidence under only finite variance (Fu et al., 4 Dec 2024).

5. Algorithmic Implementations, Practical Choices, and Variants

Algorithmic Workflow

Partition dataset into $k$ (scalar case) or $K$ (multivariate) disjoint blocks.
For each block, compute mean (or, in extensions, more complex functionals: blockwise risk, kernel mean, blockwise $U$ -statistic, etc.).
Aggregate using the median (coordinatewise, geometric, or over all blocks, as appropriate).

For high-dimensional data or regression, the MoM principle is applied to directional projections (extremal functionals for general norms), or to function classes by discretizing or covering via VC-dimension or pseudodimension (Lugosi et al., 2018, Wang et al., 5 Sep 2024, Høgsgaard et al., 17 Jun 2025).

Distributed and Byzantine-robust Settings: MoM and its variance-reduced variants can be deployed in distributed architectures, providing resistance to Byzantine (arbitrarily corrupted) nodes. Practical implementations show that the variance-reduced MoM achieves efficiency $\approx0.95$ relative to the Cramér-Rao bound while tolerating up to $50\%$ corruption (Tu et al., 2021).

Block Number and Tuning: The number of blocks $k$ is central—large $k$ favors robustness to contamination/heavy tails, small $k$ favors efficiency for light-tailed/lower-variance data. Data-driven or two-stage selection (MoMoM) strategies and cross-validation are frequently recommended (Juan et al., 9 Oct 2025, Lecué et al., 2017).

Variants and Enhancements

Geometric Median-of-Means: Geometric median provides robustness when the mean is not well-defined, or in non-Euclidean (and RKHS) settings (Lerasle et al., 2018, Minsker et al., 2023).
Efficient and U-statistic-based Variants: Overlapping, data-symmetrized or permutation-invariant blockings deliver improved constants and relax moment requirements; random incomplete U-statistics enable scalable approximation and tighter deviation inequalities (Minsker, 2023, Fu et al., 4 Dec 2024).
Bayesian Median-of-Means: Interpolates between the mean and the median via Dirichlet reweighting, and achieves lower variance with small bias (asymptotically negligible) (Orenstein, 2019).

6. Limitations and Future Directions

The MoM estimator's structure—single tunable $k$ , median-of-blocks—creates trade-offs: optimality for heavy tails may come at the expense of suboptimality for light-tailed data. No single choice of $k$ is uniformly optimal over all distributional regimes; data-adaptive tuning, hybridization with trimmed/Catoni estimators, and multi-stage/recursive median-of-means remain active topics (Juan et al., 9 Oct 2025).

MoM does not, in general, achieve minimax-optimal rates for all light-tailed models unless symmetry assumptions are imposed. Further research involves designing estimators that interpolate adaptively between regimes, leveraging empirical information on tail behavior and symmetry (Juan et al., 9 Oct 2025, Minsker, 2023).

Open problems include:

Extension to structured contamination models with dependent and heteroscedastic outliers.
Minimax analysis in infinite-dimensional settings (e.g., functional data, operator-valued mean estimation).
Fully data-driven, computationally efficient choices for $k$ .
Improved nonasymptotic analysis for incomplete U-statistic median-of-means estimators (Fu et al., 4 Dec 2024, Minsker, 2023).

7. Comparative Summary

Estimator	Heavy Tails (finite variance)	Light Tails (sub-Gaussian)	Outlier Robustness	Moment Assumption
Sample Mean	Unreliable	Optimal	Breakdown $1/n$	Finite variance
Trimmed Mean	Optimal ( $O(\sqrt\epsilon)$ )	Optimal ( $O(\epsilon \sqrt{\log(1/\epsilon)})$ )	Robust to large fraction (proportional to trim fraction)	Higher moments for optimality
Catoni's M	Optimal ( $O(\sqrt\epsilon)$ )	Optimal or suboptimal	Robust with tuning parameter	Requires known moment $r$
MoM	Optimal ( $O(\sqrt\epsilon)$ )	Suboptimal ( $O(\epsilon^{2/3})$ unless symmetry)	Breakdown $\approx 1/2$	Finite variance
Symmetric MoM	Optimal ( $O(\epsilon)$ )	Optimal ( $O(\epsilon)$ )	Breakdown $\approx 1/2$	Finite variance + symmetry

MoM is unique in achieving minimax optimality for adversarial contamination in the heavy-tailed regime, and remains computationally efficient and widely applicable, but is not always optimal for light-tailed scenarios without further modification (Juan et al., 9 Oct 2025, Lecué et al., 2017, Wang et al., 5 Sep 2024).

References:

"On the Optimality of the Median-of-Means Estimator under Adversarial Contamination" (Juan et al., 9 Oct 2025)
"Convex Clustering Redefined: Robust Learning with the Median of Means Estimator" (De et al., 12 Nov 2025)
"Error bounds of Median-of-means estimators with VC-dimension" (Wang et al., 5 Sep 2024)
"Near-optimal mean estimators with respect to general norms" (Lugosi et al., 2018)
"Robust machine learning by median-of-means: theory and practice" (Lecué et al., 2017)
"Efficient median of means estimator" (Minsker, 2023)
"The Geometric Median and Applications to Robust Mean Estimation" (Minsker et al., 2023)