
Median-of-Means Estimator

Updated 1 December 2025
  • The Median-of-Means estimator is a robust statistical method that partitions data into blocks, computes block means, and uses the median to estimate the population mean under heavy-tailed and contaminated conditions.
  • It achieves sub-Gaussian deviation rates and minimax-optimal error bounds assuming only finite variance, even in the presence of adversarial contamination.
  • Its computational efficiency and adaptability extend to high-dimensional, multivariate, and functional data, with successful applications in clustering, kernel methods, and robust U-statistics.

The Median-of-Means (MoM) estimator is a fundamental robust statistical tool designed to achieve minimax optimality in mean estimation under minimal moment assumptions, adversarial contamination, and heavy tails. The estimator has led to theoretical insights, algorithmic advances, and applications spanning robust statistics, high-dimensional inference, learning theory, clustering, density estimation, and quantum tomography.

1. Definition and Core Principles

Let $X_1,\dots,X_n$ be independent real-valued random variables with mean $\mu$ (possibly unknown) and variance $\sigma^2 < \infty$. For an integer $k \leq n$, partition the sample indices into $k$ disjoint blocks $I_1, \dots, I_k$ of (approximately) equal size $m = \lfloor n/k \rfloor$. Compute the block means
$$\bar{X}_i = \frac{1}{m} \sum_{j \in I_i} X_j, \quad i=1,\dots,k.$$
The Median-of-Means estimator is defined as the median of the $k$ block means:
$$\widehat{\mu}_{\mathrm{MoM}} = \mathrm{median}\{\bar{X}_1, \dots, \bar{X}_k\}.$$
For even $k$, the median may be defined as the average of the two central values. The breakdown point of the MoM is approximately $1/2$: as long as strictly fewer than $k/2$ blocks are fully corrupted, the estimator's value remains controlled by the majority of uncontaminated blocks (Juan et al., 9 Oct 2025, Tu et al., 2021).
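The construction is short to implement. The following is a minimal NumPy sketch (the function name, equal-size blocking, and dropping the remainder after the last full block are illustrative choices, not prescriptions from the cited papers):

```python
import numpy as np

def median_of_means(x, k):
    """Median-of-Means estimate of the mean of a one-dimensional sample.

    x : 1-D array of observations.
    k : number of blocks; observations beyond k * (len(x) // k) are dropped
        so that all blocks share the common size m = len(x) // k.
    """
    x = np.asarray(x, dtype=float)
    m = len(x) // k                       # common block size
    blocks = x[: m * k].reshape(k, m)     # k disjoint blocks of size m
    block_means = blocks.mean(axis=1)     # block means \bar X_1, ..., \bar X_k
    return np.median(block_means)         # median across block means

# Example: heavy-tailed Pareto data re-centred to have mean zero.
rng = np.random.default_rng(0)
x = rng.pareto(2.5, size=10_000) - 1 / 1.5    # Lomax(2.5) has mean 1/(2.5 - 1)
print(median_of_means(x, k=30), x.mean())
```

Because `np.median` averages the two central block means when $k$ is even, the sketch matches the convention above; in practice the data are often randomly shuffled before blocking when the ordering may be informative.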

This construction extends to vector-valued and functional data by applying the scalar MoM to projections onto the extreme points of the dual unit ball of the norm of interest (Lugosi et al., 2018, Wang et al., 5 Sep 2024), and to kernel mean embeddings via the geometric median in Hilbert space (Lerasle et al., 2018).

2. Robustness, Error Bounds, and Optimality Under Contamination

The cornerstone property of MoM is that, under only finite variance, it achieves sub-Gaussian deviation rates and minimax-optimal estimation under adversarial $\epsilon$-contamination.

Adversarial Contamination Model: The sample consists of $n$ observations, with up to $\lfloor\epsilon n\rfloor$ replaced arbitrarily by an adversary; the remaining are i.i.d. with mean $\mu$ and variance $\sigma^2$ (Juan et al., 9 Oct 2025, Laforgue et al., 2020).

Main Results

Finite-variance distributions ($\mathcal{P}_2$):

  • With suitable $k \geq \max\{\lceil \log(2/\delta)/(1/2 - 1/\gamma)^2 \rceil, \lceil \gamma \epsilon n \rceil\}$, for $\gamma \in (2,2.5]$ and $\epsilon \leq 0.4$, with probability at least $1-\delta$,

$$|\widehat\mu_{\mathrm{MoM}} - \mu| \leq C(\gamma)\,\sigma \left[ \sqrt{\frac{\log(2/\delta)}{n}} + \sqrt{\epsilon} \right].$$

Matching minimax lower bounds show this is optimal: no estimator can achieve better than order $\Theta(\sqrt{\epsilon})$ bias in this regime (Juan et al., 9 Oct 2025).
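To make the block-count condition concrete, here is a small helper that evaluates the quoted rule for given $n$, $\delta$, $\epsilon$, and $\gamma$ (the function name and the default $\gamma = 2.5$ are illustrative choices):

```python
import math

def block_count(n, delta, eps, gamma=2.5):
    """Smallest block count k satisfying the rule quoted above.

    n     : sample size
    delta : target failure probability
    eps   : assumed contamination fraction (eps <= 0.4)
    gamma : slack parameter, gamma in (2, 2.5]
    """
    k_confidence = math.ceil(math.log(2 / delta) / (0.5 - 1 / gamma) ** 2)
    k_contamination = math.ceil(gamma * eps * n)
    return max(k_confidence, k_contamination)

# n = 10_000 points, 1% contamination, failure probability 1%:
print(block_count(n=10_000, delta=0.01, eps=0.01))   # max(530, 250) = 530
```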

Infinite-variance but finite $(1+r)$-th moment ($\mathcal{P}_{1+r}$, $r \in (0,1)$):

$$|\widehat\mu_{\mathrm{MoM}} - \mu| \leq C\, v_r^{1/(1+r)} \left[ \left(\log(2/\delta)/n\right)^{r/(1+r)} + \epsilon^{r/(1+r)} \right],$$

where $v_r = \mathbb{E}|X-\mu|^{1+r}$ (Juan et al., 9 Oct 2025).

Light-tailed (subexponential, sub-Gaussian):

MoM does not achieve the information-theoretic lower bound for bias; it incurs a $\Theta(\epsilon^{2/3})$ maximum bias, suboptimal compared to the best $O(\epsilon \sqrt{\log(1/\epsilon)})$ attainable by the trimmed mean (Juan et al., 9 Oct 2025).

Additional Key Properties:

  • MoM can tolerate up to $\approx 40\%$ contamination in the finite-variance regime.
  • Requires only splitting and median calculation; computationally efficient, trivially parallelizable.
  • For data drawn from symmetric distributions, MoM can recover the optimal $O(\epsilon)$ bias rate, e.g., for Gaussian and symmetric stable laws (Juan et al., 9 Oct 2025).

Extension to Multivariate, General Norms, and Beyond

The MoM construction extends to $\mathbb{R}^d$ and to arbitrary norms via the uniform median-of-means estimator, where for each $x^*$ in the extreme points of the dual norm ball,

$$\mathrm{Med}\left\{ x^*(\bar{X}_1), \ldots, x^*(\bar{X}_k) \right\}$$

defines a family of slabs, whose intersection is a confidence polytope containing the mean with high probability. The diameter of this set informs the estimator's accuracy. The uniform MoM achieves oracle rates driven by the Gaussian mean width and worst-case variance across directions (Lugosi et al., 2018, Wang et al., 5 Sep 2024).

For general norms and heavy-tailed regimes, MoM's analysis replaces Rademacher complexities with VC-dimension arguments to obtain moment- and contamination-robust bounds:
$$\|\widehat\mu - \mu\| \lesssim \|\Sigma^{1/2}\| \left[ \sqrt{\frac{\mathrm{VC}}{n}} + \sqrt{\frac{\log(1/\delta)}{n}} + \sqrt{\epsilon} \right]$$
(Wang et al., 5 Sep 2024).

Geometric-median-of-means estimators further extend robustness in $\mathbb{R}^d$, achieving sub-Gaussian concentration and dimension-independent control under appropriate small-ball and negative-moment conditions (Minsker et al., 2023).
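A minimal sketch of a geometric-median-of-means estimator in $\mathbb{R}^d$, using a plain Weiszfeld iteration for the geometric median (the stopping rule, the zero-distance guard, and the function names are illustrative simplifications rather than the exact constructions analyzed in the cited work):

```python
import numpy as np

def geometric_median(points, n_iter=200, tol=1e-8):
    """Geometric median of the rows of `points` via Weiszfeld iterations."""
    y = points.mean(axis=0)
    for _ in range(n_iter):
        dist = np.maximum(np.linalg.norm(points - y, axis=1), tol)  # guard against zero
        w = 1.0 / dist
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y

def geometric_median_of_means(X, k):
    """Geometric-median-of-means estimate of a d-dimensional mean.

    X : (n, d) data matrix; k : number of equal-size blocks.
    """
    n, d = X.shape
    m = n // k
    block_means = X[: m * k].reshape(k, m, d).mean(axis=1)   # (k, d) block means
    return geometric_median(block_means)
```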

3. Extensions: General Function Classes, U-statistics, and Kernel Methods

Function Classes and ERM: Uniform MoM enables simultaneous robust mean estimation across a (possibly infinite) function class $\mathcal{F}$ by applying MoM blockings to each $f \in \mathcal{F}$ and controlling the complexity via discretization or pseudodimension, yielding uniform $O(\varepsilon)$ accuracy at sample size

$$n \asymp (v_p/\varepsilon^p)^{1/(p-1)} \left[\log N_\mathcal{F} + \log(1/\delta)\right]$$

under $L_p$-moment bounds, $p\in(1,2]$ (Høgsgaard et al., 17 Jun 2025).

Robust U-statistics: The MoM principle extends to $U$-statistics by computing decoupled block-wise $U$-statistics and taking their median. Under only finite variance (or $L_p$ moments), the MoM $U$-statistic matches the oracle rates for bounded kernels and remains robust to outliers (Joly et al., 2015, Laforgue et al., 2020). For canonical, symmetric kernels $h$, the MoM $U$-statistic converges at order $(\log(1/\delta)/n)^{m/2}$ for degree $m$, with explicit constants.
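As a simple illustration of the aggregation step, the sketch below computes a complete within-block degree-2 $U$-statistic on each block and takes the median of the block values; the decoupled blockings analyzed in the cited papers differ in how pairs are formed across blocks, but the median aggregation is the same. The variance kernel $h(x,y) = (x-y)^2/2$ is used only as an example:

```python
import numpy as np
from itertools import combinations

def mom_u_statistic(x, h, k):
    """Median of block-wise degree-2 U-statistics.

    x : 1-D sample, h : symmetric kernel h(x_i, x_j), k : number of blocks.
    Each block value averages h over all pairs within the block; the block
    values are then aggregated by the median.
    """
    x = np.asarray(x, dtype=float)
    m = len(x) // k
    blocks = x[: m * k].reshape(k, m)
    block_values = [
        np.mean([h(a, b) for a, b in combinations(block, 2)]) for block in blocks
    ]
    return np.median(block_values)

# Robust variance estimate: h(x, y) = (x - y)^2 / 2 is unbiased for Var(X).
rng = np.random.default_rng(1)
x = rng.standard_t(df=3, size=5_000)     # heavy-tailed sample with variance 3
print(mom_u_statistic(x, lambda a, b: 0.5 * (a - b) ** 2, k=25))
```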

Kernel Mean Embedding: The MoM framework generalizes to Hilbert spaces. Block-wise mean feature vectors are computed, and the geometric median yields a robust kernel mean embedding. This achieves sub-Gaussian deviation and robust maximum mean discrepancy (MMD) estimation under trace-class kernel covariance operators, with breakdown point $\approx 25\%$ (Lerasle et al., 2018).

4. Applications: Learning Theory, Clustering, Density Estimation

MoM-based estimators have become foundational in robust learning and unsupervised learning.

Robust Empirical Risk Minimization (ERM):

  • The MoM loss can be used in ERM and regularized ERM, conferring high resistance to label or covariate contamination in supervised learning (Lecué et al., 2017).
  • In high-dimensional regression (e.g., MOM-LASSO), MoM enables oracle-optimal estimation and variable selection under adversarial corruption, with statistical and computational guarantees matching non-contaminated minimax rates.

Clustering:

  • Integrated into convex clustering frameworks, as in COMET, or nonparametric Dirichlet Process-MoM clustering, MoM confers resistance to outliers and prevents cluster fragmentation or collapse (De et al., 12 Nov 2025, Basu et al., 2023).
  • These methods achieve weak consistency and near-oracle convergence rates, and empirically outperform $k$-means, convex clustering, and other robust clusterers under heavy contamination (De et al., 12 Nov 2025, Basu et al., 2023, Høgsgaard et al., 17 Jun 2025).
  • MoM can robustify the empirical risk in $k$-means, DP-means, or model-based Bayesian nonparametrics for both loss evaluation and cluster-number selection, yielding reliable detection of noise clusters; a sketch of the risk evaluation appears after this list.
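A minimal sketch of the risk-robustification idea for $k$-means: with the centroids held fixed, per-point losses are averaged within random blocks and the block risks are aggregated by the median. This illustrates only the loss evaluation, not the full COMET or Dirichlet Process-MoM algorithms cited above; function and parameter names are illustrative.

```python
import numpy as np

def mom_kmeans_risk(X, centers, n_blocks, seed=None):
    """Median-of-Means evaluation of the k-means risk for fixed centroids.

    X        : (n, d) data matrix.
    centers  : (c, d) candidate centroids.
    n_blocks : number of blocks used for the MoM aggregation.
    The per-point loss is the squared distance to the nearest centroid;
    block risks are block-wise averages of that loss, aggregated by the median.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    perm = rng.permutation(n)              # random blocks, as is common in practice
    m = n // n_blocks
    # squared distance from each point to its nearest centroid
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).min(axis=1)
    block_risks = d2[perm[: m * n_blocks]].reshape(n_blocks, m).mean(axis=1)
    return np.median(block_risks)
```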

Kernel Density Estimation:

  • The MoM-KDE computes classical kernel density estimators on each block and returns the pointwise median, ensuring asymptotic minimax rates and pointwise $O(n^{-1/2})$ deviation with robustness to arbitrarily heavy-tailed contamination (Humbert et al., 2020); a sketch is given below.
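A minimal sketch of this construction with a Gaussian kernel (the kernel choice, fixed bandwidth, and function name are illustrative assumptions, not specifics from the cited paper):

```python
import numpy as np

def mom_kde(x, grid, bandwidth, k):
    """Pointwise median of block-wise Gaussian kernel density estimates.

    x         : 1-D sample.
    grid      : evaluation points for the density.
    bandwidth : kernel bandwidth h.
    k         : number of blocks.
    """
    x = np.asarray(x, dtype=float)
    grid = np.asarray(grid, dtype=float)
    m = len(x) // k
    blocks = x[: m * k].reshape(k, m)
    # per-block Gaussian KDE values, shape (k, len(grid))
    z = (grid[None, None, :] - blocks[:, :, None]) / bandwidth
    block_kdes = np.exp(-0.5 * z**2).mean(axis=1) / (bandwidth * np.sqrt(2 * np.pi))
    return np.median(block_kdes, axis=0)    # pointwise median over blocks
```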

Quasi-Monte Carlo and Quantum Tomography:

  • Median-of-means estimators applied to linearly scrambled digital nets yield dimension-independent convergence for high-dimensional integration, under strong tractability conditions (Pan, 20 May 2025).
  • In classical shadows estimation for quantum observables, MoM, and in particular efficient U-statistic-based variants, optimally trades off shot complexity, variance, and $\delta$-confidence under only finite variance (Fu et al., 4 Dec 2024).

5. Algorithmic Implementations, Practical Choices, and Variants

Algorithmic Workflow

  • Partition the dataset into $k$ (scalar case) or $K$ (multivariate) disjoint blocks.
  • For each block, compute the mean (or, in extensions, a more complex functional: blockwise risk, kernel mean, blockwise $U$-statistic, etc.).
  • Aggregate using the median (coordinatewise, geometric, or over all blocks, as appropriate).

For high-dimensional data or regression, the MoM principle is applied to directional projections (extremal functionals for general norms), or to function classes by discretizing or covering via VC-dimension or pseudodimension (Lugosi et al., 2018, Wang et al., 5 Sep 2024, Høgsgaard et al., 17 Jun 2025).

Distributed and Byzantine-robust Settings: MoM and its variance-reduced variants can be deployed in distributed architectures, providing resistance to Byzantine (arbitrarily corrupted) nodes. Practical implementations show that the variance-reduced MoM achieves efficiency $\approx 0.95$ relative to the Cramér-Rao bound while tolerating up to $50\%$ corruption (Tu et al., 2021).
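In the simplest distributed form, each node reports its local sample mean and the coordinator takes a coordinatewise median across nodes. The sketch below illustrates only this basic aggregation, not the variance-reduced estimator cited above; names are illustrative.

```python
import numpy as np

def median_of_node_means(node_means):
    """Coordinatewise median of per-node sample means.

    node_means : (K, d) array with one local mean per node; a minority of the
                 rows may be arbitrarily corrupted (Byzantine nodes).
    """
    return np.median(np.asarray(node_means, dtype=float), axis=0)

# Toy run: 20 honest nodes reporting noisy means near zero, 5 Byzantine nodes.
rng = np.random.default_rng(2)
honest = rng.normal(0.0, 0.1, size=(20, 3))
byzantine = np.full((5, 3), 100.0)          # arbitrary adversarial reports
print(median_of_node_means(np.vstack([honest, byzantine])))
```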

Block Number and Tuning: The number of blocks $k$ is central: large $k$ favors robustness to contamination and heavy tails, while small $k$ favors efficiency for light-tailed, lower-variance data. Data-driven or two-stage selection (MoMoM) strategies and cross-validation are frequently recommended (Juan et al., 9 Oct 2025, Lecué et al., 2017).

Variants and Enhancements

  • Geometric Median-of-Means: Geometric median provides robustness when the mean is not well-defined, or in non-Euclidean (and RKHS) settings (Lerasle et al., 2018, Minsker et al., 2023).
  • Efficient and U-statistic-based Variants: Overlapping, data-symmetrized or permutation-invariant blockings deliver improved constants and relax moment requirements; random incomplete U-statistics enable scalable approximation and tighter deviation inequalities (Minsker, 2023, Fu et al., 4 Dec 2024).
  • Bayesian Median-of-Means: Interpolates between the mean and the median via Dirichlet reweighting, and achieves lower variance with small bias (asymptotically negligible) (Orenstein, 2019).

6. Limitations and Future Directions

The MoM estimator's structure (a single tunable block count $k$ and aggregation by a median over blocks) creates trade-offs: optimality for heavy tails may come at the expense of suboptimality for light-tailed data. No single choice of $k$ is uniformly optimal over all distributional regimes; data-adaptive tuning, hybridization with trimmed-mean or Catoni-type estimators, and multi-stage or recursive median-of-means constructions remain active topics (Juan et al., 9 Oct 2025).

MoM does not, in general, achieve minimax-optimal rates for all light-tailed models unless symmetry assumptions are imposed. Further research involves designing estimators that interpolate adaptively between regimes, leveraging empirical information on tail behavior and symmetry (Juan et al., 9 Oct 2025, Minsker, 2023).

Open problems include:

  • Extension to structured contamination models with dependent and heteroscedastic outliers.
  • Minimax analysis in infinite-dimensional settings (e.g., functional data, operator-valued mean estimation).
  • Fully data-driven, computationally efficient choices for $k$.
  • Improved nonasymptotic analysis for incomplete U-statistic median-of-means estimators (Fu et al., 4 Dec 2024, Minsker, 2023).

7. Comparative Summary

| Estimator | Heavy tails (finite variance) | Light tails (sub-Gaussian) | Outlier robustness | Moment assumption |
| --- | --- | --- | --- | --- |
| Sample mean | Unreliable | Optimal | Breakdown $1/n$ | Finite variance |
| Trimmed mean | Optimal ($O(\sqrt\epsilon)$) | Optimal ($O(\epsilon \sqrt{\log(1/\epsilon)})$) | Robust to a large fraction (proportional to the trim fraction) | Higher moments for optimality |
| Catoni's M-estimator | Optimal ($O(\sqrt\epsilon)$) | Optimal or suboptimal | Robust with tuning parameter | Requires known moment $r$ |
| MoM | Optimal ($O(\sqrt\epsilon)$) | Suboptimal ($O(\epsilon^{2/3})$ unless symmetry) | Breakdown $\approx 1/2$ | Finite variance |
| Symmetric MoM | Optimal ($O(\epsilon)$) | Optimal ($O(\epsilon)$) | Breakdown $\approx 1/2$ | Finite variance + symmetry |

MoM is unique in achieving minimax optimality for adversarial contamination in the heavy-tailed regime, and remains computationally efficient and widely applicable, but is not always optimal for light-tailed scenarios without further modification (Juan et al., 9 Oct 2025, Lecué et al., 2017, Wang et al., 5 Sep 2024).

