Median-of-Means Estimators
- Median-of-means estimators are robust statistical methods that partition data into blocks and use the median of block means to mitigate the impact of outliers.
- They achieve exponential concentration bounds under weak moment assumptions, nearly matching classical U-statistic rates up to logarithmic factors.
- Extensions to U-statistics for multivariate functionals and clustering tasks demonstrate practical utility in heavy-tailed data scenarios with rigorous performance guarantees.
A median-of-means estimator is a robust statistical method for estimating the mean of a distribution, particularly effective when data exhibit heavy tails or contamination. Rather than relying on a simple average, the data are partitioned into several blocks, individual means are computed per block, and the overall estimate is the median of these block means. This approach mitigates the undue influence of outlier blocks and yields sharp exponential concentration properties under weak moment assumptions. Its extensions to U-statistics enable robust estimation for multivariate functionals, facilitating rigorous performance guarantees even in scenarios where classical methods falter.
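To make the recipe concrete, here is a minimal sketch of the basic median-of-means mean estimator in Python. The block count, random seed, and heavy-tailed test distribution are illustrative choices, not prescribed by the method:

```python
import numpy as np

def median_of_means(x, n_blocks, rng=None):
    """Median-of-means: permute, split into blocks, average each block, take the median."""
    rng = np.random.default_rng(rng)
    blocks = np.array_split(rng.permutation(np.asarray(x, dtype=float)), n_blocks)
    return float(np.median([b.mean() for b in blocks]))

# Heavy-tailed sample: the plain mean is volatile, the MoM estimate is stabler
sample = np.random.default_rng(1).standard_t(df=2.1, size=10_000)  # barely finite variance
print(sample.mean(), median_of_means(sample, n_blocks=20))
```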
1. Construction and Methodology of Median-of-Means U-Statistics
Let $h : \mathcal{X}^m \to \mathbb{R}$ be a symmetric kernel of order $m$, and let $X_1, \dots, X_n$ be independent samples from a distribution $P$. The classical U-statistic estimator,
$$U_n(h) = \frac{(n-m)!}{n!} \sum_{(i_1, \dots, i_m) \in I_n^m} h(X_{i_1}, \dots, X_{i_m}),$$
with $I_n^m$ the set of $m$-tuples of distinct indices in $\{1, \dots, n\}$, is unbiased for $m_h = \mathbb{E}\, h(X_1, \dots, X_m)$. However, when $h$ is unbounded and the distribution of the data is heavy-tailed, the U-statistic can be highly unstable due to the influence of extreme values.
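As a reference point, a brute-force computation of the classical U-statistic for a symmetric kernel might look as follows; the variance kernel $h(a, b) = (a - b)^2 / 2$ is a standard illustrative example:

```python
import itertools
import numpy as np

def u_statistic(x, h, m=2):
    """Classical U-statistic: average of a symmetric kernel h over all m-subsets.

    For symmetric h, averaging over unordered m-subsets equals the average over
    all ordered m-tuples of distinct indices. O(n^m): for reference only.
    """
    vals = [h(*combo) for combo in itertools.combinations(x, m)]
    return float(np.mean(vals))

# Unbiased variance estimation via the order-2 kernel h(a, b) = (a - b)^2 / 2
x = np.random.default_rng(0).normal(size=200)
print(u_statistic(x, lambda a, b: 0.5 * (a - b) ** 2))
```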
The median-of-means (MoM) approach proceeds as follows:
- Partition the $n$ data points into $V$ blocks $B_1, \dots, B_V$ of (approximately) equal size, usually with $V$ chosen proportional to $\ln(1/\delta)$ for a target confidence parameter $\delta$.
- For each $m$-tuple $(k_1, \dots, k_m)$ of distinct blocks, compute the "decoupled" U-statistic on these blocks:
$$U_{(k_1, \dots, k_m)}(h) = \frac{1}{|B_{k_1}| \cdots |B_{k_m}|} \sum_{(i_1, \dots, i_m)} h(X_{i_1}, \dots, X_{i_m}),$$
where the sum runs over all $m$-tuples $(i_1, \dots, i_m)$ with $i_j \in B_{k_j}$; the indices are automatically distinct because the blocks are disjoint.
- The robust estimator is then
$$\widehat{U}_{\mathrm{MoM}}(h) = \operatorname{median}\left\{ U_{(k_1, \dots, k_m)}(h) : 1 \le k_1 < \cdots < k_m \le V \right\}.$$
This technique, which can be viewed as a decoupled U-statistic aggregation via block-wise medians, significantly reduces sensitivity to outliers or heavy-tailed observations in any single block.
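A minimal sketch of the MoM U-statistic for a pairwise kernel ($m = 2$), assuming a uniformly random equal-sized partition; the Pareto test data, tail index, and block count are illustrative choices:

```python
import numpy as np
from itertools import combinations

def mom_u_statistic(x, h, n_blocks, rng=None):
    """Median-of-means U-statistic for a symmetric pairwise kernel (m = 2):
    split the data into disjoint blocks, average h over all cross-block pairs
    for each pair of distinct blocks, and return the median of these values."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    blocks = np.array_split(rng.permutation(len(x)), n_blocks)
    stats = [np.mean([h(u, v) for u in x[bk] for v in x[bl]])
             for bk, bl in combinations(blocks, 2)]
    return float(np.median(stats))

# Pareto data (tail index 2.5: finite variance, heavy tail) with the variance kernel
x = np.random.default_rng(0).pareto(2.5, size=300)
print(mom_u_statistic(x, lambda a, b: 0.5 * (a - b) ** 2, n_blocks=8))
```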
2. Exponential Concentration and Performance Guarantees
The estimator enjoys strong deviation bounds under minimal moment assumptions:
- Finite Variance Case: For $h$ symmetric and $P$-degenerate of order $q$ (i.e., the centered Hoeffding projections of orders $1$ through $q$ vanish) with variance $\sigma^2 = \operatorname{Var}(h(X_1, \dots, X_m)) < \infty$, with probability at least $1 - \delta$,
$$\left| \widehat{U}_{\mathrm{MoM}}(h) - m_h \right| \le C_m\, \sigma \left( \frac{\ln(1/\delta)}{n} \right)^{(q+1)/2}$$
for $\delta$ in an admissible range, where $C_m$ is an explicit constant depending only on $m$. When $h$ is canonical ($q = m - 1$), the convergence rate is $(\ln(1/\delta)/n)^{m/2}$.
- Finite $p$-th Moment Case: If the centered kernel $h - m_h$ of a canonical $h$ has finite $p$-th moment for some $1 < p \le 2$, then with probability at least $1 - \delta$,
$$\left| \widehat{U}_{\mathrm{MoM}}(h) - m_h \right| \le C_m\, M_p \left( \frac{\ln(1/\delta)}{n} \right)^{m(p-1)/p},$$
where $M_p = \left( \mathbb{E}\,|h - m_h|^p \right)^{1/p}$.
These exponential concentration inequalities generalize the classical results (Arcones–Giné) to the robust regime, showing that the MoM estimator achieves rates nearly matching those for bounded or sub-Gaussian kernels, but under much weaker moment conditions.
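A small Monte Carlo experiment can illustrate (though not prove) the effect: on heavy-tailed data with the variance kernel, the upper tail of the error distribution of the MoM estimator is typically lighter than that of the classical U-statistic. All parameters below are arbitrary illustrative choices:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
a = 2.5  # Pareto (Lomax) tail index: finite variance, infinite higher moments
true_var = a / ((a - 1) ** 2 * (a - 2))  # target m_h for h(x, y) = (x - y)^2 / 2

def classical_u(x):
    """Classical second-order U-statistic for the variance kernel."""
    n = len(x)
    d = x[:, None] - x[None, :]  # diagonal terms vanish, so summing all is safe
    return (0.5 * d ** 2).sum() / (n * (n - 1))

def mom_u(x, n_blocks):
    """MoM counterpart: median of the decoupled cross-block statistics."""
    blocks = np.array_split(rng.permutation(x), n_blocks)
    vals = [np.mean(0.5 * (p[:, None] - q[None, :]) ** 2)
            for p, q in combinations(blocks, 2)]
    return np.median(vals)

errs_u, errs_mom = [], []
for _ in range(200):
    x = rng.pareto(a, size=400)
    errs_u.append(abs(classical_u(x) - true_var))
    errs_mom.append(abs(mom_u(x, n_blocks=10) - true_var))

# Tail of the error distribution: MoM should show the lighter 95th percentile
print(np.quantile(errs_u, 0.95), np.quantile(errs_mom, 0.95))
```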
3. Application to Clustering Problems
The robust estimator proves particularly valuable in clustering scenarios where the empirical risk is naturally a U-statistic. Consider the clustering risk expressed as
$$W(c) = \mathbb{E}\left[ D(X, X')\, \mathbf{1}\{c(X) = c(X')\} \right],$$
with $D$ a potentially heavy-tailed dissimilarity measure and $\mathbf{1}\{c(X) = c(X')\}$ an indicator of joint cluster membership under the clustering $c$. The standard empirical counterpart,
$$W_n(c) = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} D(X_i, X_j)\, \mathbf{1}\{c(X_i) = c(X_j)\},$$
is a second-order U-statistic and can be highly unstable when $D$ is heavy-tailed.
The MoM extension defines
$$\widehat{W}_{\mathrm{MoM}}(c) = \operatorname{median}\left\{ W_{k, l}(c) : 1 \le k < l \le V \right\},$$
with
$$W_{k, l}(c) = \frac{1}{|B_k|\,|B_l|} \sum_{i \in B_k} \sum_{j \in B_l} D(X_i, X_j)\, \mathbf{1}\{c(X_i) = c(X_j)\}.$$
For a finite class $\mathcal{C}$ of candidate clusterings, the estimator achieves a uniform deviation bound: with probability at least $1 - \delta$,
$$\sup_{c \in \mathcal{C}} \left| \widehat{W}_{\mathrm{MoM}}(c) - W(c) \right| \le C\, \sigma \sqrt{\frac{\ln(|\mathcal{C}|/\delta)}{n}},$$
where $\sigma^2$ bounds the variance of $D(X, X')\,\mathbf{1}\{c(X) = c(X')\}$ over $c \in \mathcal{C}$. This is critical for statistical learning guarantees in model selection over clusterings, even with heavy-tailed losses.
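A sketch of MoM risk estimation over a finite class of candidate clusterings; the squared Euclidean dissimilarity, heavy-tailed features, and random candidate labelings are placeholders for illustration only:

```python
import numpy as np
from itertools import combinations

def mom_clustering_risk(x, labels, n_blocks, rng=None):
    """MoM estimate of W(c) = E[D(X, X') 1{c(X) = c(X')}], here with D chosen
    as the squared Euclidean distance (any dissimilarity would work)."""
    rng = np.random.default_rng(rng)
    blocks = np.array_split(rng.permutation(len(x)), n_blocks)
    vals = []
    for bk, bl in combinations(blocks, 2):
        d = ((x[bk][:, None, :] - x[bl][None, :, :]) ** 2).sum(-1)
        same = labels[bk][:, None] == labels[bl][None, :]
        vals.append((d * same).mean())
    return float(np.median(vals))

# Model selection over a finite class of candidate clusterings
rng = np.random.default_rng(2)
x = rng.standard_t(2.2, size=(500, 2))  # heavy-tailed features
candidates = {f"c{j}": rng.integers(0, 3, size=500) for j in range(5)}
risks = {name: mom_clustering_risk(x, lab, n_blocks=8)
         for name, lab in candidates.items()}
print(min(risks, key=risks.get))  # clustering with the smallest robust risk
```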
4. Robustness, Assumptions, and Comparison to Classical Methods
Advantages:
- Robustness to Heavy Tails: The use of the median across blocks ensures that the estimator remains stable against a minority of blocks contaminated by extreme values.
- Minimal Moment Assumptions: Performance guarantees require only finite second moments or, more generally, finite $p$-th moments for some $1 < p \le 2$, whereas classical exponential inequalities for U-statistics require bounded or sub-Gaussian kernels.
- Near-Optimal Rates: The MoM U-statistic achieves convergence rates that are close to those of the classical U-statistic under boundedness, modulo at most logarithmic factors.
- Applicability: Enables clustering and other U-statistics-based learning tasks to be performed reliably when losses or dissimilarities have heavy tails.
Limitations:
- Confidence-Dependent Construction: The number of blocks $V$ must be set as a function of the target confidence level $\delta$, which implies that different estimators are used for different levels of confidence.
- Efficiency Trade-Off: For light-tailed data, replacing the mean with the median can be slightly conservative, incurring a minor efficiency cost.
The MoM methodology thus provides a robust alternative with rigorous guarantees in settings where classical approaches can be rendered ineffective by a small fraction of extreme values.
5. Implementation Considerations and Algorithmic Aspects
An explicit outline of the MoM U-statistic estimator:
- Block Partitioning: Partition the $n$ data points into $V$ blocks of sizes as equal as possible, with $V \propto \ln(1/\delta)$ for confidence level $1 - \delta$.
- Decoupled U-statistics: For every $m$-tuple of distinct blocks, compute the decoupled U-statistic that averages $h$ over all tuples taking one data point from each of the $m$ blocks.
- Aggregation: Output the median of all these blockwise U-statistics as the final estimate.
For computational efficiency, the blockwise statistics can be computed in parallel. Performance bounds hold when $V$ is not much larger than necessary to balance deviation and sample size, i.e., $V \asymp \ln(1/\delta)$.
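A sketch of the parallelization point: the blockwise statistics are independent, so they can be mapped over a worker pool (thread-based here for simplicity; the kernel, data, and parameters are illustrative):

```python
import numpy as np
from itertools import combinations
from concurrent.futures import ThreadPoolExecutor

def pair_stat(pair):
    """Decoupled statistic for one pair of blocks (variance kernel as example)."""
    xa, xb = pair
    return np.mean(0.5 * (xa[:, None] - xb[None, :]) ** 2)

def mom_u_parallel(x, n_blocks, workers=4, rng=None):
    rng = np.random.default_rng(rng)
    blocks = np.array_split(rng.permutation(np.asarray(x, float)), n_blocks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        vals = list(pool.map(pair_stat, combinations(blocks, 2)))
    return float(np.median(vals))

print(mom_u_parallel(np.random.default_rng(0).pareto(2.5, 2000), n_blocks=12))
```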
In applications such as clustering, MoM can be used not only for risk estimation but as a core component of algorithms (e.g., robust centroid updates or model selection via robust estimates of risk functions).
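As an illustration of MoM inside an algorithm rather than only for risk estimation, a hypothetical Lloyd-style update can replace each cluster's mean with a coordinate-wise median of block means. This specific update rule is a sketch, not the procedure analyzed in the source:

```python
import numpy as np

def mom_centroid(points, n_blocks, rng=None):
    """Coordinate-wise median of block means: a robust stand-in for the cluster
    mean. Assumes the cluster holds at least n_blocks points."""
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(points))
    means = [points[b].mean(axis=0) for b in np.array_split(idx, n_blocks)]
    return np.median(np.stack(means), axis=0)

def robust_lloyd_step(x, centroids, n_blocks=5):
    """One Lloyd iteration with MoM centroid updates (empty clusters not handled)."""
    d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    assign = d.argmin(axis=1)
    return np.stack([mom_centroid(x[assign == k], n_blocks)
                     for k in range(len(centroids))])
```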
6. Broader Implications and Relevance in Statistical Learning
The use of the median-of-means principle for U-statistics, as established in this work, has significant implications for modern statistical learning:
- It extends robust mean estimation to multivariate and pairwise functionals pervasive in unsupervised and supervised learning.
- The exponential concentration under weak moment assumptions ensures high-confidence generalization bounds even in adverse data regimes.
- The technique bridges classical U-statistics theory with robust machine learning, enabling statistically sound procedures for clustering, ranking, and other tasks reliant on pairwise or higher-order statistics.
A plausible implication is that the median-of-means U-statistic can serve as a paradigm for robustification in other high-variance empirical risk settings, especially where complex dependency structures (e.g., U-processes) are present.
7. Summary Table: Performance Characteristics
| Scenario | Moment Assumption | Rate (up to logs) |
|---|---|---|
| Canonical kernel, finite variance | $\mathbb{E}\, h^2 < \infty$ | $(\ln(1/\delta)/n)^{m/2}$ |
| Canonical kernel, finite $p$-th moment | $\mathbb{E}\, \lvert h \rvert^p < \infty$, $1 < p \le 2$ | $(\ln(1/\delta)/n)^{m(p-1)/p}$ |
| Clustering (finite class $\mathcal{C}$) | finite variance of $D$ | $\sqrt{\ln(\lvert\mathcal{C}\rvert/\delta)/n}$ |
This summarizes, for canonical kernels of order $m$, the required moments and the resulting convergence rates for the median-of-means estimator, highlighting its robustness under weak assumptions.
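For a sense of the magnitudes, one can plug illustrative values into the rates from the table; the sample size, confidence level, moment index, and class size below are all hypothetical:

```python
import numpy as np

# Illustrative magnitudes for the rates in the table (all values hypothetical)
n, delta, m, p, n_clusterings = 10_000, 0.01, 2, 1.5, 100
base = np.log(1 / delta) / n
print(base ** (m / 2))                              # canonical, finite variance
print(base ** (m * (p - 1) / p))                    # canonical, finite p-th moment
print(np.sqrt(np.log(n_clusterings / delta) / n))   # clustering, finite class
```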
The median-of-means U-statistic methodology thus achieves robust, high-confidence estimation in the presence of heavy tails, extending the scope of reliable statistical inference and learning well beyond the reach of classical, mean-based U-statistics (Joly et al., 2015).