
Median-of-Means Estimators

Updated 23 October 2025
  • Median-of-means estimators are robust statistical methods that partition data into blocks and use the median of block means to mitigate the impact of outliers.
  • They achieve exponential concentration bounds under weak moment assumptions, nearly matching classical U-statistic rates up to logarithmic factors.
  • Extensions to U-statistics for multivariate functionals and clustering tasks demonstrate practical utility in heavy-tailed data scenarios with rigorous performance guarantees.

A median-of-means estimator is a robust statistical method for estimating the mean of a distribution, particularly effective when data exhibit heavy tails or contamination. Rather than relying on a simple average, the data are partitioned into several blocks, individual means are computed per block, and the overall estimate is the median of these block means. This approach mitigates the undue influence of outlier blocks and yields sharp exponential concentration properties under weak moment assumptions. Its extensions to U-statistics enable robust estimation for multivariate functionals, facilitating rigorous performance guarantees even in scenarios where classical methods falter.
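For concreteness, here is a minimal Python sketch of the scalar median-of-means mean estimator just described; the function name, block count, and example distribution are illustrative choices, not part of any library API.

```python
import numpy as np

def median_of_means(x, num_blocks, rng=None):
    """Median-of-means estimate of E[X] from an i.i.d. scalar sample."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.permutation(np.asarray(x, dtype=float))  # randomize block assignment
    blocks = np.array_split(x, num_blocks)           # V roughly equal-sized blocks
    return np.median([block.mean() for block in blocks])

# Heavy-tailed example: a Pareto sample with tail index 2.1 has finite
# variance but a volatile sample mean; the MoM estimate is far more stable.
rng = np.random.default_rng(0)
sample = rng.pareto(2.1, size=10_000)
print(median_of_means(sample, num_blocks=30, rng=rng))
```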

1. Construction and Methodology of Median-of-Means U-Statistics

Let $h: \mathcal{X}^m \to \mathbb{R}$ be a symmetric kernel and $X_1, \dots, X_n$ independent samples. The classical U-statistic estimator,

$$U_n(h) = \frac{(n-m)!}{n!}\sum_{(i_1, \ldots, i_m) \in I_n^m} h(X_{i_1}, \ldots, X_{i_m}),$$

with $I_n^m$ the set of $m$-tuples of distinct indices, is unbiased for $m_h = E[h(X_1, \ldots, X_m)]$. However, when $h$ is unbounded and the distribution of the data is heavy-tailed, the U-statistic can be highly unstable due to the influence of extreme values.

The median-of-means (MoM) approach proceeds as follows:

  • Partition the $n$ data points into $V$ blocks, usually with $V$ chosen proportional to $\log(1/\delta)$ for a target confidence parameter $\delta$, yielding blocks $\mathcal{B} = (B_1, \ldots, B_V)$.
  • For each $m$-tuple $(B_{i_1}, \ldots, B_{i_m})$ of distinct blocks, compute the "decoupled" U-statistic on these blocks:

$$U_{B_{i_1},\ldots,B_{i_m}}(h) = \frac{1}{|B_{i_1}|\cdots|B_{i_m}|} \sum_{(k_1, \ldots, k_m) \in I_{B_{i_1},\ldots,B_{i_m}}} h(X_{k_1}, \ldots, X_{k_m}),$$

where $I_{B_{i_1},\ldots,B_{i_m}}$ runs over all $m$-tuples with $k_j \in B_{i_j}$ and the $k_j$ distinct.

  • The robust estimator is then

$$\overline{U}_\mathcal{B}(h) = \operatorname{Med}\left\{ U_{B_{i_1},\ldots,B_{i_m}}(h) : 1 \leq i_1 < \cdots < i_m \leq V \right\}.$$

This technique, which can be viewed as a decoupled U-statistic aggregation via block-wise medians, significantly reduces sensitivity to outliers or heavy-tailed observations in any single block.
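This construction is easiest to see for order $m = 2$, where the decoupled statistics range over pairs of distinct blocks. Below is a minimal Python sketch under the assumption of a vectorized symmetric kernel `h`; names and the example kernel are illustrative.

```python
import numpy as np
from itertools import combinations

def mom_u_statistic(x, h, num_blocks, rng=None):
    """Median-of-means U-statistic of order m = 2 for a symmetric kernel h.

    Partitions the sample into blocks, computes the decoupled U-statistic
    on every pair of distinct blocks (one kernel argument ranging over each
    block), and returns the median of these blockwise statistics.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = rng.permutation(np.asarray(x, dtype=float))
    blocks = np.array_split(x, num_blocks)
    stats = []
    for b1, b2 in combinations(blocks, 2):
        # Blocks are disjoint, so all index pairs are automatically distinct;
        # by symmetry of h, each unordered block pair is visited once.
        stats.append(h(b1[:, None], b2[None, :]).mean())
    return np.median(stats)

# Example: h(x, y) = x * y, whose mean is (E[X])^2, on heavy-tailed data.
rng = np.random.default_rng(1)
sample = rng.standard_t(df=2.5, size=5_000)  # heavy tails, finite variance
print(mom_u_statistic(sample, lambda a, b: a * b, num_blocks=25, rng=rng))
```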

2. Exponential Concentration and Performance Guarantees

The estimator $\overline{U}_\mathcal{B}(h)$ enjoys strong deviation bounds under minimal moment assumptions:

  • Finite Variance Case: For $h$ symmetric and $P$-degenerate of order $q-1$ (i.e., centered projections up to order $q-1$ vanish) and with variance $\sigma^2 < \infty$,

$$\Pr\left( \left| \overline{U}_\mathcal{B}(h) - m_h \right| > K_m \sigma \left( \frac{\log(1/\delta)}{n} \right)^{q/2} \right) \leq 2\delta$$

for $V \approx 32m\log(1/\delta)$, where $K_m$ is an explicit constant depending only on $m$. When $h$ is canonical ($q = m$), the convergence rate is $(\log(1/\delta)/n)^{m/2}$.

  • Finite $p$-th Moment Case: If the centered $h$ has finite $p$-th moment for some $1 < p \leq 2$,

$$\Pr\left( \left| \overline{U}_\mathcal{B}(h) - m_h \right| > K_m M_p \left( \frac{\log(1/\delta)}{n} \right)^{m(p-1)/p} \right) \leq 2\delta,$$

where $M_p = \left( E|h - m_h|^p \right)^{1/p}$.

These exponential concentration inequalities generalize the classical results (Arcones–Giné) to the robust regime, showing that the MoM estimator achieves rates nearly matching those for bounded or sub-Gaussian kernels, but under much weaker moment conditions.
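As a rough numeric illustration (ignoring the constant $K_m$): for a canonical kernel of order $m = 2$ at confidence $\delta = 0.01$, the prescription $V \approx 32m\log(1/\delta)$ gives $V \approx 64 \ln 100 \approx 295$ blocks, and the deviation term scales as $\sigma \log(1/\delta)/n \approx 4.6\,\sigma/n$; a sample of size $n = 10^5$ thus yields deviations on the order of $5 \times 10^{-5}\,\sigma$ with probability at least $0.98$.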

3. Application to Clustering Problems

The robust estimator proves particularly valuable in clustering scenarios where the empirical risk is naturally a U-statistic. Consider clustering risk expressed as

$$W(\mathcal{P}) = E\left[ D(X,X') \, \Phi_{\mathcal{P}}(X,X') \right],$$

with $D$ a potentially heavy-tailed dissimilarity and $\Phi_\mathcal{P}$ an indicator of cluster membership. The standard empirical counterpart,

$$\widehat{W}_n(\mathcal{P}) = \frac{2}{n(n-1)} \sum_{i < j} D(X_i, X_j)\, \Phi_\mathcal{P}(X_i, X_j),$$

is a second-order U-statistic and can be highly unstable when $D$ is heavy-tailed.

The MoM extension defines

$$\overline{W}_\mathcal{B}(\mathcal{P}) = \operatorname{Med}\left\{ U_{B_i, B_j}(h_\mathcal{P}) : 1 \leq i < j \leq V \right\},$$

with $h_\mathcal{P}(x,x') = D(x,x')\left(\Phi_\mathcal{P}(x,x') - \text{baseline}\right)$. For a finite class $\Pi_K$ of $N$ candidate partitions, the estimator achieves a uniform deviation bound:

$$\Pr\left( \sup_{\mathcal{P} \in \Pi_K} \left|\overline{W}_\mathcal{B}(\mathcal{P}) - W(\mathcal{P})\right| > C \sigma \left( \frac{\log(N/\delta)}{n} \right)^{1/2} \right) \leq \delta,$$

where $\sigma^2 = E[D(X_1, X_2)^2] < \infty$. This is critical for statistical learning guarantees in model selection over clusterings, even with heavy-tailed losses.
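A minimal sketch of this estimator for a single candidate partition, assuming squared Euclidean dissimilarity and taking the baseline term in $h_\mathcal{P}$ to be zero (both choices are illustrative, since the text leaves them unspecified):

```python
import numpy as np
from itertools import combinations

def mom_clustering_risk(X, labels, num_blocks, rng=None):
    """Median-of-means estimate of the clustering risk W(P).

    Sketch with D(x, x') = ||x - x'||^2 and baseline 0, so
    h_P(x, x') = D(x, x') * 1{x, x' in the same cluster of P}.
    """
    rng = np.random.default_rng() if rng is None else rng
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    order = rng.permutation(len(X))
    blocks = np.array_split(order, num_blocks)
    stats = []
    for bi, bj in combinations(blocks, 2):
        diff = X[bi][:, None, :] - X[bj][None, :, :]       # pairwise differences
        D = (diff ** 2).sum(axis=-1)                        # dissimilarity matrix
        same = labels[bi][:, None] == labels[bj][None, :]   # Phi_P indicator
        stats.append((D * same).mean())
    return np.median(stats)
```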

4. Robustness, Assumptions, and Comparison to Classical Methods

Advantages:

  • Robustness to Heavy Tails: The use of the median across blocks ensures that the estimator remains stable against a minority of blocks contaminated by extreme values.
  • Minimal Moment Assumptions: Performance guarantees require only finite second moments or, more generally, finite $p$-th moments for $p > 1$, whereas the classical exponential concentration results require bounded or sub-Gaussian kernels.
  • Near-Optimal Rates: The MoM U-statistic achieves convergence rates that are close to those of the classical U-statistic under boundedness, modulo at most logarithmic factors.
  • Applicability: Enables clustering and other U-statistics-based learning tasks to be performed reliably when losses or dissimilarities have heavy tails.

Limitations:

  • Confidence-Dependent Construction: The number of blocks $V$ must be set as a function of the target confidence $\delta$, so different confidence levels call for different estimators.
  • Efficiency Trade-Off: For light-tailed data, replacing the mean with the median can be slightly conservative, incurring a minor efficiency cost.

The MoM methodology thus provides a robust alternative with rigorous guarantees in settings where classical approaches can be rendered ineffective by a small fraction of extreme values.

5. Implementation Considerations and Algorithmic Aspects

An explicit outline of the MoM U-statistic estimator:

  1. Block Partitioning: Partition the $n$ data points into $V$ blocks of (as nearly as possible) equal size, with $V \propto \log(1/\delta)$ for confidence $\delta$.
  2. Decoupled U-statistics: For every $m$-tuple of distinct blocks, compute the decoupled U-statistic in which the $j$-th kernel argument ranges over the $j$-th chosen block.
  3. Aggregation: Output the median of all these blockwise U-statistics as the final estimate.

For computational efficiency, the blockwise statistics can be computed in parallel. The performance bounds hold provided $V$ is chosen no larger than necessary to balance the deviation probability against block size; in particular $V \ll n$.
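A short illustration of the block-count choice in step 1, using the $V \approx 32m\log(1/\delta)$ prescription quoted in Section 2 (the multiple is a sketch-level choice taken from the bound, not a tuned recommendation):

```python
import numpy as np

# Choose V from the target confidence delta, then feed it to an estimator
# such as the mom_u_statistic sketch from Section 1.
delta, m, n = 0.01, 2, 50_000
V = int(np.ceil(32 * m * np.log(1 / delta)))
assert V < n            # need far more samples than blocks (V << n)
print(V)                # ~295 blocks at delta = 0.01
```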

In applications such as clustering, MoM can be used not only for risk estimation but as a core component of algorithms (e.g., robust centroid updates or model selection via robust estimates of risk functions).
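One plausible instantiation of such a robust centroid update is a sketch like the following (an assumption-laden illustration, not the algorithm of the cited paper): the points assigned to a cluster are split into blocks, and the centroid is the coordinate-wise median of the block means.

```python
import numpy as np

def mom_centroid(points, num_blocks, rng=None):
    """Robust centroid update: coordinate-wise median of block means.

    Damps the influence of extreme points that would drag an ordinary
    mean-based centroid far from the bulk of the cluster.
    """
    rng = np.random.default_rng() if rng is None else rng
    points = np.asarray(points, dtype=float)
    blocks = np.array_split(rng.permutation(points), num_blocks)  # shuffle rows
    return np.median(np.stack([b.mean(axis=0) for b in blocks]), axis=0)
```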

6. Broader Implications and Relevance in Statistical Learning

The use of the median-of-means principle for U-statistics, as established in this work, has significant implications for modern statistical learning:

  • It extends robust mean estimation to multivariate and pairwise functionals pervasive in unsupervised and supervised learning.
  • The exponential concentration under weak moment assumptions ensures high-confidence generalization bounds even in adverse data regimes.
  • The technique bridges classical U-statistics theory with robust machine learning, enabling statistically sound procedures for clustering, ranking, and other tasks reliant on pairwise or higher-order statistics.

A plausible implication is that the median-of-means U-statistic can serve as a paradigm for robustification in other high-variance empirical risk settings, especially where complex dependency structures (e.g., U-processes) are present.

7. Summary Table: Performance Characteristics

| Scenario | Moment Assumption | Rate (up to logs) |
|---|---|---|
| Canonical kernel, finite variance | $\operatorname{Var}(h) < \infty$ | $(\log(1/\delta)/n)^{m/2}$ |
| Canonical kernel, finite $p$-th moment | $E\lvert h - m_h\rvert^p < \infty,\ 1 < p \leq 2$ | $(\log(1/\delta)/n)^{m(p-1)/p}$ |
| Clustering (finite $\Pi_K$) | $E[D(X_1,X_2)^2] < \infty$ | $(\log(N/\delta)/n)^{1/2}$ |

This table summarizes, for canonical kernels of order $m$, the moment conditions required and the resulting convergence rates of the median-of-means estimator, highlighting its robustness and near-optimal rates.


The median-of-means U-statistic methodology thus achieves robust, high-confidence estimation in the presence of heavy tails, extending the scope of reliable statistical inference and learning well beyond the reach of classical, mean-based U-statistics (Joly et al., 2015).
