
Median-of-Means U-Statistics

Updated 28 November 2025
  • Median-of-Means U-Statistics is a robust estimator that partitions data into blocks, computes U-statistics for each, and takes their median to mitigate heavy-tailed effects and contamination.
  • It achieves minimax-optimal deviation bounds under finite variance and mild moment conditions, even in the presence of outliers or adversarial contamination.
  • The framework extends to high-dimensional settings and kernel methods, enabling robust risk minimization in a variety of modern statistical and machine learning tasks.

The median-of-means (MoM) U-statistic framework extends robust mean estimation to general statistical functionals defined as U-statistics, yielding minimax-optimal deviation bounds under only finite variance or mild moment conditions, even in the presence of heavy tails or adversarial contamination. The estimator partitions the data into blocks, computes a U-statistic within each block, and takes the median of these blockwise estimates, thereby limiting the influence of outliers or heavy-tailed observations. The approach carries the tools of robust statistics, high-probability bounds, and algorithmic learning guarantees over to U-processes, offering a toolkit for robust risk minimization in a variety of modern statistical and machine learning tasks.

1. Formal Definitions and Construction

Let $X_1, \dots, X_n$ be i.i.d. observations from a distribution $P$ on a measurable space $\mathcal{X}$, and let $h: \mathcal{X}^m \to \mathbb{R}$ be a symmetric kernel with finite variance. The classical order-$m$ U-statistic is

$$U_n(h) = \frac{1}{\binom{n}{m}} \sum_{1 \leq i_1 < \cdots < i_m \leq n} h(X_{i_1}, \dots, X_{i_m}),$$

which is the minimum-variance unbiased estimator of the parameter $\theta(h) = \mathbb{E}\, h(X_1, \dots, X_m)$. For heavy-tailed data or contaminated samples, the empirical U-statistic loses its sub-Gaussian deviation properties.

The median-of-means U-statistic (MoM-U) partitions the sample $\{1, \dots, n\}$ into $K$ (possibly random or deterministic) blocks of size $B \approx n/K$. For each block $\mathcal{B}_k$, compute the block U-statistic $\hat{U}_k(h)$, and define

$$\hat{\theta}_{\text{MoU}}(h) = \mathrm{median}\bigl(\hat{U}_1(h), \dots, \hat{U}_K(h)\bigr).$$

Randomized versions (where each block is drawn without replacement from the full sample) are denoted MoRU; both deterministic and randomized schemes achieve the same asymptotic rates (Laforgue et al., 2022).
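
For concreteness, here is a minimal NumPy sketch of this construction; the function names, the block-size handling, and the example kernel are ours, not from the cited papers, and `randomize=True` mimics the MoRU scheme:

```python
import numpy as np
from itertools import combinations


def u_statistic(x, h, m):
    """Classical order-m U-statistic: average of h over all m-subsets."""
    vals = [h(*x[list(idx)]) for idx in combinations(range(len(x)), m)]
    return float(np.mean(vals))


def mom_u(x, h, m, K, randomize=False, seed=None):
    """Median-of-means U-statistic: median of K block-wise U-statistics.

    randomize=True draws each block without replacement from the full
    sample (the MoRU scheme); otherwise the sample is split into K
    contiguous blocks of size B = n // K (leftover points are dropped).
    """
    n = len(x)
    B = n // K
    rng = np.random.default_rng(seed)
    if randomize:
        blocks = [rng.choice(x, size=B, replace=False) for _ in range(K)]
    else:
        blocks = [x[k * B:(k + 1) * B] for k in range(K)]
    return float(np.median([u_statistic(b, h, m) for b in blocks]))


# Example: robust variance estimation via the kernel h(a, b) = (a - b)^2 / 2,
# whose expectation is Var(X); the data are heavy-tailed Student-t(2.5).
rng = np.random.default_rng(0)
x = rng.standard_t(df=2.5, size=600)
h = lambda a, b: 0.5 * (a - b) ** 2
print(mom_u(x, h, m=2, K=15))
```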

A further permutation-invariant variant, viewing the estimator as the median over all possible subsample means from micro-blocks, can achieve optimal constants up to $1 + o(1)$ factors under mild moment growth assumptions, via a U-statistic kernel of arbitrary order (Minsker, 2022, Fu et al., 4 Dec 2024).

2. Main Deviation Guarantees and Concentration Results

Under minimal moment assumptions, MoM-U estimators achieve high-probability deviation bounds essentially matching classical U-statistics for bounded or sub-Gaussian kernels, but remain robust for heavy-tailed or contaminated data.

For a symmetric, square-integrable kernel $h$ and confidence parameter $\delta \in (0,1)$, the deviation bounds take the form $\Pr\bigl( |\hat{\theta}_{\text{MoU}}(h) - \theta(h)| \geq O\bigl( \sqrt{\log(1/\delta)/n} \bigr) \bigr) \leq \delta$, with explicit dependence of the constants on $m$, $\|h\|_\infty$, and the variance/projection structure of $h$ (Joly et al., 2015, Laforgue et al., 2022).

In the presence of arbitrary contamination, with $\epsilon = n_0/n$ the outlier fraction and $K > 2 n_0$, the estimator satisfies, with probability $1 - \delta$,

$$|\hat{\theta}_{\text{MoU}}(h) - \theta(h)| \leq C\, \Gamma(\epsilon) \sqrt{\frac{\log(1/\delta)}{n}},$$

where the blow-up constant $\Gamma(\epsilon)$ diverges only as $\epsilon \to 1/2$ (Laforgue et al., 2020). For fully degenerate (canonical) kernels of order $m = 2$, the error rate improves to $O\bigl((1 + \log(1/\delta))/n\bigr)$.

If $h$ is only $p$-integrable for $1 < p \leq 2$, the deviation rate is $O\bigl( (\log(1/\delta)/n)^{m(p-1)/p} \bigr)$ (Joly et al., 2015). In the permutation-invariant U-statistic MoM variant, provided $m = o(N)$ and mild polynomial moment conditions hold, the sub-Gaussian deviation bound is nearly sharp: $\Pr\bigl( |\hat{\mu}_N - \mu| \geq t \bigr) \leq 2 \exp\bigl( -(1 - o(1)) \frac{N t^2}{2 \sigma^2} \bigr)$ (Minsker, 2022).
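
Continuing the sketch from Section 1, a small Monte-Carlo experiment illustrates what these bounds describe: for a heavy-tailed Student-$t(2.5)$ sample and the variance kernel, the upper error quantiles of the plain U-statistic should be visibly heavier than those of MoM-U. All sizes, seeds, and the choice $K = 12$ are arbitrary:

```python
# Tail comparison for the variance kernel, reusing u_statistic and mom_u
# from the sketch in Section 1.  For Student-t(df), Var(X) = df / (df - 2).
import numpy as np

rng = np.random.default_rng(1)
h = lambda a, b: 0.5 * (a - b) ** 2
df = 2.5
true_var = df / (df - 2.0)

err_u, err_mom = [], []
for _ in range(200):
    x = rng.standard_t(df=df, size=240)
    err_u.append(abs(u_statistic(x, h, 2) - true_var))
    err_mom.append(abs(mom_u(x, h, 2, K=12) - true_var))

# High quantiles of the error: the plain U-statistic's tail is inflated
# by rare huge observations; the median over blocks damps them.
print(np.quantile(err_u, 0.99), np.quantile(err_mom, 0.99))
```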

3. Robustness to Outliers and Heavy Tails

The MoM-U approach tolerates contamination fractions $\epsilon < 1/2$, preserving the $O(n^{-1/2})$ rates (up to a constant) as long as the majority of blocks remain uncontaminated. This property extends to both univariate means and general U-statistics, including applications to pairwise losses and multivariate kernels (Laforgue et al., 2020). The blocking and median structure ensures that outliers or heavy-tailed blocks are discarded, since the median selects a value from the majority of "sane" blocks.

This estimator also attains optimal rates and sub-Gaussian tails for kernel mean embeddings and the maximum mean discrepancy (MMD) in reproducing kernel Hilbert spaces: up to $n/4$ arbitrary sample corruptions can be handled while maintaining consistency (Lerasle et al., 2018).
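
The RKHS machinery does not fit in a short snippet, but the combining step, a geometric median of blockwise mean embeddings, can be sketched with an explicit finite-dimensional feature map. This is an illustrative analogue of the MONK combination step, not the algorithm of Lerasle et al. itself; the feature map `phi` below is a toy assumption:

```python
import numpy as np


def geometric_median(points, n_iter=100, tol=1e-8):
    """Weiszfeld iteration for the geometric median of the rows of `points`."""
    z = points.mean(axis=0)
    for _ in range(n_iter):
        d = np.maximum(np.linalg.norm(points - z, axis=1), tol)  # avoid /0
        w = 1.0 / d
        z_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(z_new - z) < tol:
            return z_new
        z = z_new
    return z


def mom_mean_embedding(x, feature_map, K):
    """Split the sample into K blocks, average the features within each
    block, and combine the block means by their geometric median."""
    blocks = np.array_split(np.asarray(x), K)
    block_means = np.stack([feature_map(b).mean(axis=0) for b in blocks])
    return geometric_median(block_means)


# Toy usage with a hypothetical 2-D feature map phi(x) = (x, x^2):
phi = lambda b: np.stack([b, b ** 2], axis=1)
x = np.random.default_rng(2).standard_t(df=3.0, size=500)
print(mom_mean_embedding(x, phi, K=10))
```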

4. Comparison with Classical (Non-Robust) U-Statistics

Classical U-statistics require boundedness or sub-Gaussian tails for concentration, failing for heavy-tailed or adversarial data (e.g., when the kernel is $\alpha$-stable). In such regimes, deviation inequalities no longer hold (Joly et al., 2015). MoM-U estimators, however, only assume finite variance (or a finite $p$-th moment) and maintain high-confidence, nonasymptotic bounds.

A comparison of rates:

| Method | Assumptions | Deviation rate (degenerate $m=2$) | Constant (order) |
| --- | --- | --- | --- |
| Classical U-statistic (bounded $h$) [Arcones–Giné] | $\|h\|_\infty < \infty$ | $O(\log(1/\delta)/n)$ | $c_1 \|h\|_\infty$ |
| MoM-U, finite variance (Joly et al., 2015, Laforgue et al., 2022) | $\mathbb{E}[h^2] < \infty$ | $O(\sqrt{\log(1/\delta)/n})$ | $C_m \sigma$ |
| MoM-U, contamination, bounded $h$ (Laforgue et al., 2020) | outlier fraction $\epsilon < 1/2$ | $O(\Gamma(\epsilon)\sqrt{\log(1/\delta)/n})$ | $4\sqrt{d}\, M\, \Gamma(\epsilon)$ |

For permutation-invariant MoM-U (growing-order U-statistic), the deviation constants are nearly optimal, matching the Gaussian minimax constant up to $1 + o(1)$ (Minsker, 2022, Fu et al., 4 Dec 2024).

5. Extensions, Algorithmic Schemes, and Applications

Median-of-means U-statistics have been applied throughout robust learning, especially for pairwise or higher-order losses. Key algorithmic mechanisms include block-wise risk minimization and median-of-blocks (MoM-U) gradient descent for parameter learning; these approaches ensure robust excess risk bounds even in contaminated settings: $R(\hat{g}) - R(g^\star) \leq C\, \Gamma(\epsilon) \sqrt{ (d_{\mathrm{VC}}(1 + \log n) + \log(1/\delta))/n }$ for a VC-class hypothesis set $\mathcal{G}$ and bounded loss (Laforgue et al., 2020).
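
A schematic sketch of one such scheme, median-of-blocks gradient descent for a pairwise loss, is given below; the `pair_loss_grad` interface and all names are illustrative assumptions, not an implementation from the cited papers:

```python
import numpy as np
from itertools import combinations


def mom_gradient_step(w, X, y, pair_loss_grad, K, lr, rng):
    """One step of median-of-blocks gradient descent for a pairwise risk.

    pair_loss_grad(w, xi, xj, yi, yj) must return (loss, gradient); the
    per-block risk is an order-2 U-statistic of this loss.  Assumes each
    block contains at least two points.
    """
    blocks = np.array_split(rng.permutation(len(y)), K)
    risks, grads = [], []
    for b in blocks:
        pairs = list(combinations(b, 2))
        losses, gs = zip(*(pair_loss_grad(w, X[i], X[j], y[i], y[j])
                           for i, j in pairs))
        risks.append(np.mean(losses))
        grads.append(np.mean(gs, axis=0))
    med = int(np.argsort(risks)[K // 2])   # index of the median-risk block
    return w - lr * grads[med]             # descend on the median block only
```

Because the update uses only the block whose empirical risk sits at the median, blocks whose risk is inflated (or deflated) by outliers are ignored at every iteration, which is what yields the contaminated-setting excess risk bound above.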

The method generalizes to randomized block construction (random resampling of blocks) and incomplete U-statistic computation for computational efficiency (Laforgue et al., 2022, Fu et al., 4 Dec 2024); a sketch of the incomplete variant follows the list below. Applications include:

  • Robust kernel mean embedding and MMD, via blockwise geometric medians in Hilbert space (the "MONK" algorithm), relevant for distributional hypothesis testing and two-sample testing (Lerasle et al., 2018)
  • Robust clustering, by plugging MoM-U estimators into within-cluster dissimilarity objectives, yielding oracle inequalities and minimax rates under only finite variance and low-noise conditions (Joly et al., 2015)
  • Classical shadows in quantum tomography, where U-statistic MoM estimators optimize the measurement complexity and improve deviation constants in high-dimensional expectation estimation (Fu et al., 4 Dec 2024)
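
The incomplete U-statistic averages the kernel over randomly sampled $m$-tuples rather than all $\binom{n}{m}$ of them; here is a minimal sketch in the same conventions as the earlier snippets (names are ours):

```python
import numpy as np


def incomplete_u_statistic(x, h, m, num_terms, seed=None):
    """Monte-Carlo (incomplete) U-statistic: average h over `num_terms`
    m-subsets drawn at random instead of all binom(n, m) subsets."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    vals = [h(*x[rng.choice(len(x), size=m, replace=False)])
            for _ in range(num_terms)]
    return float(np.mean(vals))
```

Plugging this blockwise estimator into the MoM median trades a controlled amount of Monte-Carlo variance for a computation budget that is linear in `num_terms` rather than polynomial in the block size.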

6. Proof Techniques and Structural Insights

MoM-U deviation bounds rest on a two-level combination of classical U-statistic and median-of-means arguments, relying crucially on the Hoeffding (or Hájek) projection decomposition:

  • For each blockwise U-statistic, block independence (or near-independence) enables Chebyshev or Hoeffding-type concentration for the block estimators.
  • The median-of-means principle discards up to $K/2$ corrupted blocks without biasing the central estimator.
  • In higher-order or randomized settings, concentration for the median is derived from Bernoulli tail bounds (binomial deviations) for the proportion of "good" blocks (Laforgue et al., 2022, Laforgue et al., 2020); a schematic version of this argument is given after this list.
  • For growing-order U-statistics, symmetrization and control of higher-order degenerate terms imply that the estimator inherits the tight deviation properties of first-order (mean-like) projections, with all higher-order terms exponentially negligible (Minsker, 2022).
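
To make the second and third bullets concrete, the following schematic chain (constants deliberately loose, non-degenerate kernels assumed) shows how the pieces combine. Chebyshev's inequality gives $\Pr(|\hat{U}_k(h) - \theta(h)| > t) \leq \mathrm{Var}(\hat{U}_k)/t^2 \leq 1/4$ once $t \geq 2\sqrt{\mathrm{Var}(\hat{U}_k)}$, where $\mathrm{Var}(\hat{U}_k) = O(m^2 \sigma_1^2 / B)$ for block size $B = n/K$. The median deviates by more than $t$ only if at least $K/2$ of the $K$ independent block estimates do, so by Hoeffding's inequality for a $\mathrm{Binomial}(K, 1/4)$ count,

$$\Pr\bigl( |\hat{\theta}_{\text{MoU}}(h) - \theta(h)| > t \bigr) \leq \Pr\bigl( \mathrm{Bin}(K, 1/4) \geq K/2 \bigr) \leq e^{-K/8}.$$

Taking $K \asymp \log(1/\delta)$ and $t \asymp \sigma_1 \sqrt{K/n}$ recovers the $\sqrt{\log(1/\delta)/n}$ rate quoted in Section 2.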

7. Practical Considerations, Limitations, and Current Frontiers

The choice of block size and number is critical, with the standard recommendation $K \simeq c \log(1/\delta)$ for confidence $1 - \delta$. Computational trade-offs appear in large-scale or high-order U-statistic scenarios, leading to incomplete U-statistics or randomized block designs (random or cyclic sampling of micro-blocks) for tractable implementation at modest efficiency loss (Fu et al., 4 Dec 2024).
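
As a rough illustration of these trade-offs, here is a heuristic block-count rule; the constant `c` and the rule itself are our assumption, not a prescription from the literature:

```python
import math


def choose_num_blocks(n, delta, n_outliers=0, c=8.0):
    """Heuristic number of blocks: K ~ c * log(1/delta), enlarged so that
    K > 2 * n_outliers (a majority of blocks stays uncontaminated) and
    capped at n so every block holds at least one point."""
    K = max(int(math.ceil(c * math.log(1.0 / delta))), 2 * n_outliers + 1)
    return min(K, n)
```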

A distinctive feature is the retention of $O(n^{-1/2})$ rates and tolerance of up to $\epsilon < 1/2$ adversarial contamination, a sharp breakdown point. However, as $\epsilon \to 1/2$, the constants diverge, so performance deteriorates for high contamination fractions. The methods also necessitate careful handling of block dependencies for higher-order U-statistics, often requiring conditional concentration or diagonal block restrictions (Laforgue et al., 2020).

Recent work emphasizes sharper deviation constants via permutation-invariant U-statistics and extends the robust MoM-U methodology to risk minimization, kernel methods, clustering, and large-scale quantum information settings (Minsker, 2022, Lerasle et al., 2018, Fu et al., 4 Dec 2024). Open questions remain on optimal adaptivity to unknown moment conditions and computationally efficient, high-order, high-dimensional variants.

